virtex.data.tokenizers
- class virtex.data.tokenizers.SentencePieceBPETokenizer(model_path: str)[source]
Bases:
object
A tokenizer based on SentencePiece with BPE sub-routine. It encodes caption strings into list of tokens.
- Parameters
model_path – Path to the
.model
file trained by SentencePiece.