virtex.data.datasets
Pretraining Datasets
- class virtex.data.datasets.coco_captions.CocoCaptionsDataset(data_root: str, split: str)[source]
Bases: torch.utils.data.dataset.Dataset
A PyTorch dataset to read the COCO Captions dataset and provide it completely unprocessed. This dataset is used by various task-specific datasets in the datasets module.
- Parameters
  data_root – Path to the COCO dataset root directory.
  split – Name of COCO 2017 split to read. One of {"train", "val"}.
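A minimal usage sketch; the "datasets/coco" path is an assumption about directory layout, and the exact structure of each returned record is not specified here:

    from virtex.data.datasets.coco_captions import CocoCaptionsDataset

    # Read raw COCO Captions data; "datasets/coco" is an assumed path.
    dataset = CocoCaptionsDataset(data_root="datasets/coco", split="train")
    print(len(dataset))      # number of images in the train split
    record = dataset[0]      # one unprocessed image-captions record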
- class virtex.data.datasets.captioning.CaptioningDataset(data_root: str, split: str, tokenizer: virtex.data.tokenizers.SentencePieceBPETokenizer, image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}), max_caption_length: int = 30)[source]
Bases: torch.utils.data.dataset.Dataset
A dataset which provides image-caption (forward and backward) pairs from a COCO Captions annotation file. This is used for pretraining tasks which use captions: bicaptioning, forward captioning, and token classification.
- Parameters
  data_root – Path to dataset directory containing images and annotations.
  split – Name of COCO 2017 split to read. One of {"train", "val"}.
  tokenizer – Tokenizer which maps word tokens to their integer IDs.
  image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
  max_caption_length – Maximum number of tokens to keep in a tokenized caption. Extra tokens will be trimmed from the right end of the token list.
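A hedged sketch of constructing this dataset for pretraining; the tokenizer constructor argument below is an assumption (consult virtex.data.tokenizers for its actual signature):

    from virtex.data.datasets.captioning import CaptioningDataset
    from virtex.data.tokenizers import SentencePieceBPETokenizer

    # Assumed: the tokenizer is built from a trained SentencePiece model file.
    tokenizer = SentencePieceBPETokenizer("datasets/vocab/coco.model")

    dataset = CaptioningDataset(
        data_root="datasets/coco",
        split="train",
        tokenizer=tokenizer,
        max_caption_length=30,   # longer captions are trimmed on the right
    )
    pair = dataset[0]            # one image-caption (forward and backward) pair

When batching with a DataLoader, variable-length captions typically need padding; whether this class provides its own collate function is not documented here.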
- class virtex.data.datasets.classification.TokenClassificationDataset(data_root: str, split: str, tokenizer: virtex.data.tokenizers.SentencePieceBPETokenizer, image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}), max_caption_length: int = 30)[source]
Bases: torch.utils.data.dataset.Dataset
A dataset which provides image-labelset pairs from a COCO Captions annotation file. The set of caption tokens (unordered) is treated as a labelset.
- Parameters
  data_root – Path to dataset directory containing images and annotations.
  split – Name of COCO 2017 split to read. One of {"train", "val"}.
  tokenizer – Tokenizer which maps word tokens to their integer IDs.
  image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
  max_caption_length – Maximum number of tokens to keep in a tokenized caption. Extra tokens will be trimmed from the right end of the token list.
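The constructor mirrors CaptioningDataset; a brief sketch, with the same path and tokenizer assumptions as above:

    from virtex.data.datasets.classification import TokenClassificationDataset
    from virtex.data.tokenizers import SentencePieceBPETokenizer

    # Tokenizer construction mirrors the CaptioningDataset sketch
    # (the model-file argument is an assumption).
    tokenizer = SentencePieceBPETokenizer("datasets/vocab/coco.model")

    # The caption tokens of each image form an unordered labelset.
    dataset = TokenClassificationDataset(
        data_root="datasets/coco",
        split="val",
        tokenizer=tokenizer,
    )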
- class virtex.data.datasets.classification.MultiLabelClassificationDataset(data_root: str, split: str, image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]
Bases: torch.utils.data.dataset.Dataset
A dataset which provides image-labelset pairs from COCO instance annotation files. This is used for the multi-label classification pretraining task.
- Parameters
  data_root – Path to dataset directory containing images and annotations.
  split – Name of COCO 2017 split to read. One of {"train", "val"}.
  image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
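Since image_transform accepts an albumentations pipeline, the default can be overridden; the transform choices below are illustrative and only approximate the default (which uses virtex.data.transforms.CenterSquareCrop):

    import albumentations as A
    from virtex.data.datasets.classification import MultiLabelClassificationDataset

    # An illustrative albumentations pipeline approximating the default.
    transform = A.Compose([
        A.SmallestMaxSize(max_size=256),
        A.CenterCrop(height=224, width=224),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ])
    dataset = MultiLabelClassificationDataset(
        data_root="datasets/coco",
        split="train",
        image_transform=transform,
    )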
Downstream Datasets
- class virtex.data.datasets.downstream.ImageNetDataset(data_root: str = 'datasets/imagenet', split: str = 'train', image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]
Bases: torchvision.datasets.imagenet.ImageNet
A simple wrapper over torchvision's ImageNet dataset. The image transform is handled here instead of being passed to the superclass.
- Parameters
  data_root – Path to the ImageNet dataset directory.
  split – Which split to read from. One of {"train", "val"}.
  image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
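A minimal sketch for downstream use; since the default transform center-crops images to a fixed size, default DataLoader collation should apply (the exact per-example structure is an assumption to verify against the class):

    from torch.utils.data import DataLoader
    from virtex.data.datasets.downstream import ImageNetDataset

    # Defaults read datasets/imagenet; only the split is overridden here.
    dataset = ImageNetDataset(split="val")
    loader = DataLoader(dataset, batch_size=256, num_workers=8)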
- class virtex.data.datasets.downstream.INaturalist2018Dataset(data_root: str = 'datasets/inaturalist', split: str = 'train', image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]
Bases: torch.utils.data.dataset.Dataset
A dataset which provides image-label pairs from the iNaturalist 2018 dataset.
- Parameters
  data_root – Path to the iNaturalist 2018 dataset directory.
  split – Which split to read from. One of {"train", "val"}.
  image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
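All arguments have defaults, so the simplest construction is:

    from virtex.data.datasets.downstream import INaturalist2018Dataset

    # Reads datasets/inaturalist, "train" split, with the default transform.
    dataset = INaturalist2018Dataset()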
- class virtex.data.datasets.downstream.VOC07ClassificationDataset(data_root: str = 'datasets/VOC2007', split: str = 'trainval', image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]
Bases: torch.utils.data.dataset.Dataset
A dataset which provides image-label pairs from the PASCAL VOC 2007 dataset.
- Parameters
  data_root – Path to VOC 2007 directory containing sub-directories named Annotations, ImageSets, and JPEGImages.
  split – Which split to read from. One of {"trainval", "test"}.
  image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
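Note that the split names differ from the other datasets; a short sketch using the default data_root:

    from virtex.data.datasets.downstream import VOC07ClassificationDataset

    # VOC 2007 uses "trainval" and "test" splits, not "train"/"val".
    trainval = VOC07ClassificationDataset(split="trainval")
    test = VOC07ClassificationDataset(split="test")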
- class virtex.data.datasets.downstream.ImageDirectoryDataset(data_root: str, image_transform: Callable = Compose([SmallestMaxSize(always_apply=False, p=1.0, max_size=256, interpolation=1), CenterSquareCrop(always_apply=False, p=1.0, height=224, width=224), Normalize(always_apply=False, p=1.0, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0)], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}))[source]
Bases: torch.utils.data.dataset.Dataset
A dataset which reads images from any directory. This class is useful for running image-captioning inference on our models with arbitrary images.
- Parameters
  data_root – Path to a directory containing images.
  image_transform – List of image transformations, from either albumentations or virtex.data.transforms.
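A sketch of pointing this dataset at an arbitrary folder for captioning inference; the path is a placeholder, and the structure of each returned instance is an assumption:

    from virtex.data.datasets.downstream import ImageDirectoryDataset

    # Any directory of images works; no annotations are required.
    dataset = ImageDirectoryDataset(data_root="/path/to/images")
    for idx in range(len(dataset)):
        instance = dataset[idx]   # a transformed image (exact structure assumed)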