probnmn.data.datasets
- class probnmn.data.datasets.ProgramPriorDataset(tokens_h5path: str)
Bases: torch.utils.data.dataset.Dataset
Provides programs as tokenized sequences to train the ProgramPrior model (probnmn.models.program_prior.ProgramPrior).
- Parameters
- tokens_h5path: str
Path to an HDF file to initialize the underlying reader.
- class probnmn.data.datasets.QuestionCodingDataset(tokens_h5path: str, num_supervision: int = 699989, supervision_question_max_length: int = 40)
Bases: torch.utils.data.dataset.Dataset
Provides questions and programs as tokenized sequences for Question Coding. It also provides a “supervision” flag, which can act as a mask when batched, to tune the amount of program supervision on ProgramGenerator.
- Parameters
- tokens_h5path: str
Path to an HDF file to initialize the underlying reader.
- num_supervision: int, optional (default = 699989)
Number of examples that have program supervision over questions, for ProgramGenerator.
- supervision_question_max_length: int, optional (default = 40)
Maximum question length for picking examples with program supervision.
Notes
For a fixed numpy random seed, the randomly generated supervision list will always be the same.
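The note above can be illustrated with a minimal sketch. It assumes the supervision list is built by randomly choosing up to `num_supervision` indices among questions no longer than `supervision_question_max_length`; the helper `make_supervision_list` and the toy `question_lengths` are hypothetical and are not part of probnmn.

```python
import numpy as np


def make_supervision_list(question_lengths, num_supervision, max_length, seed=0):
    """Hypothetical helper: mark up to `num_supervision` short questions as supervised."""
    # Assumption: seeding numpy's global RNG is what makes the list reproducible.
    np.random.seed(seed)
    lengths = np.asarray(question_lengths)
    # Only questions within the length limit are eligible for supervision.
    eligible = np.flatnonzero(lengths <= max_length)
    # Never request more supervised examples than there are eligible ones.
    count = min(num_supervision, len(eligible))
    chosen = np.random.choice(eligible, size=count, replace=False)
    supervision = np.zeros(len(lengths), dtype=int)
    supervision[chosen] = 1
    return supervision.tolist()


question_lengths = [12, 45, 8, 33, 27, 51, 19, 30]  # toy data
first = make_supervision_list(question_lengths, num_supervision=3, max_length=30)
second = make_supervision_list(question_lengths, num_supervision=3, max_length=30)
```

Because the seed is reset before each draw, `first` and `second` are identical, and questions longer than the limit are never flagged.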
- get_supervision_list(self)
Return a list of 1s and 0s indicating which examples have program supervision during question coding. Used by SupervisionWeightedRandomSampler to form a mini-batch with a nearly equal number of examples with and without program supervision.
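How a 0/1 supervision list can lead to balanced mini-batches can be sketched as follows. This is not the implementation of SupervisionWeightedRandomSampler, which is not shown on this page; it is one possible weighting scheme, and `balanced_weights` is a hypothetical helper.

```python
import numpy as np


def balanced_weights(supervision_list):
    """Hypothetical helper: per-example sampling probabilities that give the
    supervised and unsupervised groups equal total probability mass."""
    flags = np.asarray(supervision_list, dtype=bool)
    num_pos = int(flags.sum())
    num_neg = len(flags) - num_pos
    if num_pos == 0 or num_neg == 0:
        # Only one group exists, so fall back to uniform sampling.
        return np.full(len(flags), 1.0 / len(flags))
    # Each group shares half of the probability mass among its members.
    return np.where(flags, 0.5 / num_pos, 0.5 / num_neg)


supervision_list = [1, 0, 0, 0, 0, 0, 1, 0]  # toy list: 2 supervised, 6 not
weights = balanced_weights(supervision_list)

rng = np.random.default_rng(0)
# Draw one mini-batch of indices with replacement using the weights.
batch = rng.choice(len(supervision_list), size=4, replace=True, p=weights)
```

In a PyTorch pipeline, weights of this kind could be passed to `torch.utils.data.WeightedRandomSampler`; whether probnmn does so internally is an assumption.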
- class probnmn.data.datasets.ModuleTrainingDataset(tokens_h5path: str, features_h5path: str, in_memory: bool = True)
Bases: torch.utils.data.dataset.Dataset
Provides questions, image features and answers for module training. Programs are inferred by the ProgramGenerator trained during Question Coding.
- Parameters
- tokens_h5path: str
Path to an HDF file to initialize the underlying reader.
- features_h5path: str
Path to an HDF file containing a ‘dataset’ of pre-extracted image features.
- in_memory: bool, optional (default = True)
Whether to load all image features in memory.
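The trade-off behind `in_memory` can be sketched with a toy reader. `FeatureStore` and `fake_load` are hypothetical names used only for illustration; the real reader works on an HDF file and may behave differently.

```python
class FeatureStore:
    """Hypothetical reader illustrating the `in_memory` trade-off."""

    def __init__(self, load_fn, num_items, in_memory=True):
        self._load_fn = load_fn  # assumed callable: index -> features
        self._num_items = num_items
        # Eager mode: pay the full loading cost and memory once, up front.
        self._cache = [load_fn(i) for i in range(num_items)] if in_memory else None

    def __len__(self):
        return self._num_items

    def __getitem__(self, index):
        if self._cache is not None:
            return self._cache[index]
        # Lazy mode: read from the backing store on every access.
        return self._load_fn(index)


calls = []


def fake_load(index):
    # Stand-in for reading one image's features from an HDF file.
    calls.append(index)
    return [float(index)] * 3


eager = FeatureStore(fake_load, num_items=4, in_memory=True)
calls_after_eager_init = len(calls)  # all four items were loaded at construction
lazy = FeatureStore(fake_load, num_items=4, in_memory=False)
calls_after_lazy_init = len(calls)   # no additional loads yet
item = lazy[2]                       # triggers exactly one load
```

Eager loading makes every later access cheap at the cost of holding all features in RAM; lazy loading keeps memory low but touches the backing store on each access.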
- class probnmn.data.datasets.JointTrainingDataset(tokens_h5path: str, features_h5path: str, num_supervision: int = 699989, supervision_question_max_length: int = 30, in_memory: bool = True)
Bases: torch.utils.data.dataset.Dataset
Provides questions, programs, supervision flag, image features and answers for Joint Training. If the same numpy random seed is used as for QuestionCodingDataset, the supervision list matches the one generated there.
- Parameters
- tokens_h5path: str
Path to an HDF file to initialize the underlying reader.
- features_h5path: str
Path to an HDF file containing a ‘dataset’ of pre-extracted image features.
- num_supervision: int, optional (default = 699989)
Number of examples that have program supervision over questions, for ProgramGenerator.
- supervision_question_max_length: int, optional (default = 30)
Maximum question length for picking examples with program supervision.
- in_memory: bool, optional (default = True)
Whether to load all image features in memory.
Notes
For a fixed numpy random seed, the randomly generated supervision list will always be the same.
- get_supervision_list(self)
Return a list of 1s and 0s indicating which examples have program supervision during question coding. Used by SupervisionWeightedRandomSampler to form a mini-batch with a nearly equal number of examples with and without program supervision.