probnmn.models.program_prior
class probnmn.models.program_prior.ProgramPrior(vocabulary: allennlp.data.vocabulary.Vocabulary, input_size: int = 256, hidden_size: int = 128, num_layers: int = 2, dropout: float = 0.0)[source]

Bases: torch.nn.modules.module.Module
A simple language model which learns a prior over all valid program sequences in the CLEVR v1.0 training split (a construction sketch follows the parameter list).
- Parameters
- vocabulary: allennlp.data.vocabulary.Vocabulary
AllenNLP's vocabulary. It has three namespaces, "questions", "programs" and "answers", each containing its own token-to-integer mapping.
- input_size: int, optional (default = 256)
The dimension of the inputs to the LSTM.
- hidden_size: int, optional (default = 128)
The dimension of the outputs of the LSTM.
- num_layers: int, optional (default = 2)
Number of recurrent layers in the LSTM.
- dropout: float, optional (default = 0.0)
Dropout probability for the outputs of the LSTM at each layer except the last.
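For reference, a minimal construction sketch. The tokens added to the "programs" namespace below are placeholders, not the real CLEVR program vocabulary, and a full vocabulary would also carry the "questions" and "answers" namespaces.

    from allennlp.data.vocabulary import Vocabulary
    from probnmn.models.program_prior import ProgramPrior

    # Toy vocabulary: real experiments build this from the CLEVR v1.0 training split.
    vocabulary = Vocabulary()
    vocabulary.add_tokens_to_namespace(
        ["@start@", "@end@", "scene", "filter_color[red]", "count"],
        namespace="programs",
    )

    program_prior = ProgramPrior(
        vocabulary,
        input_size=256,
        hidden_size=128,
        num_layers=2,
        dropout=0.0,
    )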
classmethod from_config(config: probnmn.config.Config)[source]

Instantiate this class directly from a Config.
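A hypothetical sketch, assuming a Config can be built from a YAML file path; the file path and this constructor usage are assumptions, not taken from this page.

    from probnmn.config import Config
    from probnmn.models.program_prior import ProgramPrior

    # Hypothetical config file path; the Config constructor signature is assumed here.
    config = Config("configs/program_prior.yml")
    program_prior = ProgramPrior.from_config(config)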
forward(self, program_tokens: torch.Tensor)[source]

Given tokenized program sequences padded up to maximum length, predict the sequence at the next time-step and calculate the cross-entropy loss of this prediction (a usage sketch follows the Returns description).
- Parameters
- program_tokens: torch.Tensor
Tokenized program sequences padded with zeroes up to maximum length. Shape: (batch_size, max_sequence_length)
- Returns
- Dict[str, torch.Tensor]
Predictions for the next time-step and the cross-entropy loss (computed by teacher forcing), a dict with structure:
{
    "predictions": torch.Tensor (shape: (batch_size, max_sequence_length - 1)),
    "loss": torch.Tensor (shape: (batch_size, ))
}
get_metrics(self, reset: bool = True) → Dict[str, float][source]

Return perplexity using the accumulated loss (a usage sketch follows the Returns description).
- Parameters
- reset: bool, optional (default = True)
Whether to reset the accumulated metrics after retrieving them.
- Returns
- Dict[str, float]
A dictionary with metrics {"perplexity"}.
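A short sketch, continuing from the forward sketch above; the printed value is illustrative, not a real measurement.

    # After one or more forward passes, read off the accumulated perplexity.
    metrics = program_prior.get_metrics(reset=True)
    print(metrics)  # e.g. {"perplexity": 41.3} (illustrative value only)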
sample(self, num_samples: int = 1, max_sequence_length: int = 28) → Dict[str, torch.Tensor][source]

Using the @start@ token at the first time-step, perform categorical sampling and sample program sequences freely; all sequences are padded after the first @end@ token is encountered.

This method is mainly useful for checking the coherence and sensitivity of the model's beliefs (a usage sketch follows the Returns description).
- Parameters
- num_samples: int, optional (default = 1)
Number of program samples to generate.
- max_sequence_length: int, optional (default = 28)
Maximum number of decoding steps while sampling programs, including the @start@ token. Output sequences will be one time-step shorter because they exclude @start@.
- Returns
- Dict[str, torch.Tensor]
A dict with predictions and sequence log-probabilities (averaged across time-steps). This actually holds negative log-probabilities, named "loss" for API consistency. The dict structure looks like:
{
    "predictions": torch.Tensor (shape: (batch_size, max_sequence_length - 1)),
    "loss": torch.Tensor (shape: (batch_size, ))
}