probnmn.models.program_prior

class probnmn.models.program_prior.ProgramPrior(vocabulary: allennlp.data.vocabulary.Vocabulary, input_size: int = 256, hidden_size: int = 128, num_layers: int = 2, dropout: float = 0.0)[source]

Bases: torch.nn.modules.module.Module

A simple language model which learns a prior over all valid program sequences in the CLEVR v1.0 training split.

Parameters
vocabulary: allennlp.data.vocabulary.Vocabulary

AllenNLP’s vocabulary. This vocabulary has three namespaces - “questions”, “programs” and “answers” - which hold the token-to-integer mappings for their respective domains.

input_size: int, optional (default = 256)

The dimension of the inputs to the LSTM.

hidden_size: int, optional (default = 128)

The dimension of the outputs of the LSTM.

num_layers: int, optional (default = 2)

Number of recurrent layers in the LSTM.

dropout: float, optional (default = 0.0)

Dropout probability for the outputs of the LSTM at each layer except the last.
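
A minimal instantiation sketch (not part of the library documentation), assuming a hand-built toy vocabulary; the real vocabulary is constructed from the CLEVR v1.0 training split and contains all three namespaces:

from allennlp.data.vocabulary import Vocabulary
from probnmn.models.program_prior import ProgramPrior

vocab = Vocabulary()
# "@start@" / "@end@" follow AllenNLP's START_SYMBOL / END_SYMBOL convention;
# the remaining program tokens are illustrative placeholders.
vocab.add_tokens_to_namespace(
    ["@start@", "@end@", "scene", "filter_color[red]", "count"],
    namespace="programs",
)

program_prior = ProgramPrior(vocab)  # default LSTM sizes from the signature above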

classmethod from_config(config: probnmn.config.Config)[source]

Instantiate this class directly from a Config.
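
A hedged sketch of configuration-driven instantiation, assuming Config accepts a path to a YAML experiment config; the file name below is hypothetical:

from probnmn.config import Config
from probnmn.models.program_prior import ProgramPrior

config = Config("configs/program_prior.yml")  # hypothetical path
program_prior = ProgramPrior.from_config(config)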

forward(self, program_tokens: torch.Tensor)[source]

Given tokenized program sequences padded up to the maximum length, predict the sequence at the next time-step and compute the cross-entropy loss of this prediction using teacher forcing.

Parameters
program_tokens: torch.Tensor

Tokenized program sequences padded with zeroes up to the maximum length. shape: (batch_size, max_sequence_length)

Returns
Dict[str, torch.Tensor]

Next time-step predictions and the cross-entropy loss (computed with teacher forcing), as a dict with the structure:

{
    "predictions": torch.Tensor (shape: (batch_size, max_sequence_length - 1)),
    "loss": torch.Tensor (shape: (batch_size, ))
}
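
An illustrative call, continuing the instantiation sketch above; the token indices are placeholders, with 0 assumed to be the padding index (AllenNLP's default):

import torch

program_tokens = torch.tensor([
    [2, 5, 6, 3, 0, 0],
    [2, 4, 5, 6, 3, 0],
])  # shape: (batch_size=2, max_sequence_length=6)

output = program_prior(program_tokens)
output["predictions"].shape  # torch.Size([2, 5])
output["loss"].shape         # torch.Size([2])
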
get_metrics(self, reset: bool = True) → Dict[str, float][source]

Return perplexity using the accumulated loss.

Parameters
reset: bool, optional (default = True)

Whether to reset the accumulated metrics after retrieving them.

Returns
Dict[str, float]

A dictionary with a single metric, "perplexity".
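
A sketch, continuing the example above; perplexity is the exponential of the average cross-entropy accumulated since the last reset, so an untrained model scores roughly the size of the "programs" namespace:

_ = program_prior(program_tokens)            # accumulates loss internally
metrics = program_prior.get_metrics(reset=True)
print(metrics["perplexity"])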

sample(self, num_samples: int = 1, max_sequence_length: int = 28) → Dict[str, torch.Tensor][source]

Starting from the @start@ token at the first time-step, sample program sequences freely by categorical sampling; every sequence is padded after its first @end@ token.

This method is mainly useful for checking the coherence and sensitivity of the model’s beliefs.

Parameters
num_samples: int, optional (default = 1)

Number of program samples to generate.

max_sequence_length: int, optional (default = 28)

Maximum number of decoding steps while sampling programs. This count includes the @start@ token, so output sequences are one time-step shorter (excluding @start@).

Returns
Dict[str, torch.Tensor]

A dict with predictions and sequence log-probabilities (averaged across time-steps). These are actually negative log-probabilities, named “loss” for API consistency. The dict structure looks like:

{
    "predictions": torch.Tensor (shape: (batch_size, max_sequence_length - 1)),
    "loss": torch.Tensor (shape: (batch_size, ))
}
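
A sampling sketch, continuing the example above; "loss" here holds the negative log-probability of each sampled program, so lower values mean programs the prior considers more likely:

import torch

with torch.no_grad():
    samples = program_prior.sample(num_samples=4, max_sequence_length=28)

samples["predictions"].shape  # torch.Size([4, 27]) -- @start@ excluded
samples["loss"].shape         # torch.Size([4])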