probnmn.models.program_prior

class probnmn.models.program_prior.ProgramPrior(vocabulary: allennlp.data.vocabulary.Vocabulary, input_size: int = 256, hidden_size: int = 128, num_layers: int = 2, dropout: float = 0.0)[source]

Bases: torch.nn.modules.module.Module

A simple language model which learns a prior over all valid program sequences in the CLEVR v1.0 training split.

Parameters
vocabulary: allennlp.data.vocabulary.Vocabulary

AllenNLP’s vocabulary. This vocabulary has three namespaces - “questions”, “programs” and “answers” - which hold the token-to-integer mappings for their respective domains.

input_size: int, optional (default = 256)

The dimension of the inputs to the LSTM.

hidden_size: int, optional (default = 128)

The dimension of the outputs of the LSTM.

num_layers: int, optional (default = 2)

Number of recurrent layers in the LSTM.

dropout: float, optional (default = 0.0)

Dropout probability for the outputs of the LSTM at each layer except the last.
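
A minimal instantiation sketch (not part of the library documentation), assuming a hand-built toy vocabulary; the real vocabulary is constructed from the CLEVR v1.0 training split and contains all three namespaces:

from allennlp.data.vocabulary import Vocabulary
from probnmn.models.program_prior import ProgramPrior

vocab = Vocabulary()
# "@start@" / "@end@" follow AllenNLP's START_SYMBOL / END_SYMBOL convention;
# the remaining program tokens are illustrative placeholders.
vocab.add_tokens_to_namespace(
    ["@start@", "@end@", "scene", "filter_color[red]", "count"],
    namespace="programs",
)

program_prior = ProgramPrior(vocab)  # default LSTM sizes from the signature above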

classmethod from_config(config: probnmn.config.Config)[source]

Instantiate this class directly from a Config.
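
A hedged sketch of configuration-driven instantiation, assuming Config accepts a path to a YAML experiment config; the file name below is hypothetical:

from probnmn.config import Config
from probnmn.models.program_prior import ProgramPrior

config = Config("configs/program_prior.yml")  # hypothetical path
program_prior = ProgramPrior.from_config(config)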

forward(self, program_tokens: torch.Tensor)[source]

Given tokenized program sequences padded up to the maximum length, predict the sequence at the next time-step and compute the cross-entropy loss of this prediction using teacher forcing.

Parameters
program_tokens: torch.Tensor

Tokenized program sequences padded with zeroes up to the maximum length. shape: (batch_size, max_sequence_length)

Returns
Dict[str, torch.Tensor]

Next time-step predictions and the cross-entropy loss (computed with teacher forcing), as a dict with the structure:

{
    "predictions": torch.Tensor (shape: (batch_size, max_sequence_length - 1)),
    "loss": torch.Tensor (shape: (batch_size, ))
}
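
An illustrative call, continuing the instantiation sketch above; the token indices are placeholders, with 0 assumed to be the padding index (AllenNLP's default):

import torch

program_tokens = torch.tensor([
    [2, 5, 6, 3, 0, 0],
    [2, 4, 5, 6, 3, 0],
])  # shape: (batch_size=2, max_sequence_length=6)

output = program_prior(program_tokens)
output["predictions"].shape  # torch.Size([2, 5])
output["loss"].shape         # torch.Size([2])
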
get_metrics(self, reset: bool = True) → Dict[str, float][source]

Return perplexity using the accumulated loss.

Parameters
reset: bool, optional (default = True)

Whether to reset the accumulated metrics after retrieving them.

Returns
Dict[str, float]

A dictionary with a single metric, "perplexity".
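
A sketch, continuing the example above; perplexity is the exponential of the average cross-entropy accumulated since the last reset, so an untrained model scores roughly the size of the "programs" namespace:

_ = program_prior(program_tokens)            # accumulates loss internally
metrics = program_prior.get_metrics(reset=True)
print(metrics["perplexity"])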

sample(self, num_samples: int = 1, max_sequence_length: int = 28) → Dict[str, torch.Tensor][source]

Starting from the @start@ token at the first time-step, sample program sequences freely by categorical sampling; every sequence is padded after its first @end@ token.

This method is mainly useful for checking the coherence and sensitivity of the model’s beliefs.

Parameters
num_samples: int, optional (default = 1)

Number of program samples to generate.

max_sequence_length: int, optional (default = 28)

Maximum number of decoding steps while sampling programs. This count includes the @start@ token, so output sequences are one time-step shorter (excluding @start@).

Returns
Dict[str, torch.Tensor]

A dict with predictions and sequence log-probabilities (averaged across time-steps). These are actually negative log-probabilities, named “loss” for API consistency. The dict structure looks like:

{
    "predictions": torch.Tensor (shape: (batch_size, max_sequence_length - 1)),
    "loss": torch.Tensor (shape: (batch_size, ))
}
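
A sampling sketch, continuing the example above; "loss" here holds the negative log-probability of each sampled program, so lower values mean programs the prior considers more likely:

import torch

with torch.no_grad():
    samples = program_prior.sample(num_samples=4, max_sequence_length=28)

samples["predictions"].shape  # torch.Size([4, 27]) -- @start@ excluded
samples["loss"].shape         # torch.Size([4])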