probnmn.modules.elbo

class probnmn.modules.elbo.Reinforce(baseline_decay: float = 0.99)[source]

Bases: torch.nn.modules.module.Module

A PyTorch module which applies the REINFORCE gradient estimator to its inputs using a specified reward, and internally keeps track of a decaying moving average baseline for variance reduction.

Parameters
baseline_decay: float, optional (default = 0.99)

Factor by which the moving average baseline decays on every call.
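
Example (a minimal usage sketch; the forward(inputs, reward) call signature and the (reward - baseline) * inputs behavior are assumed from the description above, not confirmed by this page):

>>> import torch
>>> from probnmn.modules.elbo import Reinforce
>>> reinforce = Reinforce(baseline_decay=0.99)
>>> # Per-example sequence log-probabilities of sampled programs (need grad).
>>> logprobs = torch.randn(4, requires_grad=True)
>>> # Per-example rewards; gradients should not flow through these.
>>> reward = torch.tensor([1.0, 0.5, 0.2, 0.8])
>>> surrogate = reinforce(logprobs, reward)  # assumed: (reward - baseline) * logprobs
>>> loss = -surrogate.mean()
>>> loss.backward()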

class probnmn.modules.elbo._ElboWithReinforce(beta: float = 0.1, baseline_decay: float = 0.99)[source]

Bases: torch.nn.modules.module.Module

A PyTorch module to compute the fully Monte Carlo form of the Evidence Lower Bound (ELBO), given the inference likelihood, the reconstruction likelihood, and a REINFORCE reward. Accepting any scalar as the REINFORCE reward keeps the ELBO objective flexible; for example, an extra answer log-likelihood term is included during Joint Training.
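
As a sketch, for a question x with a single program z sampled from the inference model q(z|x), reconstruction model p(x|z), and prior p(z), the standard single-sample (fully Monte Carlo) estimate of the beta-weighted bound implied by this description is:

$$
\widehat{\mathrm{ELBO}}(x) \;=\; \underbrace{\log p_\theta(x \mid z)}_{\text{reconstruction}} \;-\; \beta \, \underbrace{\bigl[\log q_\phi(z \mid x) - \log p(z)\bigr]}_{\text{single-sample KL estimate}}, \qquad z \sim q_\phi(z \mid x)
$$

Since z (a program) is discrete, gradients with respect to the inference model parameters are estimated with REINFORCE, using the (possibly augmented) bound as the reward; the exact term weighting in the code may differ from this textbook form.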

This class is not used directly; instead, its subclasses QuestionCodingElbo and JointTrainingElbo are used in the corresponding training phases.

Parameters
beta: float, optional (default = 0.1)

KL divergence coefficient. Refer to BETA in Config.

baseline_decay: float, optional (default = 0.99)

Decay coefficient for the moving average REINFORCE baseline. Refer to DELTA in Config.

class probnmn.modules.elbo.QuestionCodingElbo(program_generator: probnmn.models.program_generator.ProgramGenerator, question_reconstructor: probnmn.models.question_reconstructor.QuestionReconstructor, program_prior: probnmn.models.program_prior.ProgramPrior, beta: float = 0.1, baseline_decay: float = 0.99)[source]

Bases: probnmn.modules.elbo._ElboWithReinforce

A PyTorch module to compute the Evidence Lower Bound for observed questions without ground-truth program supervision. This implementation takes the fully Monte Carlo form and uses the Reinforce estimator to estimate gradients for the parameters of the inference model (ProgramGenerator).

Parameters
program_generator: ProgramGenerator

A ProgramGenerator, which serves as the inference model of the posterior over programs.

question_reconstructor: QuestionReconstructor

A QuestionReconstructor, which serves as the reconstruction model of the observed data (questions).

program_prior: ProgramPrior

A ProgramPrior, which serves as the prior over the latent programs.

beta: float, optional (default = 0.1)

KL divergence coefficient. Refer to BETA in Config.

baseline_decay: float, optional (default = 0.99)

Decay coefficient for the moving average REINFORCE baseline. Refer to DELTA in Config.
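
Example (a minimal sketch; the constructor follows the signature above, while the forward call on a batch of question tokens and the keys of the returned dictionary are hypothetical):

>>> from probnmn.modules.elbo import QuestionCodingElbo
>>> # program_generator, question_reconstructor, program_prior:
>>> # instantiated models (construction omitted here).
>>> elbo = QuestionCodingElbo(
...     program_generator, question_reconstructor, program_prior,
...     beta=0.1, baseline_decay=0.99,
... )
>>> # questions: (batch_size, max_length) tensor of question token indices.
>>> output_dict = elbo(questions)       # hypothetical call signature
>>> loss = -output_dict["elbo"].mean()  # hypothetical key name
>>> loss.backward()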

class probnmn.modules.elbo.JointTrainingElbo(program_generator: probnmn.models.program_generator.ProgramGenerator, question_reconstructor: probnmn.models.question_reconstructor.QuestionReconstructor, program_prior: probnmn.models.program_prior.ProgramPrior, nmn: probnmn.models.nmn.NeuralModuleNetwork, beta: float = 0.1, gamma: float = 10, baseline_decay: float = 0.99, objective: str = 'ours')[source]

Bases: probnmn.modules.elbo._ElboWithReinforce

A PyTorch module to compute the Evidence Lower Bound for observed questions without ground-truth program supervision, with an added answer log-likelihood term in the bound, following the Joint Training objective. This implementation takes the fully Monte Carlo form and uses the Reinforce estimator to estimate gradients for the parameters of the inference model (ProgramGenerator).

Parameters
program_generator: ProgramGenerator

A ProgramGenerator, which serves as the inference model of the posterior over programs.

question_reconstructor: QuestionReconstructor

A QuestionReconstructor, which serves as the reconstruction model of the observed data (questions).

program_prior: ProgramPrior

A ProgramPrior, which serves as the prior over the latent programs.

nmn: NeuralModuleNetwork

A NeuralModuleNetwork, which provides the answer log-likelihood term in the objective.

beta: float, optional (default = 0.1)

KL divergence coefficient. Refer to BETA in Config.

gamma: float, optional (default = 10)

Scaling coefficient for the answer log-likelihood term. Refer to GAMMA in Config.

baseline_decay: float, optional (default = 0.99)

Decay coefficient for the moving average REINFORCE baseline. Refer to DELTA in Config.

objective: str, optional (default = "ours")

Training objective. With "baseline", the REINFORCE reward contains only the answer log-likelihood; with "ours", the full Evidence Lower Bound is added to the reward.
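
Example (a minimal sketch along the same lines; the forward arguments for the NMN's answer log-likelihood term, image features and ground-truth answers, are hypothetical names):

>>> from probnmn.modules.elbo import JointTrainingElbo
>>> # program_generator, question_reconstructor, program_prior, nmn:
>>> # instantiated models (construction omitted here).
>>> elbo = JointTrainingElbo(
...     program_generator, question_reconstructor, program_prior, nmn,
...     beta=0.1, gamma=10, baseline_decay=0.99, objective="ours",
... )
>>> # Hypothetical forward signature: questions plus inputs for the NMN term.
>>> output_dict = elbo(questions, image_features, answers)
>>> loss = -output_dict["elbo"].mean()  # hypothetical key name
>>> loss.backward()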