Run Generated Quantities¶
The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:
- generate simulated data for model testing by forward sampling
- generate predictions for new data
- calculate posterior event probabilities, including multiple comparisons, sign tests, etc.
- calculating posterior expectations
- transform parameters for reporting
- apply full Bayesian decision theory
- calculate log likelihoods, deviances, etc. for model comparison
The CmdStanModel class generate_quantities method is useful once you
have successfully fit a model to your data and have a valid
sample from the posterior and a version of the original model
where the generated quantities block contains the necessary statements
to compute additional quantities of interest.
By running the generate_quantities method on the new model with
a sample generated by the existing model, the sampler uses the
per-draw parameter estimates from the sample to compute the
generated quantities block of the new model.
The generate_quantities method returns a CmdStanGQ object
which provides properties to retrieve information about the sample:
chainscolumn_namesgenerated_quantitiesgenerated_quantities_pdsample_plus_quantitiessave_csvfiles()
The sample_plus_quantities combines the existing sample
and new quantities of interest into a pandas DataFrame object
which can be used for downstream analysis and visualization.
In this way you add more columns of information to an existing sample.
Configuration¶
mcmc_sample- either aCmdStanMCMCobject or a list of stan-csv filesdata: Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.seed: The seed for random number generator.gq_output_dir: A path or file name which will be used as the basename for the CmdStan output files.
Example: add posterior predictive checks to bernoulli.stan¶
In this example we use the CmdStan example model
bernoulli.stan
and data file
bernoulli.data.json
as our existing model and data.
We create the program
bernoulli_ppc.stan
by adding a generated quantities block to bernoulli.stan
which generates a new data vector y_rep using the current estimate of theta.
generated quantities {
int y_sim[N];
real<lower=0,upper=1> theta_rep;
for (n in 1:N)
y_sim[n] = bernoulli_rng(theta);
theta_rep = sum(y) / N;
}
The first step is to fit model bernoulli to the data:
import os
from cmdstanpy import CmdStanModel, cmdstan_path
bernoulli_dir = os.path.join(cmdstan_path(), 'examples', 'bernoulli')
bernoulli_path = os.path.join(bernoulli_dir, 'bernoulli.stan')
bernoulli_data = os.path.join(bernoulli_dir, 'bernoulli.data.json')
# instantiate, compile bernoulli model
bernoulli_model = CmdStanModel(stan_file=bernoulli_path)
# fit the model to the data
bern_fit = bernoulli_model.sample(data=bernoulli_data)
Then we compile the model bernoulli_ppc and use the fit parameter estimates
to generate quantities of interest:
bernoulli_ppc_model = CmdStanModel(stan_file='bernoulli_ppc.stan')
new_quantities = bernoulli_ppc_model.generate_quantities(data=bern_data, mcmc_sample=bern_fit)
The generate_quantities method returns a CmdStanGQ object which
contains the values for all variables in the generated quantitites block
of the program bernoulli_ppc.stan. Unlike the output from the
sample method, it doesn’t contain any information on the joint log
probability density, sampler state, or parameters or transformed
parameter values.
new_quantities.column_names
new_quantities.generated_quantities.shape
for i in range(len(new_quantities.column_names)):
print(new_quantities.generated_quantities[:,i].mean())
The method sample_plus_quantities returns a pandas DataFrame which
combines the input drawset with the generated quantities.
sample_plus = new_quantities.sample_plus_quantities
print(sample_plus.shape)
print(sample_plus.columns)