Run Generated Quantities¶
The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:
- generate simulated data for model testing by forward sampling
- generate predictions for new data
- calculate posterior event probabilities, including multiple comparisons, sign tests, etc.
- calculating posterior expectations
- transform parameters for reporting
- apply full Bayesian decision theory
- calculate log likelihoods, deviances, etc. for model comparison
The CmdStanModel class generate_quantities
method is useful once you
have successfully fit a model to your data and have a valid
sample from the posterior and a version of the original model
where the generated quantities block contains the necessary statements
to compute additional quantities of interest.
By running the generate_quantities
method on the new model with
a sample generated by the existing model, the sampler uses the
per-draw parameter estimates from the sample to compute the
generated quantities block of the new model.
The generate_quantities
method returns a CmdStanGQ
object
which provides properties to retrieve information about the sample:
chains
column_names
generated_quantities
generated_quantities_pd
sample_plus_quantities
save_csvfiles()
The sample_plus_quantities
combines the existing sample
and new quantities of interest into a pandas DataFrame object
which can be used for downstream analysis and visualization.
In this way you add more columns of information to an existing sample.
Configuration¶
mcmc_sample
- either aCmdStanMCMC
object or a list of stan-csv filesdata
: Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.seed
: The seed for random number generator.gq_output_dir
: A path or file name which will be used as the basename for the CmdStan output files.
Example: add posterior predictive checks to bernoulli.stan
¶
In this example we use the CmdStan example model
bernoulli.stan
and data file
bernoulli.data.json
as our existing model and data.
We create the program
bernoulli_ppc.stan
by adding a generated quantities
block to bernoulli.stan
which generates a new data vector y_rep
using the current estimate of theta.
generated quantities {
int y_sim[N];
real<lower=0,upper=1> theta_rep;
for (n in 1:N)
y_sim[n] = bernoulli_rng(theta);
theta_rep = sum(y) / N;
}
The first step is to fit model bernoulli
to the data:
import os
from cmdstanpy import CmdStanModel, cmdstan_path
bernoulli_dir = os.path.join(cmdstan_path(), 'examples', 'bernoulli')
bernoulli_path = os.path.join(bernoulli_dir, 'bernoulli.stan')
bernoulli_data = os.path.join(bernoulli_dir, 'bernoulli.data.json')
# instantiate, compile bernoulli model
bernoulli_model = CmdStanModel(stan_file=bernoulli_path)
# fit the model to the data
bern_fit = bernoulli_model.sample(data=bernoulli_data)
Then we compile the model bernoulli_ppc
and use the fit parameter estimates
to generate quantities of interest:
bernoulli_ppc_model = CmdStanModel(stan_file='bernoulli_ppc.stan')
new_quantities = bernoulli_ppc_model.generate_quantities(data=bern_data, mcmc_sample=bern_fit)
The generate_quantities
method returns a CmdStanGQ
object which
contains the values for all variables in the generated quantitites block
of the program bernoulli_ppc.stan
. Unlike the output from the
sample
method, it doesn’t contain any information on the joint log
probability density, sampler state, or parameters or transformed
parameter values.
new_quantities.column_names
new_quantities.generated_quantities.shape
for i in range(len(new_quantities.column_names)):
print(new_quantities.generated_quantities[:,i].mean())
The method sample_plus_quantities
returns a pandas DataFrame which
combines the input drawset with the generated quantities.
sample_plus = new_quantities.sample_plus_quantities
print(sample_plus.shape)
print(sample_plus.columns)