API Reference

Classes

CmdStanModel

A CmdStanModel object encapsulates the Stan program. It manages program compilation and provides the following inference methods:

sample()

runs the HMC-NUTS sampler to produce a set of draws from the posterior distribution.

optimize()

produces a penalized maximum likelihood estimate (point estimate) of the model parameters.

variational()

runs CmdStan’s variational inference algorithm to approximate the posterior distribution.

generate_quantities()

runs CmdStan’s generate_quantities method to produce additional quantities of interest based on draws from an existing sample.

class cmdstanpy.CmdStanModel(model_name=None, stan_file=None, exe_file=None, compile=True, stanc_options=None, cpp_options=None, logger=None)[source]

The constructor method allows model instantiation given either the Stan program source file or the compiled executable, or both. By default, the constructor compiles the Stan program on instantiation unless the argument compile=False is specified. The constructor arguments are:

Parameters
  • model_name (Optional[str]) – Model name, used for output file names. Optional, default is the base filename of the Stan program file.

  • stan_file (Optional[str]) – Path to Stan program file.

  • exe_file (Optional[str]) – Path to compiled executable file. Optional, unless no Stan program file is specified. If both the program file and the compiled executable file are specified, the base filenames must match (but different directory locations are allowed).

  • compile (bool) – Whether or not to compile the model. Default is True.

  • stanc_options (Optional[Dict[str, Any]]) – Options for stanc compiler, specified as a Python dictionary containing Stanc3 compiler option name, value pairs. Optional.

  • cpp_options (Optional[Dict[str, Any]]) – Options for C++ compiler, specified as a Python dictionary containing C++ compiler option name, value pairs. Optional.

  • logger (Optional[logging.Logger]) –

Return type

None

code()[source]

Return Stan program as a string.

Return type

Optional[str]

compile(force=False, stanc_options=None, cpp_options=None, override_options=False)[source]

Compile the given Stan program file. Translates the Stan code to C++, then calls the C++ compiler.

By default, this function compares the timestamps on the source and executable files; if the executable is newer than the source file, it will not recompile the file, unless argument force is True.
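The timestamp rule above can be sketched in plain Python. `needs_recompile` is a hypothetical helper written only for illustration; it is not part of the cmdstanpy API and ignores details such as how the executable path is derived from the Stan file.

```python
import os
import tempfile


def needs_recompile(stan_file: str, exe_file: str, force: bool = False) -> bool:
    """Hypothetical helper: decide whether a Stan program must be (re)compiled."""
    if force:
        return True  # e.g. an #include'd file changed, which this check cannot see
    if not os.path.exists(exe_file):
        return True  # nothing has been compiled yet
    # Skip compilation only when the executable is strictly newer than the source.
    return os.path.getmtime(exe_file) <= os.path.getmtime(stan_file)


with tempfile.TemporaryDirectory() as tmp:
    stan = os.path.join(tmp, "bernoulli.stan")
    exe = os.path.join(tmp, "bernoulli")
    for path in (stan, exe):
        open(path, "w").close()
    os.utime(stan, (1_000, 1_000))  # set (access time, modification time)
    os.utime(exe, (2_000, 2_000))
    up_to_date = needs_recompile(stan, exe)          # False: executable is newer
    forced = needs_recompile(stan, exe, force=True)  # True: force always compiles
    os.utime(stan, (3_000, 3_000))                   # simulate editing the source
    stale = needs_recompile(stan, exe)               # True: source is now newer
```

Because only the main source file's timestamp is compared, edits to included files go unnoticed, which is why force=True exists.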

Parameters
  • force (bool) – When True, always compile, even if the executable file is newer than the source file. Use this for Stan models with #include directives to force recompilation when the included files change.

  • stanc_options (Optional[Dict[str, Any]]) – Options for stanc compiler.

  • cpp_options (Optional[Dict[str, Any]]) – Options for C++ compiler.

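  • override_options (bool) – When True, the options passed to this call replace the existing options entirely. When False, they are merged with the existing options, adding new entries and replacing entries with the same name. Default is False.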
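The difference between the two override_options modes can be illustrated with plain dictionaries. `resolve_options` is a hypothetical function, not cmdstanpy code, and the option names are only example values; the real library also validates the options it receives.

```python
from typing import Any, Dict, Optional


def resolve_options(
    existing: Dict[str, Any],
    new: Optional[Dict[str, Any]],
    override_options: bool = False,
) -> Dict[str, Any]:
    """Hypothetical illustration of the two override_options behaviours."""
    new = new or {}
    if override_options:
        return dict(new)  # discard the existing options entirely
    merged = dict(existing)  # keep the existing options...
    merged.update(new)       # ...adding new names and replacing matching ones
    return merged


current = {"STAN_THREADS": True, "STAN_OPENCL": True}
print(resolve_options(current, {"STAN_OPENCL": False}))
# {'STAN_THREADS': True, 'STAN_OPENCL': False}
print(resolve_options(current, {"STAN_OPENCL": False}, override_options=True))
# {'STAN_OPENCL': False}
```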

Return type

None

generate_quantities(data=None, mcmc_sample=None, seed=None, gq_output_dir=None, sig_figs=None, refresh=None)[source]

Run CmdStan’s generate_quantities method which runs the generated quantities block of a model given an existing sample.

This function takes a CmdStanMCMC object and the dataset used to generate that sample, and calls the CmdStan generate_quantities method to generate additional quantities of interest.

The CmdStanGQ object records the command, the return code, and the paths to the generate method output csv and console files. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.

Output filenames correspond to the template ‘<model_name>-<YYYYMMDDHHMM>-<chain_id>’ plus the file suffix, which is either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. ‘bernoulli-201912081451-1.csv’. Output files written to the temporary directory contain an additional 8-character random string, e.g. ‘bernoulli-201912081451-1-5nm6as7u.csv’.
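The filename template can be reproduced with a few lines of Python. `output_filename` is a hypothetical helper for illustration only; the random string here comes from the `secrets` module, and cmdstanpy's own mechanism for generating it may differ.

```python
import secrets
import string
from datetime import datetime
from typing import Optional


def output_filename(model_name: str, chain_id: int, suffix: str = ".csv",
                    when: Optional[datetime] = None, temp: bool = False) -> str:
    """Hypothetical helper following '<model_name>-<YYYYMMDDHHMM>-<chain_id>'."""
    stamp = (when or datetime.now()).strftime("%Y%m%d%H%M")  # YYYYMMDDHHMM
    name = f"{model_name}-{stamp}-{chain_id}"
    if temp:
        # Files in the temporary directory carry an extra 8-character random string.
        alphabet = string.ascii_lowercase + string.digits
        name += "-" + "".join(secrets.choice(alphabet) for _ in range(8))
    return name + suffix


fixed = datetime(2019, 12, 8, 14, 51)
print(output_filename("bernoulli", 1, when=fixed))                 # bernoulli-201912081451-1.csv
print(output_filename("bernoulli", 1, suffix=".txt", when=fixed))  # bernoulli-201912081451-1.txt
```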

Parameters
  • data (Optional[Union[Mapping[str, Any], str]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.

  • mcmc_sample (Optional[Union[cmdstanpy.stanfit.CmdStanMCMC, List[str]]]) – Either a CmdStanMCMC object returned by the sample() method or a list of Stan CSV files generated by fitting the model to the data using any Stan interface.

  • seed (Optional[int]) – The seed for the random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.RandomState is used to generate a seed which will be used for all chains. NOTE: Specifying the seed guarantees the same result for multiple invocations of this method with the same inputs. However, this will not reproduce results from the sample method given the same inputs, because the RNG will be in a different state.

  • gq_output_dir (Optional[str]) – Name of the directory in which the CmdStan output files are saved. If unspecified, files will be written to a temporary directory which is deleted upon session exit.

  • sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.

  • refresh (Optional[int]) – Number of iterations CmdStan runs between progress messages. Default value is 100.
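The documented seed range can be checked before calling CmdStan. `resolve_seed` is a hypothetical validation sketch; the documentation says numpy.random.RandomState generates the default seed, while this sketch uses the standard-library `random` module only to stay dependency-free.

```python
import random
from typing import Optional

MAX_SEED = 2**32 - 1  # documented upper bound for seed


def resolve_seed(seed: Optional[int]) -> int:
    """Hypothetical check of the documented seed range."""
    if seed is None:
        # Stand-in for the numpy.random.RandomState draw described above.
        return random.randint(0, MAX_SEED)
    if isinstance(seed, bool) or not isinstance(seed, int) or not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be an integer between 0 and {MAX_SEED}")
    return seed


print(resolve_seed(42))  # 42
```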

Returns

CmdStanGQ object

Return type

cmdstanpy.stanfit.CmdStanGQ

optimize(data=None, seed=None, inits=None, output_dir=None, sig_figs=None, save_profile=False, algorithm=None, init_alpha=None, tol_obj=None, tol_rel_obj=None, tol_grad=None, tol_rel_grad=None, tol_param=None, history_size=None, iter=None, refresh=None)[source]

Run the specified CmdStan optimize algorithm to produce a penalized maximum likelihood estimate of the model parameters.

This function validates the specified configuration, composes a call to the CmdStan optimize method, spawns one subprocess to run the optimizer, and waits for it to run to completion. Unspecified arguments are not included in the call to CmdStan, i.e., those arguments take their CmdStan default values.
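The rule that unspecified arguments are left out of the CmdStan call can be sketched as follows. `compose_cmdstan_args` is hypothetical and deliberately simplified: the real CmdStan command line is a nested argument tree that also includes the data, output, and random seed settings.

```python
from typing import Any, List


def compose_cmdstan_args(method: str, **options: Any) -> List[str]:
    """Hypothetical, simplified sketch of composing CmdStan method arguments."""
    args = [method]
    for name, value in options.items():
        if value is None:
            continue  # unspecified: omitted, so CmdStan applies its own default
        args.append(f"{name}={value}")
    return args


print(compose_cmdstan_args("optimize", algorithm="lbfgs", iter=None, tol_obj=1e-12))
# ['optimize', 'algorithm=lbfgs', 'tol_obj=1e-12']
```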

The CmdStanMLE object records the command, the return code, and the paths to the optimize method output csv and console files. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.

Output filenames correspond to the template ‘<model_name>-<YYYYMMDDHHMM>-<chain_id>’ plus the file suffix, which is either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. ‘bernoulli-201912081451-1.csv’. Output files written to the temporary directory contain an additional 8-character random string, e.g. ‘bernoulli-201912081451-1-5nm6as7u.csv’.

Parameters
  • data (Optional[Union[Mapping[str, Any], str]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.

  • seed (Optional[int]) – The seed for the random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.RandomState is used to generate a seed.

  • inits (Optional[Union[Dict[str, float], float, str]]) –

    Specifies how the optimizer initializes parameter values. Initialization is either uniform random on a range centered on 0, exactly 0, or a dictionary or file of initial values for some or all parameters in the model. By default, all parameter values are initialized on the range [-2, 2] on the unconstrained scale. If the expected parameter values are too far from this range, this option may improve estimation. The following value types are allowed:

    • Single number, n > 0 - initialization range is [-n, n].

    • 0 - all parameters are initialized to 0.

    • dictionary - pairs parameter name : initial value.

    • string - pathname to a JSON or Rdump data file.

  • output_dir (Optional[str]) – Name of the directory to which CmdStan output files are written. If unspecified, output files will be written to a temporary directory which is deleted upon session exit.

  • sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.

  • save_profile (bool) – Whether or not to profile auto-diff operations in labelled blocks of code. If True, csv outputs are written to a file ‘<model_name>-<YYYYMMDDHHMM>-profile-<chain_id>’. Introduced in CmdStan-2.26.

  • algorithm (Optional[str]) – Algorithm to use. One of: ‘BFGS’, ‘LBFGS’, ‘Newton’.

  • init_alpha (Optional[float]) – Line search step size for the first iteration.

  • tol_obj (Optional[float]) – Convergence tolerance on changes in the objective function value.

  • tol_rel_obj (Optional[float]) – Convergence tolerance on relative changes in the objective function value.

  • tol_grad (Optional[float]) – Convergence tolerance on the norm of the gradient.

  • tol_rel_grad (Optional[float]) – Convergence tolerance on the relative norm of the gradient.

  • tol_param (Optional[float]) – Convergence tolerance on changes in parameter values.

  • history_size (Optional[int]) – Size of the history for the LBFGS Hessian approximation. The value should be less than the dimensionality of the parameter space; 5 to 10 is usually sufficient.

  • iter (Optional[int]) – Total number of iterations.

  • refresh (Optional[int]) – Number of iterations CmdStan runs between progress messages. Default value is 100.
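The allowed inits value types listed above can be summarized in code. `describe_inits` is a hypothetical helper that only reports how a value would be interpreted according to this documentation; it does not read files or call CmdStan.

```python
from typing import Dict, Optional, Union


def describe_inits(inits: Optional[Union[float, Dict[str, float], str]]) -> str:
    """Hypothetical summary of how an inits value is interpreted."""
    if inits is None:
        return "uniform random on [-2, 2] (unconstrained scale)"
    if isinstance(inits, bool):
        raise TypeError("inits must be a number, dict, or path, not a bool")
    if isinstance(inits, (int, float)):
        if inits < 0:
            raise ValueError("a numeric inits value must be >= 0")
        if inits == 0:
            return "all parameters initialized to 0"
        return f"uniform random on [-{inits}, {inits}] (unconstrained scale)"
    if isinstance(inits, dict):
        return "initial values given for: " + ", ".join(sorted(inits))
    if isinstance(inits, str):
        return f"initial values read from {inits}"
    raise TypeError(f"unsupported inits type: {type(inits).__name__}")


print(describe_inits(0.5))  # uniform random on [-0.5, 0.5] (unconstrained scale)
```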

Returns

CmdStanMLE object

Return type

cmdstanpy.stanfit.CmdStanMLE

sample(data=None, chains=None, parallel_chains=None, threads_per_chain=None, seed=None, chain_ids=None, inits=None, iter_warmup=None, iter_sampling=None, save_warmup=False, thin=None, max_treedepth=None, metric=None, step_size=None, adapt_engaged=True, adapt_delta=None, adapt_init_phase=None, adapt_metric_window=None, adapt_step_size=None, fixed_param=False, output_dir=None, sig_figs=None, save_diagnostics=False, save_profile=False, show_progress=False, refresh=None)[source]

Run one or more chains of the NUTS-HMC sampler to produce a set of draws from the posterior distribution of a model conditioned on some data.

This function validates the specified configuration, composes a call to the CmdStan sample method, spawns one subprocess per chain to run the sampler, and waits for all chains to run to completion. Unspecified arguments are not included in the call to CmdStan, i.e., those arguments take their CmdStan default values.

For each chain, the CmdStanMCMC object records the command, the return code, the sampler output file paths, and the corresponding console outputs, if any. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.

Output filenames correspond to the template ‘<model_name>-<YYYYMMDDHHMM>-<chain_id>’ plus the file suffix, which is either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. ‘bernoulli-201912081451-1.csv’. Output files written to the temporary directory contain an additional 8-character random string, e.g. ‘bernoulli-201912081451-1-5nm6as7u.csv’.

Parameters
  • data (Optional[Union[Mapping[str, Any], str]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.

  • chains (Optional[int]) – Number of sampler chains, must be a positive integer.

  • parallel_chains (Optional[int]) – Number of processes to run in parallel. Must be a positive integer. Defaults to multiprocessing.cpu_count().

  • threads_per_chain (Optional[int]) – The number of threads to use in parallelized sections within an MCMC chain (e.g., when using the Stan functions reduce_sum() or map_rect()). This will only have an effect if the model was compiled with threading support. The total number of threads used will be parallel_chains * threads_per_chain.

  • seed (Optional[Union[int, List[int]]]) – The seed for the random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.RandomState is used to generate a seed which will be used for all chains. When the same seed is used across all chains, the chain id is used to advance the RNG to avoid dependent samples.

  • chain_ids (Optional[Union[int, List[int]]]) – The offset for the random number generator, either an integer or a list of unique per-chain offsets. If unspecified, chain ids are numbered sequentially starting from 1.

  • inits (Optional[Union[Dict[str, float], float, str, List[str]]]) –

    Specifies how the sampler initializes parameter values. Initialization is either uniform random on a range centered on 0, exactly 0, or a dictionary or file of initial values for some or all parameters in the model. By default, all parameter values are initialized on the range [-2, 2] on the unconstrained scale. If the expected parameter values are too far from this range, this option may improve adaptation. The following value types are allowed:

    • Single number n > 0 - initialization range is [-n, n].

    • 0 - all parameters are initialized to 0.

    • dictionary - pairs parameter name : initial value.

    • string - pathname to a JSON or Rdump data file.

    • list of strings - per-chain pathname to data file.

  • iter_warmup (Optional[int]) – Number of warmup iterations for each chain.

  • iter_sampling (Optional[int]) – Number of draws from the posterior for each chain.

  • save_warmup (bool) – When True, the sampler saves warmup draws as part of the Stan CSV output file.

  • thin (Optional[int]) – Period between recorded iterations. Default is 1, i.e., all iterations are recorded.

  • max_treedepth (Optional[int]) – Maximum depth of the trees evaluated by the NUTS sampler per iteration.

  • metric (Optional[Union[str, List[str]]]) –

    Specification of the mass matrix, either as a vector consisting of the diagonal elements of the covariance matrix (‘diag’ or ‘diag_e’) or the full covariance matrix (‘dense’ or ‘dense_e’).

    If the value of the metric argument is a string other than ‘diag’, ‘diag_e’, ‘dense’, or ‘dense_e’, it must be a valid filepath to a JSON or Rdump file which contains an entry ‘inv_metric’ whose value is either the diagonal vector or the full covariance matrix.

    If the value of the metric argument is a list of paths, its length must match the number of chains and all paths must be unique.

  • step_size (Optional[Union[float, List[float]]]) – Initial step size for HMC sampler. The value is either a single number or a list of numbers which will be used as the global or per-chain initial step size, respectively. The length of the list of step sizes must match the number of chains.

  • adapt_engaged (bool) – When True, adapt step size and metric.

  • adapt_delta (Optional[float]) – Adaptation target Metropolis acceptance rate. The default value is 0.8. Increasing this value, which must be strictly less than 1, causes adaptation to use smaller step sizes which improves the effective sample size, but may increase the time per iteration.

  • adapt_init_phase (Optional[int]) – Number of iterations in the initial phase of adaptation, during which the step size is adjusted so that the chain converges towards the typical set.

  • adapt_metric_window (Optional[int]) – The second phase of adaptation tunes the metric and step size in a series of intervals. This parameter specifies the number of iterations used for the first tuning interval; window size increases for each subsequent interval.

  • adapt_step_size (Optional[int]) – Number of iterations given over to adjusting the step size given the tuned metric during the final phase of adaptation.

  • fixed_param (bool) – When True, call CmdStan with argument algorithm=fixed_param which runs the sampler without updating the Markov Chain, thus the values of all parameters and transformed parameters are constant across all draws and only those values in the generated quantities block that are produced by RNG functions may change. This provides a way to use Stan programs to generate simulated data via the generated quantities block. This option must be used when the parameters block is empty. Default value is False.

  • output_dir (Optional[str]) – Name of the directory to which CmdStan output files are written. If unspecified, output files will be written to a temporary directory which is deleted upon session exit.

  • sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.

  • save_diagnostics (bool) – Whether or not to output the position and momentum information for each parameter. If True, csv outputs are written to an output file using filename template ‘<model_name>-<YYYYMMDDHHMM>-diagnostic-<chain_id>’, e.g. ‘bernoulli-201912081451-diagnostic-1.csv’.

  • save_profile (bool) – Whether or not to profile auto-diff operations in labelled blocks of code. If True, csv outputs are written to a file ‘<model_name>-<YYYYMMDDHHMM>-profile-<chain_id>’. Introduced in CmdStan-2.26.

  • show_progress (Union[bool, str]) – Use tqdm progress bar to show sampling progress. If show_progress==’notebook’ use tqdm_notebook (needs nodejs for jupyter).

  • refresh (Optional[int]) – Number of iterations CmdStan runs between progress messages. Default value is 100.
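The rules for the metric argument (built-in names, a single metric file, or one unique file per chain containing an ‘inv_metric’ entry) can be checked as in this sketch. `write_inv_metric` and `check_metric` are hypothetical helpers, only JSON files are handled (not Rdump), and the diagonal values are made up.

```python
import json
import os
import tempfile
from typing import List, Optional, Union

# Built-in metric names and the CmdStan metric each one selects.
BUILTIN_METRICS = {"diag": "diag_e", "diag_e": "diag_e", "dense": "dense_e", "dense_e": "dense_e"}


def write_inv_metric(path: str, inv_metric) -> None:
    """Write a JSON file whose 'inv_metric' entry is a diagonal vector or a full matrix."""
    with open(path, "w") as f:
        json.dump({"inv_metric": inv_metric}, f)


def check_metric(metric: Union[None, str, List[str]], chains: int) -> Optional[str]:
    """Hypothetical check of the documented rules for the metric argument (JSON only)."""
    if metric is None:
        return None  # not specified: CmdStan uses its default metric
    if isinstance(metric, str):
        if metric in BUILTIN_METRICS:
            return BUILTIN_METRICS[metric]
        paths = [metric]  # any other string must be a metric file
    else:
        if len(metric) != chains:
            raise ValueError("a list of metric files needs one entry per chain")
        if len(set(metric)) != len(metric):
            raise ValueError("per-chain metric files must be unique")
        paths = list(metric)
    for path in paths:
        with open(path) as f:
            if "inv_metric" not in json.load(f):
                raise ValueError(f"{path} has no 'inv_metric' entry")
    return "file"


with tempfile.TemporaryDirectory() as tmp:
    metric_files = [os.path.join(tmp, f"metric-{chain}.json") for chain in (1, 2)]
    for path in metric_files:
        write_inv_metric(path, [1.0, 0.5, 2.0])  # made-up diagonal for 3 parameters
    from_files = check_metric(metric_files, chains=2)

print(check_metric("dense", chains=4))  # dense_e
print(from_files)                       # file
```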

Returns

CmdStanMCMC object

Return type

cmdstanpy.stanfit.CmdStanMCMC

variational(data=None, seed=None, inits=None, output_dir=None, sig_figs=None, save_diagnostics=False, save_profile=False, algorithm=None, iter=None, grad_samples=None, elbo_samples=None, eta=None, adapt_engaged=True, adapt_iter=None, tol_rel_obj=None, eval_elbo=None, output_samples=None, require_converged=True, refresh=None)[source]

Run CmdStan’s variational inference algorithm to approximate the posterior distribution of the model conditioned on the data.

This function validates the specified configuration, composes a call to the CmdStan variational method, spawns one subprocess to run the variational algorithm, and waits for it to run to completion. Unspecified arguments are not included in the call to CmdStan, i.e., those arguments take their CmdStan default values.

The CmdStanVB object records the command, the return code, and the paths to the variational method output csv and console files. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.

Output filenames correspond to the template ‘<model_name>-<YYYYMMDDHHMM>-<chain_id>’ plus the file suffix, which is either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. ‘bernoulli-201912081451-1.csv’. Output files written to the temporary directory contain an additional 8-character random string, e.g. ‘bernoulli-201912081451-1-5nm6as7u.csv’.

Parameters
  • data (Optional[Union[Mapping[str, Any], str]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.

  • seed (Optional[int]) – The seed for the random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.RandomState is used to generate a seed.

  • inits (Optional[float]) – Specifies how the algorithm initializes parameter values. Initialization is uniform random on a range centered on 0, by default [-2, 2]. Specifying a single number n > 0 changes the initialization range to [-n, n].

  • output_dir (Optional[str]) – Name of the directory to which CmdStan output files are written. If unspecified, output files will be written to a temporary directory which is deleted upon session exit.

  • sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.

  • save_diagnostics (bool) – Whether or not to save diagnostics. If True, csv outputs are written to an output file using filename template ‘<model_name>-<YYYYMMDDHHMM>-diagnostic-<chain_id>’, e.g. ‘bernoulli-201912081451-diagnostic-1.csv’.

  • save_profile (bool) – Whether or not to profile auto-diff operations in labelled blocks of code. If True, csv outputs are written to a file ‘<model_name>-<YYYYMMDDHHMM>-profile-<chain_id>’. Introduced in CmdStan-2.26.

  • algorithm (Optional[str]) – Algorithm to use. One of: ‘meanfield’, ‘fullrank’.

  • iter (Optional[int]) – Maximum number of ADVI iterations.

  • grad_samples (Optional[int]) – Number of Monte Carlo draws for computing the gradient.

  • elbo_samples (Optional[int]) – Number of Monte Carlo draws for estimating the ELBO.

  • eta (Optional[float]) – Step size scaling parameter.

  • adapt_engaged (bool) – Whether eta adaptation is engaged.

  • adapt_iter (Optional[int]) – Number of iterations for eta adaptation.

  • tol_rel_obj (Optional[float]) – Relative tolerance parameter for convergence.

  • eval_elbo (Optional[int]) – Number of iterations between ELBO evaluations.

  • output_samples (Optional[int]) – Number of approximate posterior output draws to save.

  • require_converged (bool) – Whether or not to raise an error if Stan reports that “The algorithm may not have converged”.

  • refresh (Optional[int]) – Number of iterations CmdStan runs between progress messages. Default value is 100.

Returns

CmdStanVB object

Return type

cmdstanpy.stanfit.CmdStanVB

property cpp_options: Dict[str, Union[bool, int]]

Options for the C++ compiler.

property exe_file: Optional[str]

Full path to Stan exe file.

property name: str

Model name used in output filename templates. Default is basename of Stan program or exe file, unless specified in call to constructor via argument model_name.

property stan_file: Optional[str]

Full path to Stan program file.

property stanc_options: Dict[str, Union[bool, int, str]]

Options for the stanc compiler.

CmdStanMCMC

class cmdstanpy.CmdStanMCMC(runset, logger=None)[source]

Container for outputs from a CmdStan sampler run. Provides methods to summarize and diagnose the model fit and accessor methods to access the entire sample or individual items. Created by CmdStanModel.sample().

The sample is lazily instantiated on first access of either the resulting sample or the HMC tuning parameters, i.e., the step size and metric.

Return type

None

diagnose()[source]

Run cmdstan/bin/diagnose over all output csv files. Returns output of diagnose (stdout/stderr).

The diagnose utility reads the outputs of all chains and checks for the following potential problems:

  • Transitions that hit the maximum treedepth

  • Divergent transitions

  • Low E-BFMI (energy Bayesian fraction of missing information) values

  • Low effective sample sizes

  • High R-hat values

Return type

Optional[str]

draws(*, inc_warmup=False, concat_chains=False)[source]

Returns a numpy.ndarray over all draws from all chains, stored column major so that the values for a parameter are contiguous in memory; likewise, all draws from a chain are contiguous. By default, returns a 3D array arranged (draws, chains, columns); parameter concat_chains=True returns a 2D array (draws × chains, columns) in which the draws from all chains are stacked in chain order, so that given M chains of N draws, the first N draws are from chain 1, up through the last N draws from chain M.
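The relationship between the 3D and the concatenated 2D arrangement can be shown with a small synthetic numpy array. The values below are made up so that each entry encodes its own chain, draw, and column; this reproduces the index layout only, not the internal memory layout or cmdstanpy's implementation.

```python
import numpy as np

n_draws, n_chains, n_cols = 4, 3, 2

# Encode (draw, chain, column) in each value so the layout is easy to trace:
# value = 100 * chain + 10 * draw + column (all 0-based).
draw_idx, chain_idx, col_idx = np.indices((n_draws, n_chains, n_cols))
draws_3d = 100 * chain_idx + 10 * draw_idx + col_idx  # shape (draws, chains, columns)

# Concatenate chains in chain order: all draws of chain 0, then chain 1, ...
draws_2d = np.swapaxes(draws_3d, 0, 1).reshape(n_chains * n_draws, n_cols)

print(draws_3d.shape)            # (4, 3, 2)
print(draws_2d.shape)            # (12, 2)
print(draws_2d[:, 0].tolist())   # [0, 10, 20, 30, 100, 110, 120, 130, 200, 210, 220, 230]
```

The same 2D array results from `draws_3d.reshape((n_draws * n_chains, n_cols), order="F")`, since Fortran order varies the draw index fastest.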

Parameters
  • inc_warmup (bool) – When True and the warmup draws are present in the output, i.e., the sampler was run with save_warmup=True, then the warmup draws are included. Default value is False.

  • concat_chains (bool) – When True, return a 2D array that flattens all draws from all chains. Default value is False.

Return type

numpy.ndarray

draws_pd(vars=None, inc_warmup=False, *, params=None)[source]

Returns the sample draws as a pandas DataFrame, with the draws from all chains flattened into a single set of rows. Container variables (array, vector, matrix) span multiple columns, one column per element, e.g. variable ‘matrix[2,2] foo’ spans 4 columns: ‘foo[1,1], … foo[2,2]’.

Parameters
  • vars (Optional[Union[str, List[str]]]) – Optional variable name or list of variable names.

  • inc_warmup (bool) – When True and the warmup draws are present in the output, i.e., the sampler was run with save_warmup=True, then the warmup draws are included. Default value is False.

  • params (Optional[Union[str, List[str]]]) –

Return type

pandas.core.frame.DataFrame

draws_xr(vars=None, inc_warmup=False)[source]

Returns the sampler draws as an xarray Dataset.

Parameters
  • vars (Optional[Union[str, List[str]]]) – Optional variable name or list of variable names.

  • inc_warmup (bool) – When True and the warmup draws are present in the output, i.e., the sampler was run with save_warmup=True, then the warmup draws are included. Default value is False.

Return type

xarray.core.dataset.Dataset

method_variables()[source]

Returns a dictionary of all sampler variables, i.e., all output column names ending in __. Assumes that all variables are scalar variables where column name is variable name. Maps each column name to a numpy.ndarray (draws x chains x 1) containing per-draw diagnostic values.

Return type

Dict[str, numpy.ndarray]

sampler_diagnostics()[source]

Deprecated, use “method_variables” instead

Return type

Dict[str, numpy.ndarray]

sampler_variables()[source]

Deprecated, use “method_variables” instead

Return type

Dict[str, numpy.ndarray]

save_csvfiles(dir=None)[source]

Move output CSV files to the specified directory. If the files were written to the temporary session directory, the random string is removed from their filenames, e.g., ‘bernoulli-201912081451-1-5nm6as7u.csv’ is saved as ‘bernoulli-201912081451-1.csv’.
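The filename clean-up can be sketched with a regular expression. `clean_filename` and `TEMP_NAME` are hypothetical; the pattern assumes the 12-digit timestamp from the documented template and a random string of 8 lowercase letters, digits, or underscores.

```python
import re

# <stem>-<8 random chars><.csv|.txt>, where the stem contains the 12-digit timestamp
TEMP_NAME = re.compile(r"^(?P<stem>.+-\d{12}-.+)-[a-z0-9_]{8}(?P<ext>\.(?:csv|txt))$")


def clean_filename(filename: str) -> str:
    """Hypothetical helper: drop the random string added in the temporary directory."""
    match = TEMP_NAME.match(filename)
    if match is None:
        return filename  # already clean
    return match.group("stem") + match.group("ext")


print(clean_filename("bernoulli-201912081451-1-5nm6as7u.csv"))  # bernoulli-201912081451-1.csv
print(clean_filename("bernoulli-201912081451-1.csv"))           # bernoulli-201912081451-1.csv
```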

Parameters

dir (Optional[str]) – directory path

Return type

None

stan_variable(var=None, inc_warmup=False, *, name=None)[source]

Return a numpy.ndarray which contains the set of draws for the named Stan program variable. Flattens the chains, leaving the draws in chain order. The first array dimension corresponds to the number of draws across all chains, with or without the warmup draws, per argument inc_warmup. The remaining dimensions correspond to the shape of the Stan program variable.

The draws are in chain order, i.e., for a sample with N chains of M draws each, the first M array elements are from chain 1, the next M are from chain 2, and the last M elements are from chain N.

  • If the variable is a scalar variable, the return array has shape (draws × chains, 1).

  • If the variable is a vector, the return array has shape (draws × chains, len(vector)).

  • If the variable is a matrix, the return array has shape (draws × chains, size(dim 1), size(dim 2)).

  • If the variable is an array with N dimensions, the return array has shape (draws × chains, size(dim 1), …, size(dim N)).

For example, if the Stan program variable theta is a 3x3 matrix, and the sample consists of 4 chains with 1000 post-warmup draws, this function will return a numpy.ndarray with shape (4000,3,3).
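The (4000, 3, 3) example can be reproduced with synthetic data. Stan writes matrix elements to its CSV output in column-major order, so a Fortran-order reshape recovers the matrix layout; the data below is made up, and this is an illustration rather than cmdstanpy's own code.

```python
import numpy as np

n_total = 4 * 1000  # 4 chains x 1000 post-warmup draws

# Hypothetical flattened draws for a 3x3 matrix theta: one row per draw and 9
# columns in Stan CSV order theta[1,1], theta[2,1], theta[3,1], theta[1,2], ...
# (column-major). Each column simply holds its own column index here.
flat = np.tile(np.arange(9.0), (n_total, 1))

# Fortran (column-major) order maps column j back to element (j % 3, j // 3).
theta = flat.reshape((n_total, 3, 3), order="F")

print(theta.shape)     # (4000, 3, 3)
print(theta[0, 1, 0])  # 1.0, i.e. theta[2,1] was CSV column 1
print(theta[0, 0, 1])  # 3.0, i.e. theta[1,2] was CSV column 3
```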

Parameters
  • var (Optional[str]) – variable name

  • inc_warmup (bool) – When True and the warmup draws are present in the output, i.e., the sampler was run with save_warmup=True, then the warmup draws are included. Default value is False.

  • name (Optional[str]) –

Return type

numpy.ndarray

stan_variables()[source]

Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.

Return type

Dict[str, numpy.ndarray]

summary(percentiles=None, sig_figs=None)[source]

Run cmdstan/bin/stansummary over all output csv files, assemble summary into DataFrame object; first row contains summary statistics for total joint log probability lp__, remaining rows contain summary statistics for all parameters, transformed parameters, and generated quantities variables listed in the order in which they were declared in the Stan program.

Parameters
  • percentiles (Optional[List[int]]) – Ordered non-empty list of percentiles to report. Must be integers between 1 and 99, inclusive.

  • sig_figs (Optional[int]) – Number of significant figures to report. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. If precision above 6 is requested, the sample must have been produced by CmdStan version 2.25 or later and the sampler output precision must be equal to or greater than the requested summary precision.

Returns

pandas.DataFrame

Return type

pandas.core.frame.DataFrame

property chain_ids: List[int]

Chain ids.

property chains: int

Number of chains.

property column_names: Tuple[str, ...]

Names of all outputs from the sampler, comprising sampler parameters and all components of all model parameters, transformed parameters, and quantities of interest. Corresponds to Stan CSV file header row, with names munged to array notation, e.g. beta[1] not beta.1.
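The name munging from Stan CSV headers (e.g. ‘beta.1’) to array notation (‘beta[1]’) can be sketched as follows. `munge_column_name` is a hypothetical helper that relies on Stan variable names never containing a dot.

```python
def munge_column_name(csv_name: str) -> str:
    """Hypothetical helper: 'beta.1' -> 'beta[1]', 'theta.2.3' -> 'theta[2,3]'."""
    base, *indices = csv_name.split(".")
    if not indices:
        return csv_name  # scalars and method variables such as lp__ are unchanged
    return f"{base}[{','.join(indices)}]"


header = ["lp__", "accept_stat__", "alpha", "beta.1", "beta.2", "theta.1.1", "theta.2.1"]
print([munge_column_name(name) for name in header])
# ['lp__', 'accept_stat__', 'alpha', 'beta[1]', 'beta[2]', 'theta[1,1]', 'theta[2,1]']
```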

property metadata: cmdstanpy.stanfit.InferenceMetadata

Returns object which contains CmdStan configuration as well as information about the names and structure of the inference method and model output variables.

property metric: Optional[numpy.ndarray]

Metric used by sampler for each chain. When sampler algorithm ‘fixed_param’ is specified, metric is None.

property metric_type: Optional[str]

Metric type used for adaptation, either ‘diag_e’ or ‘dense_e’. When sampler algorithm ‘fixed_param’ is specified, metric_type is None.

property num_draws_sampling: int

Number of sampling (post-warmup) draws per chain, i.e., thinned sampling iterations.

property num_draws_warmup: int

Number of warmup draws per chain, i.e., thinned warmup iterations.
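Since both draw counts are thinned iteration counts, they can be computed from the number of iterations and thin. `thinned_count` is a hypothetical helper, and it assumes that the first iteration is always recorded and every thin-th iteration after it.

```python
def thinned_count(iterations: int, thin: int = 1) -> int:
    """Hypothetical helper: number of draws recorded when every `thin`-th
    iteration is kept, starting with the first (assumed thinning rule)."""
    if iterations < 0 or thin < 1:
        raise ValueError("iterations must be >= 0 and thin must be >= 1")
    return (iterations + thin - 1) // thin  # integer ceiling of iterations / thin


print(thinned_count(1000))          # 1000
print(thinned_count(1000, thin=3))  # 334: iterations 1, 4, ..., 1000 are kept
```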

property num_unconstrained_params: int

Count of unconstrained model parameters. This is the metric size: for metric diag_e, the length of the diagonal vector; for metric dense_e, the number of rows (equivalently, columns) of the full covariance matrix.

If the parameter variables in a model are constrained parameter types, the number of constrained and unconstrained parameters may differ. The sampler reports the constrained parameters and computes with the unconstrained parameters. E.g. a model with 2 parameter variables, real alpha and vector[3] beta has 4 constrained and 4 unconstrained parameters, however a model with variables real alpha and simplex[3] beta has 4 constrained and 3 unconstrained parameters.
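The constrained versus unconstrained counts follow from the standard Stan transforms; for example, a simplex[K] has K - 1 free parameters because its entries must sum to 1. `param_sizes` and `count_params` are hypothetical helpers that cover only a few types, enough to reproduce the two models described above.

```python
from typing import List, Tuple


def param_sizes(kind: str, k: int = 1) -> Tuple[int, int]:
    """Hypothetical helper: (constrained, unconstrained) sizes of a Stan type of size k."""
    if kind == "real":
        return 1, 1
    if kind == "vector":
        return k, k
    if kind == "simplex":
        return k, k - 1  # entries sum to 1, so one degree of freedom is lost
    if kind == "cov_matrix":
        return k * k, k + k * (k - 1) // 2  # symmetric positive definite
    if kind == "corr_matrix":
        return k * k, k * (k - 1) // 2  # symmetric with unit diagonal
    raise ValueError(f"type not covered by this sketch: {kind}")


def count_params(decls: List[Tuple[str, int]]) -> Tuple[int, int]:
    """Total (constrained, unconstrained) sizes for a list of (type, size) declarations."""
    sizes = [param_sizes(kind, k) for kind, k in decls]
    return sum(c for c, _ in sizes), sum(u for _, u in sizes)


print(count_params([("real", 1), ("vector", 3)]))   # (4, 4)
print(count_params([("real", 1), ("simplex", 3)]))  # (4, 3)
```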

property sample: numpy.ndarray

Deprecated - use method “draws()” instead.

property sampler_vars_cols: Dict[str, Tuple[int, ...]]

Deprecated - use “metadata.method_vars_cols” instead

property stan_vars_cols: Dict[str, Tuple[int, ...]]

Deprecated - use “metadata.stan_vars_cols” instead

property stan_vars_dims: Dict[str, Tuple[int, ...]]

Deprecated - use “metadata.stan_vars_dims” instead

property step_size: Optional[numpy.ndarray]

Step size used by sampler for each chain. When sampler algorithm ‘fixed_param’ is specified, step size is None.

property thin: int

Period between recorded iterations. (Default is 1).

property warmup: numpy.ndarray

Deprecated - use “draws(inc_warmup=True)”

CmdStanMLE

class cmdstanpy.CmdStanMLE(runset)[source]

Container for outputs from CmdStan optimization. Created by CmdStanModel.optimize().

Parameters

runset (cmdstanpy.stanfit.RunSet) –

Return type

None

save_csvfiles(dir=None)[source]

Move output CSV files to the specified directory. If the files were written to the temporary session directory, their filenames are cleaned up, e.g., ‘bernoulli-201912081451-1-5nm6as7u.csv’ is saved as ‘bernoulli-201912081451-1.csv’.

Parameters

dir (Optional[str]) – directory path

Return type

None

stan_variable(var=None, *, name=None)[source]

Return a numpy.ndarray which contains the estimates for the named Stan program variable, where the dimensions of the numpy.ndarray match the shape of the Stan program variable.

Parameters
  • var (Optional[str]) – variable name

  • name (Optional[str]) –

Return type

numpy.ndarray

stan_variables()[source]

Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.

Return type

Dict[str, numpy.ndarray]

property column_names: Tuple[str, ...]

Names of estimated quantities, including the joint log probability and all parameters, transformed parameters, and generated quantities.

property metadata: cmdstanpy.stanfit.InferenceMetadata

Returns an object that contains the CmdStan configuration as well as information about the names and structure of the inference method and model output variables.

property optimized_params_dict: Dict[str, float]

Returns optimized params as Dict.

property optimized_params_np: numpy.ndarray

Returns optimized params as numpy array.

property optimized_params_pd: pandas.core.frame.DataFrame

Returns optimized params as pandas DataFrame.
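The three optimized_params_* properties expose the same estimates in different containers. Assuming that the dict pairs each entry of column_names with the value at the same position in optimized_params_np, the relationship can be sketched with plain NumPy; the names and values below are hypothetical stand-ins, not output from a real fit.

```python
import numpy as np

# Hypothetical stand-ins for CmdStanMLE.column_names and
# CmdStanMLE.optimized_params_np for a one-parameter model.
column_names = ("lp__", "theta")
params_np = np.array([-7.5, 0.25])

# Pair each column name with the value at the same position.
params_dict = {name: float(value) for name, value in zip(column_names, params_np)}
print(params_dict)  # {'lp__': -7.5, 'theta': 0.25}
```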

CmdStanGQ

class cmdstanpy.CmdStanGQ(runset, mcmc_sample)[source]

Container for outputs from CmdStan generate_quantities run. Created by CmdStanModel.generate_quantities().

Parameters
Return type

None

draws(*, inc_warmup=False, concat_chains=False, inc_sample=False)[source]

Returns a numpy.ndarray over the generated quantities draws from all chains, stored column major so that the values for a parameter are contiguous in memory; likewise, all draws from a chain are contiguous. By default, returns a 3D array arranged (draws, chains, columns); parameter concat_chains=True returns a 2D array arranged (draws X chains, columns) in which the chains are concatenated in chain order, so that given M chains of N draws, the first N draws are from chain 1, …, and the last N draws are from chain M.

Parameters
  • inc_warmup (bool) – When True and the warmup draws are present in the output, i.e., the sampler was run with save_warmup=True, then the warmup draws are included. Default value is False.

  • concat_chains (bool) – When True, return a 2D array flattening all draws from all chains. Default value is False.

  • inc_sample (bool) – When True, also include all columns in the mcmc_sample draws array, except for columns for variables already present in the generated quantities draw set. Default value is False.

Return type

numpy.ndarray
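The relationship between the default 3D layout and the concat_chains=True layout can be sketched with plain NumPy. The array here is synthetic, and we assume the 2D result is equivalent to moving the chain axis first and then merging it with the draw axis; CmdStanPy may construct it differently internally.

```python
import numpy as np

n_draws, n_chains, n_cols = 4, 3, 2
# Synthetic draws arranged (draws, chains, columns), like the default draws() output.
draws_3d = np.arange(n_draws * n_chains * n_cols).reshape(n_draws, n_chains, n_cols)

# Move chains to the front, then merge the chain and draw axes so that
# rows 0..N-1 come from chain 1, rows N..2N-1 from chain 2, and so on.
draws_2d = draws_3d.transpose(1, 0, 2).reshape(n_chains * n_draws, n_cols)

print(draws_2d.shape)  # (12, 2)
# The first N rows are exactly chain 1's draws.
assert np.array_equal(draws_2d[:n_draws], draws_3d[:, 0, :])
```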

draws_pd(vars=None, inc_warmup=False, inc_sample=False)[source]

Returns the generated quantities draws as a pandas DataFrame. Flattens all chains, so each column contains the draws from every chain. Container variables (array, vector, matrix) span multiple columns, one column per element. E.g., variable ‘matrix[2,2] foo’ spans 4 columns: ‘foo[1,1], … foo[2,2]’.

Parameters
  • vars (Optional[Union[str, List[str]]]) – optional list of variable names.

  • inc_warmup (bool) – When True and the warmup draws are present in the output, i.e., the sampler was run with save_warmup=True, then the warmup draws are included. Default value is False.

  • inc_sample (bool) –

Return type

pandas.core.frame.DataFrame
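To see which per-element columns a container variable produces, the sketch below generates the names for a given shape. It assumes the columns follow Stan's column-major element order (first index varying fastest) with 1-based indices; the helper flat_column_names is hypothetical, not part of CmdStanPy.

```python
import itertools
from typing import List, Tuple

def flat_column_names(name: str, dims: Tuple[int, ...]) -> List[str]:
    """Per-element column names in column-major order (1-based indices)."""
    if not dims:
        return [name]
    names = []
    # itertools.product varies its last argument fastest, so iterate over the
    # reversed dims and reverse each index tuple to make the first index fastest.
    for idx in itertools.product(*(range(1, d + 1) for d in reversed(dims))):
        names.append(f"{name}[{','.join(str(i) for i in reversed(idx))}]")
    return names

print(flat_column_names("foo", (2, 2)))
# ['foo[1,1]', 'foo[2,1]', 'foo[1,2]', 'foo[2,2]']
```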

draws_xr(vars=None, inc_warmup=False, inc_sample=False)[source]

Returns the generated quantities draws as an xarray Dataset.

Parameters
  • vars (Optional[Union[str, List[str]]]) – optional list of variable names.

  • inc_warmup (bool) – When True and the warmup draws are present in the MCMC sample, then the warmup draws are included. Default value is False.

  • inc_sample (bool) –

Return type

xarray.core.dataset.Dataset

save_csvfiles(dir=None)[source]

Move output CSV files to the specified directory. If the files were written to the temporary session directory, their filenames are cleaned up, e.g., ‘bernoulli-201912081451-1-5nm6as7u.csv’ is saved as ‘bernoulli-201912081451-1.csv’.

Parameters

dir (Optional[str]) – directory path

Return type

None

stan_variable(var=None, inc_warmup=False, *, name=None)[source]

Return a numpy.ndarray which contains the set of draws for the named Stan program variable. Flattens the chains, leaving the draws in chain order. The first array dimension corresponds to the number of draws in the sample. The remaining dimensions correspond to the shape of the Stan program variable.

The underlying draws are in chain order, i.e., for a sample with N chains of M draws each, the first M array elements are from chain 1, the next M are from chain 2, and the last M elements are from chain N.

  • If the variable is a scalar variable, the return array has shape ( draws X chains, 1).

  • If the variable is a vector, the return array has shape ( draws X chains, len(vector))

  • If the variable is a matrix, the return array has shape (draws X chains, size(dim 1), size(dim 2))

  • If the variable is an array with N dimensions, the return array has shape (draws X chains, size(dim 1), …, size(dim N))

For example, if the Stan program variable theta is a 3x3 matrix, and the sample consists of 4 chains with 1000 post-warmup draws, this function will return a numpy.ndarray with shape (4000,3,3).

Parameters
  • var (Optional[str]) – variable name

  • inc_warmup (bool) – When True and the warmup draws are present in the MCMC sample, then the warmup draws are included. Default value is False.

  • name (Optional[str]) –

Return type

numpy.ndarray
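The (4000, 3, 3) example above can be reproduced from a flattened 2D draws array with a Fortran-order reshape. This is a sketch with synthetic data that assumes the element columns are in Stan's column-major order; whether CmdStanPy reshapes exactly this way internally is an assumption.

```python
import numpy as np

n_chains, n_draws = 4, 1000
# Synthetic 2D draws for a 3x3 matrix variable theta, chains already concatenated:
# one row per draw, one column per element, elements in column-major order
# (theta[1,1], theta[2,1], theta[3,1], theta[1,2], ...).
flat = np.arange(n_chains * n_draws * 9, dtype=float).reshape(n_chains * n_draws, 9)

# order="F" makes the first matrix index vary fastest, matching the
# column-major order of the element columns.
theta = flat.reshape((n_chains * n_draws, 3, 3), order="F")

print(theta.shape)  # (4000, 3, 3)
# theta[2,1] in Stan's 1-based indexing is column 1; theta[1,2] is column 3.
assert np.array_equal(theta[:, 1, 0], flat[:, 1])
assert np.array_equal(theta[:, 0, 1], flat[:, 3])
```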

stan_variables(inc_warmup=False)[source]

Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.

Parameters

inc_warmup (bool) – When True and the warmup draws are present in the MCMC sample, then the warmup draws are included. Default value is False.

Return type

Dict[str, numpy.ndarray]

property chain_ids: List[int]

Chain ids.

property chains: int

Number of chains.

property column_names: Tuple[str, ...]

Names of generated quantities of interest.

property generated_quantities: numpy.ndarray

Deprecated - use method draws instead.

property generated_quantities_pd: pandas.core.frame.DataFrame

Deprecated - use method draws_pd instead.

property metadata: cmdstanpy.stanfit.InferenceMetadata

Returns an object that contains the CmdStan configuration as well as information about the names and structure of the inference method and model output variables.

property sample_plus_quantities: pandas.core.frame.DataFrame

Deprecated - use method “draws_pd(inc_sample=True)” instead.

CmdStanVB

class cmdstanpy.CmdStanVB(runset)[source]

Container for outputs from CmdStan variational run. Created by CmdStanModel.variational().

Parameters

runset (cmdstanpy.stanfit.RunSet) –

Return type

None

save_csvfiles(dir=None)[source]

Move output CSV files to the specified directory. If the files were written to the temporary session directory, their filenames are cleaned up, e.g., ‘bernoulli-201912081451-1-5nm6as7u.csv’ is saved as ‘bernoulli-201912081451-1.csv’.

Parameters

dir (Optional[str]) – directory path

Return type

None

stan_variable(var=None, *, name=None)[source]

Return a numpy.ndarray which contains the estimates for the named Stan program variable, where the dimensions of the numpy.ndarray match the shape of the Stan program variable.

Parameters
  • var (Optional[str]) – variable name

  • name (Optional[str]) –

Return type

numpy.ndarray

stan_variables()[source]

Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.

Return type

Dict[str, numpy.ndarray]

property column_names: Tuple[str, ...]

Names of the information items returned for each draw, including the approximation information and the names of the model parameters and computed quantities.

property columns: int

Total number of information items returned for each draw, counting the approximation information as well as the model parameters and computed quantities.

property metadata: cmdstanpy.stanfit.InferenceMetadata

Returns an object that contains the CmdStan configuration as well as information about the names and structure of the inference method and model output variables.

property variational_params_dict: Dict[str, numpy.ndarray]

Returns inferred parameter means as Dict.

property variational_params_np: numpy.ndarray

Returns inferred parameter means as numpy array.

property variational_params_pd: pandas.core.frame.DataFrame

Returns inferred parameter means as pandas DataFrame.

property variational_sample: numpy.ndarray

Returns the set of approximate posterior output draws.

InferenceMetadata

class cmdstanpy.InferenceMetadata(config)[source]

CmdStan configuration and contents of output file parsed out of the Stan CSV file header comments and column headers. Assumes valid CSV files.

Parameters

config (Dict[str, Any]) –

Return type

None

property cmdstan_config: Dict[str, Any]

Returns a dictionary containing a set of name, value pairs parsed out of the Stan CSV file header. These include the command configuration and the CSV file header row information. Uses deepcopy for immutability.

property method_vars_cols: Dict[str, Tuple[int, ...]]

Returns a map from a Stan inference method variable to a tuple of column indices in the inference engine’s output array. Method variable names always end in __, e.g., lp__. Uses deepcopy for immutability.

property stan_vars_cols: Dict[str, Tuple[int, ...]]

Returns a map from a Stan program variable name to a tuple of the column indices in the vector or matrix of estimates produced by a CmdStan inference method. Uses deepcopy for immutability.

property stan_vars_dims: Dict[str, Tuple[int, ...]]

Returns a map from Stan program variable names to variable dimensions. Scalar types are mapped to the empty tuple, e.g., program variable int foo has dimension () and program variable vector[10] bar has the single dimension (10,). Uses deepcopy for immutability.
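To illustrate how method_vars_cols, stan_vars_cols, and stan_vars_dims relate, here is a sketch that derives all three from a list of flattened column names. The function parse_columns is hypothetical, not CmdStanPy's parser; it assumes element columns are written in the bracketed form used elsewhere in this reference (e.g., bar[3]) rather than the dotted form (bar.3) found in raw Stan CSV headers.

```python
import re
from typing import Dict, List, Tuple

def parse_columns(names: List[str]):
    """Return (method_vars_cols, stan_vars_cols, stan_vars_dims) for column names."""
    method_cols: Dict[str, Tuple[int, ...]] = {}
    stan_cols: Dict[str, List[int]] = {}
    stan_dims: Dict[str, Tuple[int, ...]] = {}
    for col, name in enumerate(names):
        if name.endswith("__"):          # method variables, e.g. lp__
            method_cols[name] = (col,)
            continue
        m = re.fullmatch(r"([^\[]+)(?:\[([\d,]+)\])?", name)
        base, idx = m.group(1), m.group(2)
        stan_cols.setdefault(base, []).append(col)
        if idx is None:
            stan_dims[base] = ()         # scalar
        else:
            # Track the largest 1-based index seen in each dimension.
            indices = tuple(int(i) for i in idx.split(","))
            prev = stan_dims.get(base, (0,) * len(indices))
            stan_dims[base] = tuple(max(a, b) for a, b in zip(prev, indices))
    return method_cols, {k: tuple(v) for k, v in stan_cols.items()}, stan_dims

names = ["lp__", "accept_stat__", "foo"] + [f"bar[{i}]" for i in range(1, 11)]
method_cols, cols, dims = parse_columns(names)
print(dims)  # {'foo': (), 'bar': (10,)}
```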

RunSet

class cmdstanpy.stanfit.RunSet(args, chains=4, chain_ids=None, logger=None)[source]

Encapsulates the configuration and results of a call to any CmdStan inference method. Records the method return code and locations of all console, error, and output files.

Parameters
  • args (cmdstanpy.cmdstan_args.CmdStanArgs) –

  • chains (int) –

  • chain_ids (Optional[List[int]]) –

  • logger (Optional[logging.Logger]) –

Return type

None

get_err_msgs()[source]

Checks console messages for each chain.

Return type

str

save_csvfiles(dir=None)[source]

Moves CSV files to the specified directory.

Parameters

dir (Optional[str]) – directory path

Return type

None

property chain_ids: List[int]

Chain ids.

property chains: int

Number of chains.

property cmds: List[List[str]]

List of call(s) to CmdStan, one call per chain.

property csv_files: List[str]

List of paths to CmdStan output files.

property diagnostic_files: List[str]

List of paths to CmdStan Hamiltonian diagnostic files.

property method: cmdstanpy.cmdstan_args.Method

CmdStan method used to generate this fit.

property model: str

Stan model name.

property profile_files: List[str]

List of paths to CmdStan profiler files.

property stderr_files: List[str]

List of paths to CmdStan stderr transcripts.

property stdout_files: List[str]

List of paths to CmdStan stdout transcripts.

Functions

cmdstan_path

cmdstanpy.cmdstan_path()[source]

Validate, then return CmdStan directory path.

Return type

str

install_cmdstan

cmdstanpy.install_cmdstan(version=None, dir=None, overwrite=False, verbose=False, compiler=False)[source]

Download and install a CmdStan release from GitHub by running script install_cmdstan as a subprocess. Downloads the release tar.gz file to temporary storage. Retries GitHub requests in order to allow for transient network outages. Builds CmdStan executables and tests the compiler by building example model bernoulli.stan.

Parameters
  • version (Optional[str]) – CmdStan version string, e.g. “2.24.1”. Defaults to the latest CmdStan release.

  • dir (Optional[str]) – Path to install directory. Defaults to hidden directory $HOME/.cmdstan. If no directory is specified and the above directory does not exist, directory $HOME/.cmdstan will be created and populated.

  • overwrite (bool) – Boolean value; when True, will overwrite and rebuild an existing CmdStan installation. Default is False.

  • verbose (bool) – Boolean value; when True, output from CmdStan build processes will be streamed to the console. Default is False.

  • compiler (bool) – Boolean value; when True (Windows only), use the C++ compiler from the install_cxx_toolchain command, or install one if none is found.

Returns

Boolean value; True for success.

Return type

bool

set_cmdstan_path

cmdstanpy.set_cmdstan_path(path)[source]

Validate, then set CmdStan directory path.

Parameters

path (str) –

Return type

None

set_make_env

cmdstanpy.set_make_env(make)[source]

Set the MAKE environment variable.

Parameters

make (str) –

Return type

None

from_csv

cmdstanpy.from_csv(path=None, method=None)[source]

Instantiate a CmdStan object from the Stan CSV files from a CmdStan run. CSV files are specified either as a list of Stan CSV files or as a single filepath, which can be a directory name, a Stan CSV filename, or a pathname pattern (i.e., a Python glob). The optional argument ‘method’ checks that the CSV files were produced by that method. Stan CSV files from CmdStan methods ‘sample’, ‘optimize’, and ‘variational’ result in objects of class CmdStanMCMC, CmdStanMLE, and CmdStanVB, respectively.

Parameters
  • path (Optional[Union[str, List[str]]]) – a list of Stan CSV files, or a single path to a directory, a Stan CSV file, or a glob pattern

  • method (Optional[str]) – method name (optional)

Returns

either a CmdStanMCMC, CmdStanMLE, or CmdStanVB object

Return type

Optional[Union[cmdstanpy.stanfit.CmdStanMCMC, cmdstanpy.stanfit.CmdStanMLE, cmdstanpy.stanfit.CmdStanVB]]
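The four accepted forms of path can be sketched as a small resolution step. The helper resolve_csv_paths is hypothetical and only illustrates the rules described above; it is not how from_csv is implemented, and it skips the header validation that from_csv performs.

```python
import glob
import os
import tempfile
from typing import List, Union

def resolve_csv_paths(path: Union[str, List[str]]) -> List[str]:
    """Expand a list of files, a directory, a single CSV file, or a glob pattern."""
    if isinstance(path, list):
        return list(path)
    if os.path.isdir(path):
        # A directory contributes all *.csv files inside it.
        return sorted(glob.glob(os.path.join(path, "*.csv")))
    if os.path.isfile(path):
        return [path]
    # Anything else is treated as a glob pattern.
    return sorted(glob.glob(path))

# Usage: a temporary directory with two CSV files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("fit-1.csv", "fit-2.csv", "notes.txt"):
        open(os.path.join(tmp, fname), "w").close()
    from_dir = [os.path.basename(p) for p in resolve_csv_paths(tmp)]
    from_glob = [os.path.basename(p) for p in resolve_csv_paths(os.path.join(tmp, "fit-*.csv"))]
    from_file = [os.path.basename(p) for p in resolve_csv_paths(os.path.join(tmp, "fit-2.csv"))]

print(from_dir)  # ['fit-1.csv', 'fit-2.csv']
```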

write_stan_json

cmdstanpy.write_stan_json(path, data)[source]

Dump a mapping of strings to data to a JSON file.

Values can be any numeric type, a boolean (converted to int), or any collection compatible with numpy.asarray(), e.g., a pandas.Series.

Produces a file compatible with the JSON Format for CmdStan.

Parameters
  • path (str) – File path for the created json. Will be overwritten if already in existence.

  • data (Mapping[str, Any]) – A mapping from strings to values. This can be a dictionary or something more exotic like an xarray.Dataset. The mapping is copied before type conversion and is not modified.

Return type

None
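The conversion rules described above (booleans to int, everything else through numpy.asarray()) can be sketched as follows. This is a minimal sketch assuming only those rules; it is not CmdStanPy's implementation and does not handle special values such as NaN or infinity. The helper to_stan_json is hypothetical and returns a string; writing that string to path would produce the file.

```python
import json
from typing import Any, Mapping

import numpy as np

def to_stan_json(data: Mapping[str, Any]) -> str:
    """Serialize a mapping of names to values in a Stan-JSON-like layout."""
    out = {}
    for key, value in data.items():  # the input mapping is never modified
        arr = np.asarray(value)
        if arr.dtype == bool:
            arr = arr.astype(int)    # booleans become 0/1
        out[key] = arr.tolist()      # scalars stay scalars, arrays become nested lists
    return json.dumps(out)

text = to_stan_json({"N": 3, "y": [0, 1, 0], "flag": True,
                     "x": np.array([[1.0, 2.0], [3.0, 4.0]])})
print(text)
# {"N": 3, "y": [0, 1, 0], "flag": 1, "x": [[1.0, 2.0], [3.0, 4.0]]}
```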