Getting Started


Install package CmdStanPy

CmdStanPy is a pure-Python package which can be installed from PyPI

pip install --upgrade cmdstanpy

or from GitHub

pip install -e git+

To install CmdStanPy with all the optional packages (ujson for JSON processing, tqdm for a progress bar):

pip install --upgrade cmdstanpy[all]

Note for PyStan users: PyStan and CmdStanPy should be installed in separate environments. If you already have PyStan installed, you should take care to install CmdStanPy in its own virtual environment.

The optional packages can be installed together with CmdStanPy by requesting the "all" extra from pip:

pip install --upgrade cmdstanpy[all]

The optional packages are:

  • ujson, which provides faster JSON I/O
  • tqdm, which displays a progress bar during sampling

To install these manually:

pip install ujson
pip install tqdm

Install CmdStan

CmdStanPy requires a local install of CmdStan.


CmdStanPy requires an installed C++ toolchain.

Function install_cmdstan

CmdStanPy provides the function install_cmdstan which downloads CmdStan from GitHub and builds the CmdStan utilities. It can be called from within Python or from the command line. By default it installs the latest version of CmdStan into a directory named .cmdstanpy in your $HOME directory:

  • From Python
import cmdstanpy
cmdstanpy.install_cmdstan()
  • From the command line on Linux or macOS
install_cmdstan
ls -F ~/.cmdstanpy
  • On Windows
python -m cmdstanpy.install_cmdstan
dir "%HOME%/.cmdstanpy"

The named arguments -d <directory> and -v <version> override these defaults:

install_cmdstan -d my_local_cmdstan -v 2.20.0
ls -F my_local_cmdstan

Specifying CmdStan installation location

The default for the CmdStan installation location is a directory named .cmdstanpy in your $HOME directory.

If you have installed CmdStan in a different directory, then you can set the environment variable CMDSTAN to this location and it will be picked up by CmdStanPy:

export CMDSTAN='/path/to/cmdstan-2.20.0'
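The lookup order described above can be pictured with a short sketch. This is not CmdStanPy's own implementation: the helper name resolve_cmdstan_dir is hypothetical, and the logic assumes only what this section states, namely that CMDSTAN takes precedence when it is set and that the location otherwise falls back to .cmdstanpy under the home directory.

```python
import os


def resolve_cmdstan_dir(environ=None):
    """Hypothetical helper: return the CmdStan location as this section describes it."""
    env = os.environ if environ is None else environ
    # An explicit CMDSTAN setting takes precedence.
    if env.get("CMDSTAN"):
        return env["CMDSTAN"]
    # Otherwise fall back to the default .cmdstanpy directory under $HOME.
    return os.path.join(os.path.expanduser("~"), ".cmdstanpy")


print(resolve_cmdstan_dir({"CMDSTAN": "/path/to/cmdstan-2.20.0"}))
print(resolve_cmdstan_dir({}))
```

Passing a dictionary instead of reading os.environ directly keeps the sketch easy to try without changing the real environment.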

The CmdStanPy functions cmdstan_path and set_cmdstan_path get and set this environment variable:

from cmdstanpy import cmdstan_path, set_cmdstan_path

oldpath = cmdstan_path()
set_cmdstan_path('/path/to/cmdstan-2.20.0')  # placeholder path; use your own CmdStan directory
newpath = cmdstan_path()

Specifying a custom make tool

To use a custom make tool, call the set_make_env function:

from cmdstanpy import set_make_env
set_make_env("mingw32-make.exe") # On Windows with mingw32-make

“Hello, World”

Bayesian estimation via Stan’s HMC-NUTS sampler

To exercise the essential functions of CmdStanPy, we will compile the example Stan model bernoulli.stan, which is distributed with CmdStan, and then fit the model to example data, also distributed with CmdStan, using Stan’s HMC-NUTS sampler. The goal is to estimate the posterior probability of the model parameters conditioned on the data.

Specify a Stan model

The CmdStanModel class specifies the Stan program and its corresponding compiled executable. By default, the Stan program is compiled on instantiation.

import os
from cmdstanpy import cmdstan_path, CmdStanModel

bernoulli_stan = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan')
bernoulli_model = CmdStanModel(stan_file=bernoulli_stan)

The CmdStanModel class provides properties and functions to inspect the model code and filepaths.

Run the HMC-NUTS sampler

The CmdStanModel method sample runs the Stan HMC-NUTS sampler on the model and data and returns a CmdStanMCMC object:

bernoulli_data = { "N" : 10, "y" : [0,1,0,0,0,0,0,0,0,1] }
bern_fit = bernoulli_model.sample(data=bernoulli_data, output_dir='.')

By default, the sample command runs 4 sampler chains. The output_dir argument specifies the path to the sampler output files. If no output file path is specified, the sampler outputs are written to a temporary directory which is deleted when the current Python session is terminated.

The CmdStanMCMC object records the command, the return code, and the paths to the sampler output csv and console files. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.

Output filenames are composed of the model name, a timestamp in the form YYYYMMDDhhmm and the chain id, plus the corresponding filetype suffix, either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. bernoulli-201912081451-1.csv. Output files written to the temporary directory contain an additional 8-character random string, e.g. bernoulli-201912081451-1-5nm6as7u.csv.
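The naming scheme above can be reproduced with a few lines of standard-library Python. The helper output_basename is hypothetical and only mimics the pattern described in this section; the character set used for the random suffix is an assumption.

```python
from datetime import datetime
import random
import string


def output_basename(model_name, chain_id, when, temporary=False):
    """Hypothetical helper that mimics the naming scheme described above."""
    stamp = when.strftime("%Y%m%d%H%M")  # YYYYMMDDhhmm
    name = f"{model_name}-{stamp}-{chain_id}"
    if temporary:
        # Files in the temporary directory carry an extra 8-character random string.
        alphabet = string.ascii_lowercase + string.digits
        name += "-" + "".join(random.choice(alphabet) for _ in range(8))
    return name


when = datetime(2019, 12, 8, 14, 51)
print(output_basename("bernoulli", 1, when) + ".csv")   # bernoulli-201912081451-1.csv
print(output_basename("bernoulli", 1, when) + ".txt")   # bernoulli-201912081451-1.txt
print(output_basename("bernoulli", 1, when, temporary=True) + ".csv")
```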

Access the sample

The sample command returns a CmdStanMCMC object which provides methods to retrieve the sampler outputs, the arguments used to run CmdStan, the names of the per-chain stan-csv output files, and the per-chain console message files.


The resulting sample from the posterior is lazily instantiated the first time that any of the properties sample, metric, or stepsize are accessed. At this point the stan-csv output files are read into memory. For large files this may take several seconds; for the example dataset, this should take less than a second. The sample property of the CmdStanMCMC object is a 3-D numpy.ndarray (i.e., a multi-dimensional array) which contains the set of all draws from all chains arranged as dimensions: (draws, chains, columns).
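To make the (draws, chains, columns) layout concrete, the sketch below builds a synthetic NumPy array of the same shape as the example fit (1000 draws, 4 chains, 8 output columns). It is a stand-in for the sample property, not real sampler output, and does not require CmdStan.

```python
import numpy as np

# Synthetic stand-in for the sample property: 1000 draws, 4 chains, 8 output columns.
draws, chains, columns = 1000, 4, 8
sample = np.zeros((draws, chains, columns))

# Tag every entry of a chain with its chain number so the layout is easy to see.
for chain in range(chains):
    sample[:, chain, :] = chain + 1

print(sample.shape)           # (1000, 4, 8)
print(sample[:, :, 0].shape)  # one output column across all chains: (1000, 4)
print(sample[0, :, 0])        # first draw of the first column, one value per chain
```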


The get_drawset method returns the draws from all chains as a pandas.DataFrame, with one draw per row and one column per model parameter, transformed parameter, or generated quantity variable. The params argument is used to restrict the DataFrame columns to just the specified parameter names.
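The relationship between the 3-D array and the flattened table can be sketched with synthetic data. The helper to_drawset is hypothetical, the column names are the ones assumed for the bernoulli example (seven sampler diagnostics plus theta), and stacking the rows chain by chain is an assumption; this section does not state the row order that get_drawset itself uses.

```python
import numpy as np
import pandas as pd

# Assumed column names for the bernoulli example: seven sampler diagnostics plus theta.
column_names = ["lp__", "accept_stat__", "stepsize__", "treedepth__",
                "n_leapfrog__", "divergent__", "energy__", "theta"]

# Synthetic (draws, chains, columns) array standing in for the sample property.
rng = np.random.default_rng(0)
sample = rng.random((1000, 4, len(column_names)))


def to_drawset(sample, column_names, params=None):
    """Hypothetical helper: stack the chains into one row per draw."""
    n_draws, n_chains, n_cols = sample.shape
    # Put chains first so that each chain's draws stay contiguous in the result.
    flat = sample.transpose(1, 0, 2).reshape(n_draws * n_chains, n_cols)
    frame = pd.DataFrame(flat, columns=column_names)
    return frame if params is None else frame[params]


print(to_drawset(sample, column_names).shape)             # (4000, 8)
print(to_drawset(sample, column_names, ["theta"]).shape)  # (4000, 1)
```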


Python’s index slicing operations can be used to access the information by chain. For example, to select all draws and all output columns from the first chain, we specify the chain index (2nd index dimension). Because array indexing starts at 0, the index ‘0’ corresponds to the first chain in the CmdStanMCMC object:

chain_1 = bern_fit.sample[:,0,:]
chain_1.shape       # (1000, 8)
chain_1[0]          # sample first draw:
                    # array([-7.99462  ,  0.578072 ,  0.955103 ,  2.       ,  7.       ,
                    # 0.       ,  9.44788  ,  0.0934208])

Summarize or save the results

CmdStan is distributed with a posterior analysis utility stansummary that reads the outputs of all chains and computes summary statistics on the model fit for all parameters. The CmdStanMCMC method summary runs the CmdStan stansummary utility and returns the output as a pandas.DataFrame.


CmdStan is distributed with a second posterior analysis utility diagnose that reads the outputs of all chains and checks for the following potential problems:

  • Transitions that hit the maximum treedepth
  • Divergent transitions
  • Low E-BFMI values (the sampler explores the HMC energy distribution inefficiently)
  • Low effective sample sizes
  • High R-hat values

The CmdStanMCMC method diagnose runs the CmdStan diagnose utility and prints the output to the console.
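As an illustration of one of these checks, the sketch below computes the textbook (non-split) Gelman-Rubin R-hat for a single quantity from an array shaped (draws, chains). It uses synthetic data, and the function classic_rhat is hypothetical; the CmdStan utilities use their own estimators, which may differ from this formula.

```python
import numpy as np


def classic_rhat(x):
    """Classic (non-split) Gelman-Rubin R-hat for an array shaped (draws, chains)."""
    n_draws = x.shape[0]
    within = x.var(axis=0, ddof=1).mean()           # W: mean within-chain variance
    between = n_draws * x.mean(axis=0).var(ddof=1)  # B: variance of chain means, scaled by n
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(1)
mixed = rng.normal(size=(1000, 4))  # four chains drawing from the same distribution
stuck = mixed.copy()
stuck[:, 3] += 5.0                  # one chain sits in a different region

print(classic_rhat(mixed))  # close to 1
print(classic_rhat(stuck))  # well above 1
```

Values close to 1 indicate that the chains agree with each other; a chain that has settled in a different region inflates the between-chain variance and pushes the ratio up.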


The sampler output files are written to a temporary directory which is deleted upon session exit unless the output_dir argument is specified. The save_csvfiles function moves the CmdStan csv output files to a specified directory without having to re-run the sampler.
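The effect of moving output files out of a temporary directory can be sketched with the standard library alone. The helper move_csv_files is hypothetical and is not the save_csvfiles implementation; in particular, it keeps each file name unchanged, which may differ from what save_csvfiles does.

```python
import os
import shutil
import tempfile


def move_csv_files(paths, target_dir):
    """Hypothetical helper: move output files into target_dir and return the new paths."""
    os.makedirs(target_dir, exist_ok=True)
    return [shutil.move(p, os.path.join(target_dir, os.path.basename(p))) for p in paths]


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for a sampler's temporary output directory.
    tmp_out = os.path.join(workdir, "tmp_out")
    os.makedirs(tmp_out)
    csv = os.path.join(tmp_out, "bernoulli-201912081451-1-5nm6as7u.csv")
    with open(csv, "w") as handle:
        handle.write("lp__,theta\n")

    saved = move_csv_files([csv], os.path.join(workdir, "saved"))
    moved_ok = os.path.exists(saved[0]) and not os.path.exists(csv)
    print(os.path.basename(saved[0]), moved_ok)
```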