Why Stan? The engine behind brms
You’ve been using Stan all along — you just might not have known it. In this opening post of the “From brms to Stan” series, I pull back the curtain on what brms actually does under the hood, introduce the anatomy of Stan code, and make the case for why it’s worth learning Stan even if you already love brms (like I do).
Every time I run a brm() model, there is a brief moment where the console goes quiet and then erupts into a stream of C++ compilation messages. For a long time, I just looked away and waited. Compilation, warmup, sampling — a series of messages that eventually produced posterior draws I could work with. I never really asked what was happening behind the scenes.
If you have used brms for a while, you have probably done the same. And that is completely fine — brms is designed so you do not have to think about the code it runs under the hood. But there came a point where I wanted to understand the engine. Not because brms was failing me, but because I noticed I could not always diagnose why something looked off, or extend my models beyond what brms natively supports. That is what pushed me toward learning Stan basics.
This series is my attempt to bring you along on that journey. We start from models you already know how to write in brms, and we gradually translate them into Stan — first by letting brms do the translation for us, and then by writing it ourselves. By the end, you will have enough Stan vocabulary to step outside the brms world when you need to.
This first post is mostly about looking at what is already there. We will use a small dataset, fit a model in brms, and then ask brms to show us the Stan code it generated. Then I walk through the structure of a Stan program at a high level. No deep Stan coding yet — that starts in post 2. But by the end of this post, I hope the curtain will feel a little more transparent.
Coffee, slides, and a simple question
Let me introduce a dataset that I will use to anchor this post. Imagine a fictional observational study conducted at a large university. Eighty lecturers were followed on a teaching day, and two things were recorded: how many cups of coffee they drank before and during the morning, and how many slides their presentation contained.
The research question is almost comically simple:
Does coffee consumption predict the number of slides a lecturer prepares?
Let me simulate the data.
Show the code
set.seed(42)
n <- 80
coffee_slides <- tibble(
  lecturer_id = 1:n,
  field = sample(c("Health sciences", "Humanities", "Social Sciences"),
                 size = n, replace = TRUE,
                 prob = c(0.4, 0.3, 0.3)),
  coffee = sample(0:8, size = n, replace = TRUE),
  n_slides = round(30 + 18 * coffee + rnorm(n, 0, 25))
) %>%
  mutate(
    n_slides = pmax(n_slides, 15), # floor at 15 slides
    field = factor(field)
  )

A quick look at the data:
Show the code
ggplot(coffee_slides, aes(x = coffee, y = n_slides, colour = field)) +
  geom_jitter(width = 0.15, height = 0, size = 2.5, alpha = 0.8) +
  scale_colour_manual(
    values = c("Health sciences" = "#D95F02",
               "Humanities" = "darkgrey",
               "Social Sciences" = "#1B9E77")
  ) +
  labs(
    x = "Cups of coffee",
    y = "Number of slides",
    colour = "Field"
  )

There is a clear positive trend. Whether it is the caffeine that drives slide production, or whether both reflect a kind of anxious over-preparation, is left as an exercise for the reader.
brms as a Stan generator
Let me fit the obvious model: a simple linear regression with n_slides as outcome and coffee as predictor. I use brm() with backend = "cmdstanr", which is the Stan backend we will be working with throughout this series.
Show the code
fit_brms <- brm(
  n_slides ~ coffee,
  data = coffee_slides,
  family = gaussian(),
  prior = c(
    prior(normal(80, 50), class = Intercept),
    prior(normal(0, 30), class = b),
    prior(exponential(1), class = sigma)
  ),
  backend = "cmdstanr",
  cores = 4,
  warmup = 1000,
  iter = 2000,
  seed = 42
)

saveRDS(fit_brms, file = "fitted_models/fit_brms.RDS")

The model is estimated once and saved to fitted_models/fit_brms.RDS. When rendering this post, only the readRDS() call runs — no re-fitting.
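As an aside, brms can handle this fit-once-then-reload caching itself: the file argument of brm() saves the fit on the first run and reloads it on every later run with the same path. A minimal sketch (the cache file name here is my own choice, not one used elsewhere in this post):

```r
# brm() with a file argument: fits and saves on the first call,
# reloads the cached .rds on subsequent calls with the same path.
fit_brms_cached <- brm(
  n_slides ~ coffee,
  data = coffee_slides,
  family = gaussian(),
  backend = "cmdstanr",
  file = "fitted_models/fit_brms_cached"
)
```

The manual saveRDS()/readRDS() pattern used above does the same thing, just more explicitly.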
Show the code
as_draws_df(fit_brms) %>%
  select(b_Intercept, b_coffee, sigma) %>%
  posterior::summarise_draws()

# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 37.4 37.4 4.86 4.80 29.4 45.2 1.00 3507. 2448.
2 b_coffee 16.0 16.0 1.01 0.991 14.3 17.6 1.00 3609. 2987.
3 sigma 24.5 24.4 1.63 1.60 22.0 27.2 1.00 3786. 2600.
So far, completely familiar territory. But now comes the interesting part: let me ask brms to show me the Stan code it generated, by wrapping our model definition in a make_stancode() call.
Show the code
make_stancode(
  n_slides ~ coffee,
  data = coffee_slides,
  family = gaussian(),
  prior = c(
    prior(normal(80, 50), class = Intercept),
    prior(normal(0, 30), class = b),
    prior(exponential(1), class = sigma)
  )
)

// generated with brms 2.22.0
functions {
}
data {
  int<lower=1> N;   // total number of observations
  vector[N] Y;      // response variable
  int<lower=1> K;   // number of population-level effects
  matrix[N, K] X;   // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;   // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;    // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;         // regression coefficients
  real Intercept;       // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += normal_lpdf(b | 0, 30);
  lprior += normal_lpdf(Intercept | 80, 50);
  lprior += exponential_lpdf(sigma | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
Take a moment to read through what brms produced. It is verbose — brms generates Stan code that handles many edge cases, naming conventions, and internal bookkeeping that makes the output general-purpose. But the structure is recognizable once you know what to look for, and that is exactly what we are going to learn to read.
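One detail worth pausing on: brms centers the predictors internally, samples a temporary intercept on the centered scale, and then recovers the original-scale intercept in generated quantities. The algebra is easy to check in R (a sketch; the numbers below are made-up values standing in for a single posterior draw, not output from the fitted model):

```r
# On centered predictors: E[y] = intercept_c + b * (x - mean(x)).
# Expanding gives E[y] = (intercept_c - b * mean(x)) + b * x,
# so the original-scale intercept is intercept_c - b * mean(x) --
# exactly what the generated quantities block computes.
x <- c(0, 2, 4, 6, 8)  # hypothetical coffee values
b <- 16                # hypothetical slope draw
intercept_c <- 102     # hypothetical centered-scale intercept draw

intercept_c - b * mean(x)
#> [1] 38
```

Centering is a numerical convenience: it decorrelates the intercept and slope, which usually helps the sampler.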
Stan works with a data list rather than a data frame as input. You can also inspect the data list that brms would pass to Stan by using the make_standata() function:
Show the code
str(make_standata(
  n_slides ~ coffee,
  data = coffee_slides,
  family = gaussian()
))

List of 6
$ N : int 80
$ Y : num [1:80(1d)] 59 121 105 202 61 106 136 101 94 87 ...
$ K : int 2
$ Kc : num 1
$ X : num [1:80, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:80] "1" "2" "3" "4" ...
.. ..$ : chr [1:2] "Intercept" "coffee"
..- attr(*, "assign")= int [1:2] 0 1
$ prior_only: int 0
- attr(*, "class")= chr [1:2] "standata" "list"
Notice that brms transforms your tidy data frame into a list of named vectors and scalars — exactly the kind of structured input that Stan expects. This is the data block in action.
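To see that there is nothing magic here, the same list can be assembled by hand — which is essentially what we do later for our own model. A sketch, using base R's model.matrix() to build the design matrix (the list name is my own):

```r
# Hand-built version of the list that make_standata() produces.
X <- model.matrix(~ coffee, data = coffee_slides)  # intercept + coffee columns

stan_data_by_hand <- list(
  N          = nrow(coffee_slides),
  Y          = coffee_slides$n_slides,
  K          = ncol(X),      # 2: intercept and slope
  Kc         = ncol(X) - 1,  # 1: effects left after centering
  X          = X,
  prior_only = 0L            # 0 = include the likelihood
)
```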
The anatomy of a Stan program
Let’s dive into the anatomy of a Stan program.
A Stan program is divided into named blocks. Each block has a specific role, and they are always written in the same order. Here is the skeleton:
data {
  // Everything you pass in from R:
  // observed outcomes, predictors, sample sizes, etc.
}
transformed data {
  // Optional. Compute derived quantities from data
  // (e.g., log-transformations, centering).
  // Runs once before sampling.
}
parameters {
  // The unknowns Stan will sample:
  // regression coefficients, standard deviations, etc.
}
transformed parameters {
  // Optional. Compute derived parameters
  // (e.g., predicted values, link function outputs).
  // Runs at every iteration.
}
model {
  // The heart of the program:
  // prior distributions for parameters,
  // likelihood of the data given the parameters.
}
generated quantities {
  // Optional. Post-sampling calculations:
  // posterior predictive draws, log-likelihood values,
  // any quantity you want tracked across draws.
}

Six blocks. The three you will use in virtually every model are data, parameters, and model. The others enter the picture as your models grow more complex — and we will introduce them one at a time across this series.
Here is how the coffee-slides model maps onto these blocks, in a hand-written version that is much simpler than what brms generates:
data {
  int<lower=0> N;       // number of observations
  vector[N] coffee;     // predictor: cups of coffee
  vector[N] n_slides;   // outcome: number of slides
}
parameters {
  real intercept;       // expected slides at zero coffee
  real beta_coffee;     // slope: change in slides per cup of coffee
  real<lower=0> sigma;  // residual standard deviation (must be positive)
}
model {
  // Priors
  intercept ~ normal(80, 50);
  beta_coffee ~ normal(0, 30);
  sigma ~ exponential(1);
  // Likelihood
  n_slides ~ normal(intercept + beta_coffee * coffee, sigma);
}

This is a complete, runnable Stan model. It is small, but every essential piece is there: data come in, parameters are declared, priors are set, and the likelihood connects them. In post 2, we will go through each of these blocks in detail.
Running our own Stan model
Let me not just show the code, but actually run it — because that is the whole spirit of this series. I save the Stan code to a .stan file and use cmdstanr to compile and sample.
Show the code
stan_code <- "
data {
  int<lower=0> N;
  vector[N] coffee;
  vector[N] n_slides;
}
parameters {
  real intercept;
  real beta_coffee;
  real<lower=0> sigma;
}
model {
  intercept ~ normal(80, 50);
  beta_coffee ~ normal(0, 30);
  sigma ~ exponential(1);
  n_slides ~ normal(intercept + beta_coffee * coffee, sigma);
}
generated quantities {
  vector[N] y_rep;
  for (i in 1:N) {
    y_rep[i] = normal_rng(intercept + beta_coffee * coffee[i], sigma);
  }
}
"

writeLines(stan_code, con = "fitted_models/coffee_slides.stan")

# Compile
coffee_model <- cmdstan_model("fitted_models/coffee_slides.stan")

# Prepare data list
stan_data <- list(
  N = nrow(coffee_slides),
  coffee = coffee_slides$coffee,
  n_slides = coffee_slides$n_slides
)

# Sample (aka run the model)
fit_stan <- coffee_model$sample(
  data = stan_data,
  seed = 42,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,
  iter_sampling = 1000,
  refresh = 0
)

fit_stan$save_object(file = "fitted_models/fit_stan.RDS")

The Stan model is compiled and sampled once, then saved to fitted_models/fit_stan.RDS. On render, only the readRDS() call runs.
Show the code
fit_stan$summary(variables = c("intercept", "beta_coffee", "sigma"))

# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 intercept 37.9 37.9 4.69 4.63 30.0 45.4 1.00 1742. 1876.
2 beta_coffee 15.9 15.9 0.958 0.946 14.4 17.5 1.00 1694. 1956.
3 sigma 24.3 24.2 1.64 1.59 21.9 27.2 1.00 2114. 1773.
The estimates land in a very familiar place. Let me put them side by side with the brms output for a direct comparison.
Show the code
brms_draws <- as_draws_df(fit_brms) %>%
  select(b_Intercept, b_coffee, sigma) %>%
  summarise(
    across(everything(), list(mean = mean, sd = sd))
  ) %>%
  pivot_longer(everything(),
               names_to = c("parameter", ".value"),
               names_sep = "_(?=[^_]+$)") %>%
  mutate(source = "brms",
         parameter = recode(parameter,
                            b_Intercept = "intercept",
                            b_coffee = "beta_coffee"))

stan_draws <- fit_stan$draws(
  variables = c("intercept", "beta_coffee", "sigma"),
  format = "df"
) %>%
  as_tibble() %>%
  select(intercept, beta_coffee, sigma) %>%
  summarise(across(everything(), list(mean = mean, sd = sd))) %>%
  pivot_longer(everything(),
               names_to = c("parameter", ".value"),
               names_sep = "_(?=[^_]+$)") %>%
  mutate(source = "Stan")

bind_rows(brms_draws, stan_draws) %>%
  pivot_wider(names_from = source,
              values_from = c(mean, sd),
              names_glue = "{source}_{.value}") %>%
  select(parameter, brms_mean, brms_sd, Stan_mean, Stan_sd) %>%
  mutate(across(where(is.numeric), \(x) round(x, 2))) %>%
  knitr::kable(
    col.names = c("Parameter", "brms mean", "brms SD", "Stan mean", "Stan SD")
  )

| Parameter | brms mean | brms SD | Stan mean | Stan SD |
|---|---|---|---|---|
| intercept | 37.39 | 4.86 | 37.85 | 4.69 |
| beta_coffee | 16.01 | 1.01 | 15.93 | 0.96 |
| sigma | 24.47 | 1.63 | 24.33 | 1.64 |
The estimates are essentially identical — as they should be, since we specified the same priors and the same likelihood. This is reassuring: our hand-written Stan model is doing exactly what brms was doing all along.
Let me also take a visual look at the posteriors side by side.
Show the code
# Extract draws
brms_long <- as_draws_df(fit_brms) %>%
  select(intercept = b_Intercept, beta_coffee = b_coffee, sigma) %>%
  mutate(source = "brms")

stan_long <- fit_stan$draws(
  variables = c("intercept", "beta_coffee", "sigma"),
  format = "df"
) %>%
  as_tibble() %>%
  select(intercept, beta_coffee, sigma) %>%
  mutate(source = "Stan")

combined <- bind_rows(brms_long, stan_long) %>%
  pivot_longer(c(intercept, beta_coffee, sigma),
               names_to = "parameter",
               values_to = "draw")

ggplot(combined,
       aes(x = draw, y = source,
           fill = source, colour = source)) +
  stat_halfeye(alpha = 0.7, .width = c(0.66, 0.95)) +
  scale_fill_manual(values = c("brms" = "#D95F02", "Stan" = "darkgrey")) +
  scale_colour_manual(values = c("brms" = "#D95F02", "Stan" = "darkgrey")) +
  facet_wrap(~ parameter, scales = "free_x") +
  labs(x = "Parameter value", y = NULL) +
  theme(legend.position = "none")

Perfect overlap. The two approaches are sampling from the same posterior — because they are fitting the same model with the same priors. The Stan code brms generates is just a more complex, general-purpose version of the clean model we wrote ourselves.
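Our hand-written model also produces y_rep draws in its generated quantities block, which we have not used yet. They are exactly what a posterior predictive check needs. A sketch using bayesplot (an extra package, not loaded elsewhere in this post):

```r
library(bayesplot)

# Compare the observed slide counts with 50 posterior predictive
# replicates drawn by the generated quantities block.
y     <- coffee_slides$n_slides
y_rep <- fit_stan$draws(variables = "y_rep", format = "matrix")

ppc_dens_overlay(y, y_rep[1:50, ])
```

With brms you get this via pp_check(); writing the y_rep loop yourself shows what that helper is doing.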
Three reasons to learn Stan even if you love brms
Before we go further, I want to be honest about something: for most everyday analyses, brms is perfectly sufficient. So why bother learning Stan? Here are three reasons that convinced me.
Debugging becomes possible. When a brms model behaves oddly — divergent transitions, poor mixing, implausible posteriors — it is very hard to diagnose the problem if you cannot read the Stan code underneath. Once you can read Stan, you can look at the generated code, understand what is actually being estimated, and sometimes spot the issue immediately.
Extension becomes possible. brms is extraordinarily flexible, but it does have limits. Non-linear models with custom parameterizations, joint models, certain kinds of measurement error structures — these require either brms’s custom family interface (which itself requires some Stan knowledge) or writing the model entirely in Stan. Post 8 of this series is entirely about this kind of extension.
Understanding becomes deeper. This is the one I did not expect: learning to write Stan models changed how I think about my brms models. When you have written the likelihood yourself, the prior predictive check stops being a magic function and becomes a thing you understand from the inside.
Getting set up: cmdstanr
Throughout this series, I use cmdstanr to interact with Stan from R. If you have been running brms with backend = "cmdstanr", you already have this installed. If not, the setup is straightforward:
Show the code
# Install cmdstanr from r-universe
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev",
                           getOption("repos")))

# Install Stan itself
cmdstanr::install_cmdstan()

After that, verify everything works:
Show the code
cmdstanr::check_cmdstan_toolchain()
cmdstanr::cmdstan_version()

If those two commands run without complaint, you are ready. I also recommend loading cmdstanr and posterior at the top of any script that does Stan work directly — the posterior package provides convenient tools for working with the draw objects that cmdstanr returns.
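For instance, getting a cmdstanr fit into the same tidy draws format that brms returns takes only a couple of lines with posterior (a sketch; fit_stan is the object fitted earlier in this post):

```r
library(posterior)

# cmdstanr fits and brms fits meet in the same draws format.
draws <- as_draws_df(fit_stan$draws())

# Summarise a single parameter of interest.
summarise_draws(subset_draws(draws, variable = "beta_coffee"))
```

This shared draws format is one reason the brms and Stan workflows interoperate so smoothly.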
What’s coming in this series
Here is a quick map of where we are going:
The next post moves from inspecting Stan code to writing it. We take it one step further and write the full Stan program from scratch — going through each block carefully and running it with cmdstanr. That post is where things start to feel real.
From there, posts 3 through 5 extend the toolkit: we add the generated quantities block (post 3, priors), move to binary and count outcomes (posts 4 and 5), and then posts 6 and 7 tackle multilevel models with random intercepts and slopes. Post 8 steps fully outside brms to fit a non-linear learning curve model that brms simply cannot express.
Each post builds on the last, but each is also readable on its own if you want to jump to a specific topic. I try to flag which Stan concepts are new in each post so it is easy to orient yourself.
Wrap-up
Stan is not a replacement for brms. It is what brms is built on, and learning to read and write it makes you a better brms user as a side effect. In this post we used make_stancode() to peek under the hood, got our first look at the six-block structure of a Stan program, and even ran a hand-written Stan model and confirmed it produces the same posteriors as brms.
In the next post, we start writing Stan models ourselves — block by block, with a clean dataset and no brms safety net.
I am curious whether the make_stancode() output made sense to you at first glance, or whether it felt like a wall of code. That reaction — whichever it was — is exactly the right starting point. Have fun!
Acknowledgement
This post was developed with the help of Claude Sonnet 4.6 (Anthropic) during the ideation and drafting phase. All content has been checked, revised, and rewritten by me.