Why Stan? The engine behind brms
You’ve been using Stan all along — you just might not have known it. In this opening post of the “From brms to Stan” series, I pull back the curtain on what brms actually does under the hood, introduce the anatomy of Stan code, and make the case for why it’s worth learning Stan even if you already love brms (like I do).
Every time I run a brm() model, there is a brief moment where the console goes quiet and then erupts into a stream of C++ compilation messages. For a long time, I just looked away and waited. Compilation, warmup, sampling — a series of messages that eventually produced posterior draws I could work with. I never really asked what was happening behind the scenes.
If you have used brms for a while, you have probably done the same. And that is completely fine — brms is designed so you do not have to think about the code it runs under the hood. But there came a point where I wanted to understand the engine. Not because brms was failing me, but because I noticed I could not always diagnose why something looked off, or extend my models beyond what brms natively supports. That is what pushed me toward learning Stan basics.
This series is my attempt to bring you along on that journey. We start from models you already know how to write in brms, and we gradually translate them into Stan — first by letting brms do the translation for us, and then by writing it ourselves. By the end, you will have enough Stan vocabulary to step outside the brms world when you need to.
This first post is mostly about looking at what is already there. We will use a small dataset, fit a model in brms, and then ask brms to show us the Stan code it generated. Then I walk through the structure of a Stan program at a high level. No deep Stan coding yet — that starts in post 2. But by the end of this post, I hope the curtain will feel a little more transparent.
Coffee, slides, and a simple question
Let me introduce a dataset that I will use to anchor this post. Imagine a fictional observational study conducted at a large university. Eighty lecturers were followed on a teaching day, and two things were recorded: how many cups of coffee they drank before and during the morning, and how many slides their presentation contained.
The research question is almost comically simple:
Does coffee consumption predict the number of slides a lecturer prepares?
Let me simulate the data.
Show the code
set.seed(42)
n <- 80
coffee_slides <- tibble(
  lecturer_id = 1:n,
  field = sample(c("Health sciences", "Humanities", "Social Sciences"),
                 size = n, replace = TRUE,
                 prob = c(0.4, 0.3, 0.3)),
  coffee = sample(0:8, size = n, replace = TRUE),
  n_slides = round(30 + 18 * coffee + rnorm(n, 0, 25))
) %>%
  mutate(
    n_slides = pmax(n_slides, 15), # floor at 15 slides
    field = factor(field)
  )

A quick look at the data:
Show the code
ggplot(coffee_slides, aes(x = coffee, y = n_slides, colour = field)) +
  geom_jitter(width = 0.15, height = 0, size = 2.5, alpha = 0.8) +
  scale_colour_manual(
    values = c("Health sciences" = "#D95F02",
               "Humanities" = "darkgrey",
               "Social Sciences" = "#1B9E77")
  ) +
  labs(
    x = "Cups of coffee",
    y = "Number of slides",
    colour = "Field"
  )

There is a clear positive trend. Whether it is the caffeine that drives slide production, or whether both reflect a kind of anxious over-preparation, is left as an exercise for the reader.
brms as a Stan generator
Let me fit the obvious model: a simple linear regression with n_slides as outcome and coffee as predictor. I use brm() with backend = "cmdstanr", which is the Stan backend we will be working with throughout this series.
Show the code
fit_brms <- brm(
  n_slides ~ coffee,
  data = coffee_slides,
  family = gaussian(),
  prior = c(
    prior(normal(80, 50), class = Intercept),
    prior(normal(0, 30), class = b),
    prior(exponential(1), class = sigma)
  ),
  backend = "cmdstanr",
  cores = 4,
  warmup = 1000,
  iter = 2000,
  seed = 42
)

saveRDS(fit_brms, file = "fitted_models/fit_brms.RDS")

The model is estimated once and saved to fitted_models/fit_brms.RDS. When rendering this post, only the readRDS() call runs — no re-fitting.
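As an aside, brms can handle this fit-once-then-reload caching itself: the file argument of brm() saves the fit on the first run and reloads it on every later run with the same path. A minimal sketch (the cache file name here is my own choice, not one used elsewhere in this post):

```r
# brm() with a file argument: fits and saves on the first call,
# reloads the cached .rds on subsequent calls with the same path.
fit_brms_cached <- brm(
  n_slides ~ coffee,
  data = coffee_slides,
  family = gaussian(),
  backend = "cmdstanr",
  file = "fitted_models/fit_brms_cached"
)
```

The manual saveRDS()/readRDS() pattern used above does the same thing, just more explicitly.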
Show the code
as_draws_df(fit_brms) %>%
  select(b_Intercept, b_coffee, sigma) %>%
  posterior::summarise_draws()

# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 37.4 37.4 4.86 4.80 29.4 45.2 1.00 3507. 2448.
2 b_coffee 16.0 16.0 1.01 0.991 14.3 17.6 1.00 3609. 2987.
3 sigma 24.5 24.4 1.63 1.60 22.0 27.2 1.00 3786. 2600.
So far, completely familiar territory. But now comes the interesting part: let me ask brms to show me the Stan code it generated, by wrapping our model definition in a make_stancode() call.
Show the code
make_stancode(
  n_slides ~ coffee,
  data = coffee_slides,
  family = gaussian(),
  prior = c(
    prior(normal(80, 50), class = Intercept),
    prior(normal(0, 30), class = b),
    prior(exponential(1), class = sigma)
  )
)

// generated with brms 2.22.0
functions {
}
data {
  int<lower=1> N;   // total number of observations
  vector[N] Y;      // response variable
  int<lower=1> K;   // number of population-level effects
  matrix[N, K] X;   // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;   // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;    // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;         // regression coefficients
  real Intercept;       // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += normal_lpdf(b | 0, 30);
  lprior += normal_lpdf(Intercept | 80, 50);
  lprior += exponential_lpdf(sigma | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
Take a moment to read through what brms produced. It is verbose — brms generates Stan code that handles many edge cases, naming conventions, and internal bookkeeping that makes the output general-purpose. But the structure is recognizable once you know what to look for, and that is exactly what we are going to learn to read.
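One detail worth pausing on: brms centers the predictors internally, samples a temporary intercept on the centered scale, and then recovers the original-scale intercept in generated quantities. The algebra is easy to check in R (a sketch; the numbers below are made-up values standing in for a single posterior draw, not output from the fitted model):

```r
# On centered predictors: E[y] = intercept_c + b * (x - mean(x)).
# Expanding gives E[y] = (intercept_c - b * mean(x)) + b * x,
# so the original-scale intercept is intercept_c - b * mean(x) --
# exactly what the generated quantities block computes.
x <- c(0, 2, 4, 6, 8)  # hypothetical coffee values
b <- 16                # hypothetical slope draw
intercept_c <- 102     # hypothetical centered-scale intercept draw

intercept_c - b * mean(x)
#> [1] 38
```

Centering is a numerical convenience: it decorrelates the intercept and slope, which usually helps the sampler.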
Stan works with a data list rather than a data frame as input. You can also inspect the data list that brms would pass to Stan by using the make_standata() function:
Show the code
str(make_standata(
  n_slides ~ coffee,
  data = coffee_slides,
  family = gaussian()
))

List of 6
$ N : int 80
$ Y : num [1:80(1d)] 59 121 105 202 61 106 136 101 94 87 ...
$ K : int 2
$ Kc : num 1
$ X : num [1:80, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:80] "1" "2" "3" "4" ...
.. ..$ : chr [1:2] "Intercept" "coffee"
..- attr(*, "assign")= int [1:2] 0 1
$ prior_only: int 0
- attr(*, "class")= chr [1:2] "standata" "list"
Notice that brms transforms your tidy data frame into a list of named vectors and scalars — exactly the kind of structured input that Stan expects. This is the data block in action.
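To see that there is nothing magic here, the same list can be assembled by hand — which is essentially what we do later for our own model. A sketch, using base R's model.matrix() to build the design matrix (the list name is my own):

```r
# Hand-built version of the list that make_standata() produces.
X <- model.matrix(~ coffee, data = coffee_slides)  # intercept + coffee columns

stan_data_by_hand <- list(
  N          = nrow(coffee_slides),
  Y          = coffee_slides$n_slides,
  K          = ncol(X),      # 2: intercept and slope
  Kc         = ncol(X) - 1,  # 1: effects left after centering
  X          = X,
  prior_only = 0L            # 0 = include the likelihood
)
```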
The anatomy of a Stan program
Let’s dive into the anatomy of a Stan program.
A Stan program is divided into named blocks. Each block has a specific role, and they are always written in the same order. Here is the skeleton:
data {
  // Everything you pass in from R:
  // observed outcomes, predictors, sample sizes, etc.
}
transformed data {
  // Optional. Compute derived quantities from data
  // (e.g., log-transformations, centering).
  // Runs once before sampling.
}
parameters {
  // The unknowns Stan will sample:
  // regression coefficients, standard deviations, etc.
}
transformed parameters {
  // Optional. Compute derived parameters
  // (e.g., predicted values, link function outputs).
  // Runs at every iteration.
}
model {
  // The heart of the program:
  // prior distributions for parameters,
  // likelihood of the data given the parameters.
}
generated quantities {
  // Optional. Post-sampling calculations:
  // posterior predictive draws, log-likelihood values,
  // any quantity you want tracked across draws.
}

Six blocks. The three you will use in virtually every model are data, parameters, and model. The others enter the picture as your models grow more complex — and we will introduce them one at a time across this series.
Here is how the coffee-slides model maps onto these blocks, in a hand-written version that is much simpler than what brms generates:
data {
  int<lower=0> N;       // number of observations
  vector[N] coffee;     // predictor: cups of coffee
  vector[N] n_slides;   // outcome: number of slides
}
parameters {
  real intercept;       // expected slides at zero coffee
  real beta_coffee;     // slope: change in slides per cup of coffee
  real<lower=0> sigma;  // residual standard deviation (must be positive)
}
model {
  // Priors
  intercept ~ normal(80, 50);
  beta_coffee ~ normal(0, 30);
  sigma ~ exponential(1);
  // Likelihood
  n_slides ~ normal(intercept + beta_coffee * coffee, sigma);
}

This is a complete, runnable Stan model. It is small, but every essential piece is there: data come in, parameters are declared, priors are set, and the likelihood connects them. In post 2, we will go through each of these blocks in detail.
Running our own Stan model
Let me not just show the code, but actually run it — because that is the whole spirit of this series. I save the Stan code to a .stan file and use cmdstanr to compile and sample.
Show the code
stan_code <- "
data {
  int<lower=0> N;
  vector[N] coffee;
  vector[N] n_slides;
}
parameters {
  real intercept;
  real beta_coffee;
  real<lower=0> sigma;
}
model {
  intercept ~ normal(80, 50);
  beta_coffee ~ normal(0, 30);
  sigma ~ exponential(1);
  n_slides ~ normal(intercept + beta_coffee * coffee, sigma);
}
generated quantities {
  vector[N] y_rep;
  for (i in 1:N) {
    y_rep[i] = normal_rng(intercept + beta_coffee * coffee[i], sigma);
  }
}
"

writeLines(stan_code, con = "fitted_models/coffee_slides.stan")

# Compile
coffee_model <- cmdstan_model("fitted_models/coffee_slides.stan")

# Prepare data list
stan_data <- list(
  N = nrow(coffee_slides),
  coffee = coffee_slides$coffee,
  n_slides = coffee_slides$n_slides
)

# Sample (aka run the model)
fit_stan <- coffee_model$sample(
  data = stan_data,
  seed = 42,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,
  iter_sampling = 1000,
  refresh = 0
)

fit_stan$save_object(file = "fitted_models/fit_stan.RDS")

The Stan model is compiled and sampled once, then saved to fitted_models/fit_stan.RDS. On render, only the readRDS() call runs.
Show the code
fit_stan$summary(variables = c("intercept", "beta_coffee", "sigma"))

# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 intercept 37.9 37.9 4.69 4.63 30.0 45.4 1.00 1742. 1876.
2 beta_coffee 15.9 15.9 0.958 0.946 14.4 17.5 1.00 1694. 1956.
3 sigma 24.3 24.2 1.64 1.59 21.9 27.2 1.00 2114. 1773.
The estimates land in a very familiar place. Let me put them side by side with the brms output for a direct comparison.
Show the code
brms_draws <- as_draws_df(fit_brms) %>%
  select(b_Intercept, b_coffee, sigma) %>%
  summarise(
    across(everything(), list(mean = mean, sd = sd))
  ) %>%
  pivot_longer(everything(),
               names_to = c("parameter", ".value"),
               names_sep = "_(?=[^_]+$)") %>%
  mutate(source = "brms",
         parameter = recode(parameter,
                            b_Intercept = "intercept",
                            b_coffee = "beta_coffee"))

stan_draws <- fit_stan$draws(
  variables = c("intercept", "beta_coffee", "sigma"),
  format = "df"
) %>%
  as_tibble() %>%
  select(intercept, beta_coffee, sigma) %>%
  summarise(across(everything(), list(mean = mean, sd = sd))) %>%
  pivot_longer(everything(),
               names_to = c("parameter", ".value"),
               names_sep = "_(?=[^_]+$)") %>%
  mutate(source = "Stan")

bind_rows(brms_draws, stan_draws) %>%
  pivot_wider(names_from = source,
              values_from = c(mean, sd),
              names_glue = "{source}_{.value}") %>%
  select(parameter, brms_mean, brms_sd, Stan_mean, Stan_sd) %>%
  mutate(across(where(is.numeric), \(x) round(x, 2))) %>%
  knitr::kable(
    col.names = c("Parameter", "brms mean", "brms SD", "Stan mean", "Stan SD")
  )

| Parameter | brms mean | brms SD | Stan mean | Stan SD |
|---|---|---|---|---|
| intercept | 37.39 | 4.86 | 37.85 | 4.69 |
| beta_coffee | 16.01 | 1.01 | 15.93 | 0.96 |
| sigma | 24.47 | 1.63 | 24.33 | 1.64 |
The estimates are essentially identical — as they should be, since we specified the same priors and the same likelihood. This is reassuring: our hand-written Stan model is doing exactly what brms was doing all along.
Let me also take a visual look at the posteriors side by side.
Show the code
# Extract draws
brms_long <- as_draws_df(fit_brms) %>%
  select(intercept = b_Intercept, beta_coffee = b_coffee, sigma) %>%
  mutate(source = "brms")

stan_long <- fit_stan$draws(
  variables = c("intercept", "beta_coffee", "sigma"),
  format = "df"
) %>%
  as_tibble() %>%
  select(intercept, beta_coffee, sigma) %>%
  mutate(source = "Stan")

combined <- bind_rows(brms_long, stan_long) %>%
  pivot_longer(c(intercept, beta_coffee, sigma),
               names_to = "parameter",
               values_to = "draw")

ggplot(combined,
       aes(x = draw, y = source,
           fill = source, colour = source)) +
  stat_halfeye(alpha = 0.7, .width = c(0.66, 0.95)) +
  scale_fill_manual(values = c("brms" = "#D95F02", "Stan" = "darkgrey")) +
  scale_colour_manual(values = c("brms" = "#D95F02", "Stan" = "darkgrey")) +
  facet_wrap(~ parameter, scales = "free_x") +
  labs(x = "Parameter value", y = NULL) +
  theme(legend.position = "none")

Perfect overlap. The two approaches are sampling from the same posterior — because they are fitting the same model with the same priors. The Stan code brms generates is just a more complex, general-purpose version of the clean model we wrote ourselves.
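Our hand-written model also produces y_rep draws in its generated quantities block, which we have not used yet. They are exactly what a posterior predictive check needs. A sketch using bayesplot (an extra package, not loaded elsewhere in this post):

```r
library(bayesplot)

# Compare the observed slide counts with 50 posterior predictive
# replicates drawn by the generated quantities block.
y     <- coffee_slides$n_slides
y_rep <- fit_stan$draws(variables = "y_rep", format = "matrix")

ppc_dens_overlay(y, y_rep[1:50, ])
```

With brms you get this via pp_check(); writing the y_rep loop yourself shows what that helper is doing.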
Three reasons to learn Stan even if you love brms
Before we go further, I want to be honest about something: for most everyday analyses, brms is perfectly sufficient. So why bother learning Stan? Here are three reasons that convinced me.
Debugging becomes possible. When a brms model behaves oddly — divergent transitions, poor mixing, implausible posteriors — it is very hard to diagnose the problem if you cannot read the Stan code underneath. Once you can read Stan, you can look at the generated code, understand what is actually being estimated, and sometimes spot the issue immediately.
Extension becomes possible. brms is extraordinarily flexible, but it does have limits. Non-linear models with custom parameterizations, joint models, certain kinds of measurement error structures — these require either brms’s custom family interface (which itself requires some Stan knowledge) or writing the model entirely in Stan. Post 8 of this series is entirely about this kind of extension.
Understanding becomes deeper. This is the one I did not expect: learning to write Stan models changed how I think about my brms models. When you have written the likelihood yourself, the prior predictive check stops being a magic function and becomes a thing you understand from the inside.
Getting set up: cmdstanr
Throughout this series, I use cmdstanr to interact with Stan from R. If you have been running brms with backend = "cmdstanr", you already have this installed. If not, the setup is straightforward:
Show the code
# Install cmdstanr from r-universe
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev",
                           getOption("repos")))

# Install Stan itself
cmdstanr::install_cmdstan()

After that, verify everything works:
Show the code
cmdstanr::check_cmdstan_toolchain()
cmdstanr::cmdstan_version()

If those two commands run without complaint, you are ready. I also recommend loading cmdstanr and posterior at the top of any script that does Stan work directly — the posterior package provides convenient tools for working with the draw objects that cmdstanr returns.
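For instance, getting a cmdstanr fit into the same tidy draws format that brms returns takes only a couple of lines with posterior (a sketch; fit_stan is the object fitted earlier in this post):

```r
library(posterior)

# cmdstanr fits and brms fits meet in the same draws format.
draws <- as_draws_df(fit_stan$draws())

# Summarise a single parameter of interest.
summarise_draws(subset_draws(draws, variable = "beta_coffee"))
```

This shared draws format is one reason the brms and Stan workflows interoperate so smoothly.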
What’s coming in this series
Here is a quick map of where we are going:
The next post moves from inspecting Stan code to writing it. We take it one step further and write the full Stan program from scratch — going through each block carefully and running it with cmdstanr. That post is where things start to feel real.
From there, posts 3 through 5 extend the toolkit: we add the generated quantities block (post 3, priors), move to binary and count outcomes (posts 4 and 5), and then posts 6 and 7 tackle multilevel models with random intercepts and slopes. Post 8 steps fully outside brms to fit a non-linear learning curve model that brms simply cannot express.
Each post builds on the last, but each is also readable on its own if you want to jump to a specific topic. I try to flag which Stan concepts are new in each post so it is easy to orient yourself.
Wrap-up
Stan is not a replacement for brms. It is what brms is built on, and learning to read and write it makes you a better brms user as a side effect. In this post we used make_stancode() to peek under the hood, got our first look at the six-block structure of a Stan program, and even ran a hand-written Stan model and confirmed it produces the same posteriors as brms.
In the next post, we start writing Stan models ourselves — block by block, with a clean dataset and no brms safety net.
I am curious whether the make_stancode() output made sense to you at first glance, or whether it felt like a wall of code. That reaction — whichever it was — is exactly the right starting point. Have fun!
Acknowledgement
This post was developed with the help of Claude Sonnet 4.6 (Anthropic) during the ideation and drafting phase. All content has been checked, revised, and rewritten by me.