Visualizing multilevel models estimated in brms using hypothetical outcome plots (HOP)

R
Bayesian inference
Statistics
brms
Visualization

Multilevel models can come alive when you visualize them. In this post we walk through an example to show how to make different interesting hypothetical outcome plots to visualize multilevel models estimated with brms.

Author

Sven De Maeyer

Published

April 4, 2023

Have you ever found yourself in the situation that you are proud that you applied a rather sophisticated multilevel model and that your supervisor or colleague throws the following question to you?

“Ok. But what does your model imply? I mean, can you visualize somehow your model to me?”

And then there you are, confronted with a new challenge, visualizing the model.

Now, let’s complicate things even more: you have applied a Bayesian multilevel analysis making use of the brms package. How do you proceed and visualize your model? That’s what I’ll cover in this post. I will show how we can make use of hypothetical outcome plots, made with ggplot2, tidybayes, and modelr, to give a glimpse of what your model actually looks like. If you are unfamiliar with applying a multilevel model the Bayesian way, I can refer you to another post on this blog where I give a quick intro to this topic: click here.

Before we start, let’s load the packages we need throughout the post:
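The package-loading chunk is collapsed in the original post; the following is a reconstruction based on the functions that are actually called below, so treat it as a sketch rather than the author’s exact chunk:

```r
# Packages used throughout the post; this list is reconstructed
# from the functions called in the code chunks below.
library(tidyverse)   # ggplot2, dplyr, readRDS pipelines, etc.
library(brms)        # Bayesian multilevel models
library(modelr)      # data_grid()
library(tidybayes)   # add_epred_draws()
library(sjPlot)      # tab_model()
library(gganimate)   # transition_states(), shadow_mark()
```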

Also, as this post is about visualizing, let’s set a theme for our plots in ggplot2 that we will use throughout our post:

Show the code
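The theme-setting chunk is also collapsed; a minimal sketch of what it could look like (the specific theme choice here is an assumption, not the author’s original):

```r
# Set a global ggplot2 theme for all plots in this post
# (theme_minimal() is an assumed choice; the original chunk is hidden)
theme_set(theme_minimal(base_size = 12))
```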

We all become better writers…

For this post I’ll make use of a rather small dataset containing longitudinal data on 29 primary-school students who were followed through their first 4 years and took the same spelling test 6 times: at the end of the first year, at the start of the second year, at the end of the second year, …, at the start of year 4. The dataset is available in an OSF repository.

Let’s load the dataset and visually explore the data. We plot the variable month on the x-axis and the variable spelling (the scores students got on a spelling test) on the y-axis, and we create a facet for each student.

Show the code
Spelling <- readRDS(
  file = "Spelling.RDS"
)

Spelling %>%
  ggplot(
    aes(
      x = month,
      y = spelling
    )
  ) +
  geom_path() +
  scale_x_continuous(breaks = seq(from = 9, to = 36, by = 3)) +
  facet_wrap(.~student, ncol = 4)

These visualizations help us define a proper model. We can see immediately that spelling scores increase over time for all students, but not at the same rate for every student. So we have to model spelling scores as a function of time, with a slope for the effect of time that differs between students.

Also, we can detect negative summer-holiday effects for students. For instance, for student 29 we see a clear drop in spelling scores between month 9 (= the end of the first year, before the summer break) and month 12 (= the start of the second year, after the summer break). We might take the variable After_holiday into account in our model; this variable indicates whether a measure was registered after the summer break or not.

Time to estimate a very naive model

To get started, we will estimate a very naive multilevel model in brms where we assume that spelling scores are dependent on time, and that students’ growth curves differ in intercept. For this model we make a new time variable that has a sensible 0 value. Rather than using the variable month as it is, we will create a variable that subtracts the value 9 from month, so that the intercepts are estimated at the first measurement occasion. This variable will get the name month_9:

Show the code
Spelling <- Spelling %>%
  mutate(
    month_9 = month - 9
  )
Note

The models are estimated and saved before compiling this post, so that it doesn’t slow down the rendering of the website. The code for running the models is printed and the models themselves are saved on the repository via the following link: find the data and stored models here.

In brms code this model looks like this:

Show the code
M1 <- brm(
  spelling ~ month_9 + (1|student),
  data = Spelling
)

Let’s load the saved model results and have quick look at the results:

Show the code
M1 <- readRDS(file = "M1.RDS")
tab_model(M1)
                                spelling
Predictors                      Estimates   CI (95%)
Intercept                       14.82       14.04 – 15.59
month_9                         0.40        0.37 – 0.43

Random Effects
σ²                              4.30
τ00 student                     2.38
ICC                             0.36
N student                       29
Observations                    174
Marginal R² / Conditional R²    0.715 / 0.811

The intercept is estimated around 14.82 and the effect of month_9 lies somewhere between 0.37 and 0.43, meaning that we expect the spelling score to increase by around 0.40 points every month. Let’s visualize this first piece of information, making use of a hypothetical outcome plot (HOP) for the growth of spelling scores in the population.
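These numbers can also be pulled directly from the fitted model object; for instance, fixef() returns posterior summaries of the population-level effects:

```r
# Posterior summaries of the population-level (fixed) effects of M1:
# a matrix with Estimate, Est.Error, Q2.5 and Q97.5 columns,
# one row per parameter (Intercept and month_9)
fixef(M1)
```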

What we will use to create this plot is the function data_grid() from the package modelr (Wickham 2022) and the function add_epred_draws() from the package tidybayes (Kay 2022) to generate a tibble that contains 100 probable predicted values for spelling based on the variable month_9 and our first model. We will store this new tibble in an object called Preds_M1:

Show the code
Preds_M1 <- Spelling %>%
  data_grid(
    month_9 = month_9
  ) %>%
  add_epred_draws(
    M1,
    ndraws = 100,
    re_formula = NA
  )

head(Preds_M1)
# A tibble: 6 × 6
# Groups:   month_9, .row [1]
  month_9  .row .chain .iteration .draw .epred
    <dbl> <int>  <int>      <int> <int>  <dbl>
1       0     1     NA         NA     1   14.6
2       0     1     NA         NA     2   14.9
3       0     1     NA         NA     3   14.7
4       0     1     NA         NA     4   14.8
5       0     1     NA         NA     5   14.4
6       0     1     NA         NA     6   15.0

Notice how I used re_formula = NA to ignore the group-level effects (students in our case) in the predictions. This allows us to plot only the ‘fixed part’ of our model.

Looking at this tibble we see a column called month_9 that contains the values of this variable in our dataset. Another column is .epred, which contains probable predicted values for spelling given that value of month_9. For each value of month_9 we have 100 probable predicted values for spelling, identified by the variable .draw. This tibble can now be used to create a plot. But first we will create a variable month within this tibble that restores the original values of our time variable.

Show the code
Preds_M1 %>%
  # first create a variable month with the original values
  mutate(
    month = month_9 + 9
  ) %>%
  # define our plot
  ggplot(
    aes(
      x = month,
      # y-axis will contain the predicted scores based on the model
      y = .epred,
      # we group by .draw to generate a line for each of the 100 draws coming
      # from our posterior distribution
      group = .draw
      )
    ) + 
  geom_line(
    alpha = .2
    ) + 
  # change the appearance of the x-axis
  scale_x_continuous(
    breaks = seq(
      from = 9, 
      to = 36, 
      by = 3)
    ) + 
  labs(
    y = "predicted spelling scores"
    ) 

By using group = .draw in our aes() call for the ggplot, we make a separate line for each of the 100 draws taken from our posterior probability distribution of both parameters: the intercept and the slope of the effect of month. So, each of the lines plotted above is a hypothetical regression line of the effect of month, demonstrating that we are pretty sure that spelling scores increase with time!

Now, what if we want to plot how we modeled differences between students? Well, we can create another tibble with predicted draws, but adding a variable student within the data_grid() call. Also, compared to the previous example, we now drop the re_formula = NA part within the add_epred_draws() call so that we will take into account the random effect structure of our model. This results in a tibble with 100 possible spelling scores per combination of month_9 and student, based on our model M1.

Show the code
Preds_M1b <- Spelling %>%
  data_grid(
    month_9 = month_9,
    student = student
  ) %>%
  add_epred_draws(
    M1,
    ndraws = 100
  )

head(Preds_M1b)
# A tibble: 6 × 7
# Groups:   month_9, student, .row [1]
  month_9 student  .row .chain .iteration .draw .epred
    <dbl>   <dbl> <int>  <int>      <int> <int>  <dbl>
1       0       1     1     NA         NA     1   14.1
2       0       1     1     NA         NA     2   13.7
3       0       1     1     NA         NA     3   13.6
4       0       1     1     NA         NA     4   13.0
5       0       1     1     NA         NA     5   13.7
6       0       1     1     NA         NA     6   14.8

This new tibble can now be used to create a plot, with almost the same code as for the previous visualization. This time we make use of facet_wrap(.~student, ncol = 4) to generate facets for each of the students, organised in 4 columns.

Show the code
Preds_M1b %>%
  mutate(
    month = month_9 + 9
  ) %>%
  ggplot(
    aes(
      x = month,
      y = .epred,
      group = .draw
      )
    ) + 
  geom_line(
    alpha = .1
    ) + 
  scale_x_continuous(
    breaks = seq(
      from = 9, 
      to = 36, 
      by = 3)
    ) + 
  labs(
    y = "predicted spelling scores"
    ) +
  facet_wrap(.~student, 
             ncol = 4)

For each student we have 100 possible growth curves for spelling. This plot has some added value, but it is hard to read the differences between the students here.

So, let’s create a plot now with one line per student that summarizes the median of the predicted draws for each student and month combination. Notice how I used group = student in the aes() call to generate a line for each student.

Show the code
Preds_M1b %>%
  mutate(
    month = month_9 + 9
  ) %>%
  group_by(student, month) %>%
  summarize(
    spelling_predicted = median(.epred)
  ) %>%
  ggplot(
    aes(
      x = month,
      y = spelling_predicted,
      group = student
    )
  ) +
  geom_line(
    alpha = .4
    ) + 
  scale_x_continuous(
    breaks = seq(
      from = 9, 
      to = 36, 
      by = 3)
    ) +
  labs(
    y = "predicted spelling scores"
  )

Each of the lines above represents a single student. Of course, if we visualize the model this way, we can clearly see how this model is an oversimplification. Do we really expect all students to progress identically in spelling? Time to introduce a slightly more complex model.

Students don’t progress identically

A more realistic model is a model that allows the effect of time to differ from student to student. In brms this is implemented by allowing the effect of month_9 to vary from student to student with the following piece in the formula: (1 + month_9|student).

Show the code
M2 <- brm(
  spelling ~ month_9 + (1 + month_9|student),
  data = Spelling
)

Let’s load the saved model results.

Show the code
M2 <- readRDS(file = "M2.RDS")

The summary of the model looks like this:

Show the code
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: spelling ~ month_9 + (1 + month_9 | student) 
   Data: Spelling (Number of observations: 174) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~student (Number of levels: 29) 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept)              1.88      0.36     1.25     2.68 1.01     1165
sd(month_9)                0.12      0.02     0.08     0.17 1.01      710
cor(Intercept,month_9)    -0.51      0.18    -0.79    -0.08 1.00      761
                       Tail_ESS
sd(Intercept)              2106
sd(month_9)                1462
cor(Intercept,month_9)     1500

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    14.81      0.40    14.01    15.62 1.01     1137     2170
month_9       0.40      0.03     0.35     0.45 1.01     1033     1908

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.65      0.11     1.45     1.88 1.00     2175     2483

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

What’s new in this model is the parameter sd(month_9), an estimate of the standard deviation of the student-specific deviations from the overall slope. In other words, there is an overall slope estimate (see the Population-Level Effects parameter month_9), and for every student we have estimated how much higher or lower that student’s slope is compared to the overall slope. These deviations are called residuals.
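If you want to inspect these student-specific slopes directly, one option is tidybayes::spread_draws(), which combines the overall slope with each student’s slope residual (a sketch, assuming the default brms naming of population- and group-level terms):

```r
# Posterior draws of student-specific slopes: overall slope b_month_9
# plus each student's slope residual r_student[student, "month_9"]
M2 %>%
  spread_draws(b_month_9, r_student[student, term]) %>%
  filter(term == "month_9") %>%
  mutate(slope_student = b_month_9 + r_student) %>%
  group_by(student) %>%
  summarize(slope_median = median(slope_student))
```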

To fully understand what we get based on this model, let’s visualize the model with the same steps as we did before. We can plot a line for every individual student!

Show the code
Preds_M2b <- Spelling %>%
  data_grid(
    month_9 = month_9,
    student = student
  ) %>%
  add_epred_draws(
    M2,
    ndraws = 100
  )

Preds_M2b %>%
  mutate(
    month = month_9 + 9
  ) %>%
  group_by(student, month) %>%
  summarize(
    spelling_predicted = median(.epred)
  ) %>%
  ggplot(
    aes(
      x = month,
      y = spelling_predicted,
      group = student
    )
  ) +
  geom_line(
    alpha = .4
    ) + 
  scale_x_continuous(
    breaks = seq(
      from = 9, 
      to = 36, 
      by = 3)
    ) +
  labs(y = "predicted spelling scores")

Finally, we can also visualize hypothetical outcome plots for a selection of students, if we want to highlight the difference between students. In the following code I only generate predicted scores for 4 focus students: students 1, 10, 20 and 29 (see student = c(1,10,20,29) in the code). The resulting graph shows the uncertainty of how the spelling scores increase per student, but it also shows that students 1 and 29 clearly progress more slowly than the other two students.

Show the code
Preds_M2b <- Spelling %>%
  data_grid(
    month_9 = month_9,
    student = c(1,10,20,29)
  ) %>%
  add_epred_draws(
    M2,
    ndraws = 100
  )

Preds_M2b %>%
  mutate(
    month = month_9 + 9
  ) %>%
  ggplot(
    aes(
      x = month,
      y = .epred,
      group = .draw
    )
  ) +
  geom_path(
    alpha = .2
    ) + 
  scale_x_continuous(
    breaks = seq(
      from = 9, 
      to = 36, 
      by = 3)
    ) +
  labs(y = "predicted spelling scores")+
  facet_wrap(.~student, 
             ncol = 2)

What about the holiday effect?

In the visual exploration of the data we saw that there might be a holiday effect: spelling scores decrease after a summer break. Moreover, we might expect that this holiday effect is not as strong for every student. If we translate this to our model, this implies that we can add an effect of the variable After_holiday (assuming that observations after a holiday are somewhat lower) and that we allow this effect to vary from student to student. The model code looks like this:

Show the code
M3 <- brm(
  spelling ~ month_9 + After_holiday + (1 + month_9 + After_holiday|student),
  data = Spelling
)

Let’s load the saved model results:

Show the code
M3 <- readRDS(file = "M3.RDS")

To create a plot I can make use of the same tricks we applied earlier on: create predicted scores for each student making use of data_grid() and add_epred_draws(). Notice that I added After_holiday = After_holiday in the data_grid() call as this variable is also incorporated in our model.

Show the code
Preds_M3 <- Spelling %>%
  data_grid(
    month_9 = month_9,
    After_holiday = After_holiday,
    student = student
  ) %>%
  add_epred_draws(
    M3,
    ndraws = 100
  ) %>%
  mutate(
    month = month_9 + 9
  )

Before we jump to the plot, it’s time for a warning! Let’s inspect what we have created in Preds_M3 by making a frequency table of the number of draws for each combination of the variables month and After_holiday:

Show the code
table(
  Preds_M3$month,
  Preds_M3$After_holiday
)
    
        0    1
  9  2900 2900
  12 2900 2900
  21 2900 2900
  24 2900 2900
  33 2900 2900
  36 2900 2900

That’s strange. We have created predicted scores for combinations that are impossible in our design! There shouldn’t be observations for the combination month == 9 and After_holiday == 1, because month 9 is before the holidays. These impossible combinations are our own fault: we asked for them in the data_grid() call, which creates a fictitious dataset with all combinations of our 3 variables. So, in order to create a correct visualization, we first have to filter out the impossible combinations. In the following lines of code I first create a variable that flags the correct combinations, and then filter on that variable.

Show the code
Preds_M3 <- Preds_M3 %>%
  mutate(
    correct = case_when(
      month == 9 & After_holiday == 0 ~ 1,
      month == 9 & After_holiday == 1 ~ 0,
      month == 12 & After_holiday == 0 ~ 0,
      month == 12 & After_holiday == 1 ~ 1,
      month == 21 & After_holiday == 1 ~ 0,
      month == 21 & After_holiday == 0 ~ 1,
      month == 24 & After_holiday == 1 ~ 1,
      month == 24 & After_holiday == 0 ~ 0,
      month == 33 & After_holiday == 0 ~ 1,
      month == 33 & After_holiday == 1 ~ 0,
      month == 36 & After_holiday == 1 ~ 1,
      month == 36 & After_holiday == 0 ~ 0
    )
  ) %>%
  filter(
    correct == 1
  )
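An equivalent and more compact way to drop the impossible rows (assuming the observed data contain only the valid month/After_holiday pairs) is a semi_join() against the original data, which avoids spelling out every combination by hand:

```r
# Keep only the month / After_holiday combinations that actually
# occur in the observed data (alternative to the case_when() above)
Preds_M3 <- Preds_M3 %>%
  semi_join(
    Spelling %>% distinct(month, After_holiday),
    by = c("month", "After_holiday")
  )
```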

And now we know the drill! We can generate our plot with a line for each student.

Show the code
Preds_M3 %>%
  group_by(student, month) %>%
  summarize(
    spelling_predicted = median(.epred)
  ) %>%
  ggplot(
    aes(
      x = month,
      y = spelling_predicted,
      group = student
    )
  ) +
  geom_path(
    alpha = .4
  ) + 
  scale_x_continuous(
    breaks = seq(
      from = 9, 
      to = 36, 
      by = 3)
  ) +
  labs(y = "predicted spelling scores")

To finish, we can also animate our plot with the gganimate package, highlighting each student’s line one by one so that the differences between students are more visible.

Show the code
Preds_M3 %>%
  group_by(student, month) %>%
  summarize(
    spelling_predicted = median(.epred)
  ) %>%
  ggplot(
    aes(
      x = month,
      y = spelling_predicted,
      group = student
    )
  ) +
  geom_path(
    color = "#D95F02"
  ) + 
  scale_x_continuous(
    breaks = seq(
      from = 9, 
      to = 36, 
      by = 3)
  ) +
  labs(y = "predicted spelling scores")+
  transition_states(student, 0, 1) +
  shadow_mark(future = TRUE, color = "gray50", alpha = .2)

Wrap-up

Making use of these visualizations we can clearly show what our rather complex multilevel models look like. Of course, each project has its own challenges, but I hope this post helps you think about visualizing your models to facilitate clear communication about your findings. Have fun!

References

Kay, Matthew. 2022. “Tidybayes: Tidy Data and Geoms for Bayesian Models.” https://doi.org/10.5281/zenodo.1308151.
Wickham, Hadley. 2022. “Modelr: Modelling Functions That Work with the Pipe.” https://CRAN.R-project.org/package=modelr.