This guide demonstrates how to perform within- and between-subject centering of variables in R. This is an important step in prepping data for a multilevel analysis, as there are both within- and between-subject differences that need to be separated.
To complete this demo, we will use the Bolger & Laurenceau
Chapter 9 data, which we can access through the bmlm
package. This dataset already contains within-person centered variables,
but we will compute them ourselves anyway.
library(bmlm)
library(dplyr)
d <- BLch9
The first method involves using functions available in base R. With
the within
approach, we will create a new variable that
draws on information nested in a particular group, id
in
this case. We then specify the function we want to apply, in this case
taking the mean. Note that if our dataset had missing values
(NA
), we might want to define a new function ourselves for
taking the mean but excluding the missing values.
d <- within(d, {fwkstrs_mean1 = ave(fwkstrs, id, FUN = mean)})
d$fwkstrs_cw1 <- d$fwkstrs - d$fwkstrs_mean1
d$fwkstrs_cb1 <- scale(d$fwkstrs, center = T, scale = F) - d$fwkstrs_cw1
d <- within(d, {fwkdis_mean1 = ave(fwkdis, id, FUN = mean)})
d$fwkdis_cw1 <- d$fwkdis - d$fwkdis_mean1
d$fwkdis_cb1 <- scale(d$fwkdis, center = T, scale = F) - d$fwkdis_cw1
Now, let’s check our work (for the sake of space we will just look at
the fwkstrs
variable):
head(dplyr::select(d, id, time, fwkstrs, fwkstrs_cw1, fwkstrs_cb1))
## id time fwkstrs fwkstrs_cw1 fwkstrs_cb1
## 1 101 1 3 0.3333333 -0.3019048
## 2 101 2 3 0.3333333 -0.3019048
## 3 101 3 3 0.3333333 -0.3019048
## 4 101 4 4 1.3333333 -0.3019048
## 5 101 5 1 -1.6666667 -0.3019048
## 6 101 6 2 -0.6666667 -0.3019048
dplyr
A second method involves using the dplyr
package. With
the group_by
function, we can ask R to take the mean for
each “group”, which in this case is each person.
In the first part of the code, we will group our dataframe by
id
. This will effectively treat each “group” (person) as
their own dataframe. By taking the mean, we will generate a
person-specific mean. This can be accomplished using
mutate()
, which is a function that allows you to create new
variables. For details, run ?mutate
. We will obtain
person-specific means for two variables: fwkstrs
and
fwkdis
.
Next, we need to ungroup()
so that our data manipulation
will be on the whole dataframe, and not on the data broken up by
id
.
We will then use mutate()
again to create three new sets
of variables: (A) Grand-mean centered versions of each of our variables,
(B) Within-person centered versions of each of our variables, and (C)
Between-person centered versions of each of our varaibles.
Grand-mean centered: We will use the scale()
function, which is a base R function for centering and standardizing
variables. Because we want to keep our variable in the original units
(i.e., we do NOT want standardized versions), we will set
center = T
and scale = F
within the function.
This will simply subtract out the grand mean (mean of all observations)
from each individual observation in the dataframe.
Within-person centered: Next, we will create within-person centered variables, which capture fluctuations relative to each person’s own average across the study period. Do to this, we will simply take the “raw” observation for each variable and subtract out the person-specific mean we computed in the first part.
Between-person centered: Finally, we will create between-person centered variables, which reflect between-person differences in level across the study period (i.e., whether a particular subject reported generally high vs. low levels of each variable across the study period). As noted in Chapter 5 of Bolger & Laurenceau (2013), within-person centered value + between-person centered value = grand mean centered value. Therefore, we can obtain between-person centered values by subtracting within-person centered values from grand mean centered values.
d <- d %>% group_by(id) %>%
mutate(
fwkstrs_mean2 = mean(fwkstrs, na.rm = T),
fwkdis_mean2 = mean(fwkdis, na.rm = T)
) %>% ungroup() %>%
mutate(
fwkstrs_c2 = scale(fwkstrs, center = T, scale = F),
fwkdis_c2 = scale(fwkdis, center = T, scale = F),
fwkstrs_cw2 = fwkstrs - fwkstrs_mean2,
fwkdis_cw2 = fwkdis - fwkdis_mean2,
fwkstrs_cb2 = fwkstrs_c2 - fwkstrs_cw2,
fwkdis_cb2 = fwkdis_c2 - fwkdis_cw2
)
Now, let’s check our work:
head(dplyr::select(d, id, time, fwkstrs, fwkstrs_cw2, fwkstrs_cb2))
## # A tibble: 6 × 5
## id time fwkstrs fwkstrs_cw2 fwkstrs_cb2[,1]
## <int> <int> <int> <dbl> <dbl>
## 1 101 1 3 0.333 -0.302
## 2 101 2 3 0.333 -0.302
## 3 101 3 3 0.333 -0.302
## 4 101 4 4 1.33 -0.302
## 5 101 5 1 -1.67 -0.302
## 6 101 6 2 -0.667 -0.302
bmlm
Helper FunctionA third option is to use the isolate
function from the
bmlm
package. This will accomplish your centering needs in
a single step.
As shown in the code below, we simply need to enter our dataframe
(d
), the name of our grouping variable (id
),
and the variables we want to center (fwkstrs
and
fwkdis
). The which
argument refers to whether
you want the function to return only within-person centered values, only
between-person centered values, or both. Here, we have specified
both
.
d <- isolate(d, by = "id",
value = c("fwkstrs", "fwkdis"),
which = "both")
Now, let’s check our work (note that isolate
does not
provide the grand mean version):
head(dplyr::select(d, id, time, fwkstrs, fwkstrs_cw, fwkstrs_cb))
## # A tibble: 6 × 5
## id time fwkstrs fwkstrs_cw fwkstrs_cb
## <int> <int> <int> <dbl> <dbl>
## 1 101 1 3 0.333 -0.302
## 2 101 2 3 0.333 -0.302
## 3 101 3 3 0.333 -0.302
## 4 101 4 4 1.33 -0.302
## 5 101 5 1 -1.67 -0.302
## 6 101 6 2 -0.667 -0.302
As you can see, there are multiple ways of getting us to the same place. I don’t think any single method is necessarily “best”, although a case could be made for the utility of Method 3, as a prepared function can help eliminate coding errors. However, there may be cases when one needs to center on a value other than the mean (e.g., baseline). In such cases, the other methods may provide the flexibility necessary for doing so.
View .Rmd source code
updated April 22, 2019
The material above reflects the best of my knowledge on this topic. Please be sure to check your results and code carefully.