#Overview In this demo, I will show how to model intensive longitudinal data of distinguishable dyads using a Bayesian approach.
For those less familiar with dyadic analysis, it is an extremely interesting and important type of analysis that is useful for cases where we are interested in examining multiple outcomes with meaningful grouping. A classic example of this is the case of married couples, where X and Y variables are measured repeatedly for each partner in a couple. As may seem obvious, when working with individuals who are grouped in dyads, we cannot treat the individuals in the dyad as being separate individuals. They are nonindependent and by virtue of their relationship, they are going to show associations in their data that would not be expected for two randomly paired individuals.
The “distinguishable” part of distinguishable dyads indicates that the individuals within a dyad have some differentiating feature that they carry with them for the duration of the study. Returning to the earlier example, heterosexual married couples are an example of a distinguishable dyad: There is a wife and a husband, and these roles are fixed throughout the study. Parent and child would be another example, or supervisor and employee.
The analysis of dyadic intensive longitudinal data, such as those collected from daily diary studies or laboratory studies with intensively sampled measurements, is discussed in the book Intensive Longitudinal Methods: An Introduction to Diary and Experience Sampling Research by Niall Bolger and J-P Laurenceau (2013). The purpose of this demo is to show how their example analysis from Chapter 8 can be implemented in a Bayesian framework using the brms package in R.
brms offers some key advantages to this kind of analysis.
First, as discussed in Chapter 8, most multilevel software programs can
only handle a single outcome. But, dyadic data are multivariate: There
is an outcome measure for each partner. Typically, this issue can be
circumvented by stacking one’s dataset and using dummy coded variables
to estimate effects separately for each dyad member within one model.
brms can handle true multivariate models, however, which
bypasses the need for stacking the data, thereby saving some time at the
data processing stage. Second, currently available R packages for
multilevel modeling using maximum likelihood estimation do not presently
allow one to estimate all features of a dyadic multilevel model. This
comes in in the error structures. As discussed in Chapter 8, it is
advisable to use what is known as an UN@AR(1) error structure, which
combines both an autocorrelation part and a cross-partner correlation
part on the residuals. The nlme
package can estimate an
AR(1) error structure OR correlated residuals between dyad members, but
not both. The lme4
package does not allow not allow the
user to specify error structures. brms
, however, can do
both. And as we will see in a moment, it is also advantageous because it
can generate separate AR(1) estimates for each partner; to my knowledge,
this is not currently supported in nlme
.
Now that we have some background, on to the analysis!
We will draw on the example from Chapter 8 of Bolger & Laurenceau’s wonderful book.
#Load Packages
library(dplyr)
library(data.table)
library(brms)
We can pull in the data directly from the book’s website.
c <- as.data.frame(
fread("curl http://www.intensivelongitudinal.com/ch8/ch8R.zip | tar -xf- --to-stdout *dyads.csv")
)
head(c)
## coupleid personid time time7c gender female male reldis wrkstrs
## 1 1 1 0 -1.5000000 1 1 0 3.03 3
## 2 1 1 1 -1.3571429 1 1 0 4.62 3
## 3 1 1 2 -1.2142857 1 1 0 2.85 3
## 4 1 1 3 -1.0714286 1 1 0 6.40 4
## 5 1 1 4 -0.9285714 1 1 0 2.54 1
## 6 1 1 5 -0.7857143 1 1 0 5.16 2
## wrkstrsc wrkstrscb wrkstrscw
## 1 0.0095238 -0.3238095 0.3333333
## 2 0.0095238 -0.3238095 0.3333333
## 3 0.0095238 -0.3238095 0.3333333
## 4 1.0095238 -0.3238095 1.3333333
## 5 -1.9904762 -0.3238095 -1.6666667
## 6 -0.9904762 -0.3238095 -0.6666667
As discussed above, brms
can estimate “true”
multivariate models, so there is no need to stack the dataset and use
dummies for female and male, as with nlme
and
lme4
. But, we still need to do some pre-processing of the
data for use with brms: The data needs to be arranged such that
there is one row per timepoint per dyad, with separate columns for male
and female variables
##A. Subset out female partner observations and rename columns with prefix “f_” First, we will subset out the data for the female partner. We will then relabel the columns, adding the prefix “f_” to most variable names to indicate that they refer to the female partner.
ch8_f <- subset(c, female == 1)
colnames(ch8_f) <- c("coupleid", "f_personid", "time",
"time7c", "gender", "female", "male", "f_reldis",
"f_wrkstrs", "f_wrkstrsc", "f_wrkstrscb", "f_wrkstrscw")
head(ch8_f)
## coupleid f_personid time time7c gender female male f_reldis f_wrkstrs
## 1 1 1 0 -1.5000000 1 1 0 3.03 3
## 2 1 1 1 -1.3571429 1 1 0 4.62 3
## 3 1 1 2 -1.2142857 1 1 0 2.85 3
## 4 1 1 3 -1.0714286 1 1 0 6.40 4
## 5 1 1 4 -0.9285714 1 1 0 2.54 1
## 6 1 1 5 -0.7857143 1 1 0 5.16 2
## f_wrkstrsc f_wrkstrscb f_wrkstrscw
## 1 0.0095238 -0.3238095 0.3333333
## 2 0.0095238 -0.3238095 0.3333333
## 3 0.0095238 -0.3238095 0.3333333
## 4 1.0095238 -0.3238095 1.3333333
## 5 -1.9904762 -0.3238095 -1.6666667
## 6 -0.9904762 -0.3238095 -0.6666667
##B. subset out male partner observations and rename columns with
prefix “m_”, also remove duplicated columns that we don’t need We will
do something similar for the male partner’s data. Note that we have also
used the pipe (%>%
) from the dplyr
package
to also indicate that we want to drop the variables time7c
,
gender
, male
, and female
from our
dataset. The reason for dropping these variables is that we are going to
bind the columns from the female partners’ data and the columns from the
male partners’ data together in a next step. This would lead to
duplicated columns that we don’t actually need for our analysis.
ch8_m <- subset(c, male == 1) %>% dplyr::select(-time7c, -gender, -male, -female)
colnames(ch8_m) <- c("coupleid", "m_personid", "time",
"m_reldis", "m_wrkstrs", "m_wrkstrsc", "m_wrkstrscb", "m_wrkstrscw")
head(ch8_m)
## coupleid m_personid time m_reldis m_wrkstrs m_wrkstrsc m_wrkstrscb
## 22 1 2 0 4.46 3 0.0095238 -0.1333333
## 23 1 2 1 4.88 3 0.0095238 -0.1333333
## 24 1 2 2 4.58 3 0.0095238 -0.1333333
## 25 1 2 3 4.49 1 -1.9904762 -0.1333333
## 26 1 2 4 5.04 3 0.0095238 -0.1333333
## 27 1 2 5 4.87 3 0.0095238 -0.1333333
## m_wrkstrscw
## 22 0.1428571
## 23 0.1428571
## 24 0.1428571
## 25 -1.8571429
## 26 0.1428571
## 27 0.1428571
##C. Merge dataframes with female data and male data. Next, we will
merge the two partial datasets together–the data for female partners and
for the male partners. We will merge the data by coupleid
,
which is shared by couple members, and by time
, to preserve
information about the time dependency.
ch8 <- merge(ch8_f, ch8_m, by = c("coupleid", "time"))
head(ch8)
## coupleid time f_personid time7c gender female male f_reldis f_wrkstrs
## 1 1 0 1 -1.50000000 1 1 0 3.03 3
## 2 1 1 1 -1.35714286 1 1 0 4.62 3
## 3 1 10 1 -0.07142857 1 1 0 5.70 2
## 4 1 11 1 0.07142857 1 1 0 5.89 1
## 5 1 12 1 0.21428571 1 1 0 4.90 3
## 6 1 13 1 0.35714286 1 1 0 4.86 4
## f_wrkstrsc f_wrkstrscb f_wrkstrscw m_personid m_reldis m_wrkstrs m_wrkstrsc
## 1 0.0095238 -0.3238095 0.3333333 2 4.46 3 0.0095238
## 2 0.0095238 -0.3238095 0.3333333 2 4.88 3 0.0095238
## 3 -0.9904762 -0.3238095 -0.6666667 2 4.36 2 -0.9904762
## 4 -1.9904762 -0.3238095 -1.6666667 2 4.91 3 0.0095238
## 5 0.0095238 -0.3238095 0.3333333 2 4.91 3 0.0095238
## 6 1.0095238 -0.3238095 1.3333333 2 3.72 4 1.0095238
## m_wrkstrscb m_wrkstrscw
## 1 -0.1333333 0.1428571
## 2 -0.1333333 0.1428571
## 3 -0.1333333 -0.8571429
## 4 -0.1333333 0.1428571
## 5 -0.1333333 0.1428571
## 6 -0.1333333 1.1428571
Now that are our data are prepped and formatted correctly, we are now ready for the analysis.
##D. Set up brms formulas The first step is to set up the brms
formulas, with the bf()
function. This will specify the
models that we want to fit. As you can see below, we will set up two
brms formulas: one for female partner and one for male partner.
For each partner, we are modeling relationship dissatisfaction as a function of an intercept term, time (centered), within-person centered work stressors, and between-person centered work stressors as fixed effects. We also model random intercepts and random slopes of time and work stressors for each subject. Note that the latter variables were already centered accordingly when we got the data from the book’s websites.
A difference in syntax compared to most brms models (and
other multilevel packages) is the |p|
term.
|p|
indicates that we want to model correlations of random
effects (the choice of the letter p is arbitrary). To get cross-partner
correlations, the letter inside || should be the same in both formulas.
For our purposes, those cross-partner correlations are potentially very
interesting and important (e.g., do female partners who show stronger
effects of work stressors on relationship dissatisfaction have male
partners who also show stronger effects of work stressors on
relationship dissatisfaction?).
As a finale note, I have modeled the (fixed) intercept with the code
O + intercept
rather than writing in 1
or
nothing at all (the default). This won’t make a difference for our
purposes, but this method of modeling the intercept can be useful in
cases where we want to set common priors for intercepts and slopes. We
are not setting priors here (we use the default priors, which range from
-\(\infty\) to +\(\infty\)), so it won’t make a difference in
terms of the estimates. But just mentioning it here to clarify.
f_model <- bf(f_reldis ~ 0 + intercept + time7c + f_wrkstrscw + f_wrkstrscb + (time7c + f_wrkstrscw |p| coupleid))
m_model <- bf(m_reldis ~ 0 + intercept + time7c + m_wrkstrscw + + m_wrkstrscb + (time7c + m_wrkstrscw |p| coupleid))
##E. Plug in the two bfs into a brm model. Now that we have specified
our brms formulas, we can put them into a model usign the
brm()
function.
By default, multivariate models in brms model correlated residuals, which is useful since this is a dyadic analysis and it is entirely plausible that there will be other variables that explain the relationship between dyad members’ outcomes that we didn’t not explicitly model.
We can also specify an AR(1) error structure, given by the
autocor = cor_ar(~ time7c | coupleid, p = 1)
. The
p = 1
part indicates the lag. For our purposes, this part
is unnecessary to specify, as cor_ar
defaults to AR(1)
already.
As a note, the model sometimes produces warnings about adding more iterations. For a paper, I would probably rerun the analysis and increase the number of iterations. But, this is a demo and the results are extremely similar to those reported in the Bolger & Laurenceau book, so it is not essential for our purposes.
fit <- brm(f_model + m_model, data = ch8,
seed = 111, # set a seed to ensure reproducibility
iter = 8000, # number of iterations
autocor = cor_ar(~ time7c | coupleid, p = 1))
Be advised that this model might take awhile (several hours) to run! You can save your model object so you don’t have to run the model from scratch each time you start a new R session. It will automatically save the model object .rds file to your working directory unless you set another location as part of the filename.
saveRDS(fit, "my_brms_model.rds")
# To load the model object in a fresh R session:
fit <- readRDS("my_brms_model.rds")
Onto the results!
fit
## Family: MV(gaussian, gaussian)
## Links: mu = identity; sigma = identity
## mu = identity; sigma = identity
## Formula: f_reldis ~ time7c + f_wrkstrscw + f_wrkstrscb + (time7c + f_wrkstrscw | p | coupleid)
## autocor ~ arma(time = time7c, gr = coupleid, p = 1, q = 0, cov = FALSE)
## m_reldis ~ time7c + m_wrkstrscw + +m_wrkstrscb + (time7c + m_wrkstrscw | p | coupleid)
## autocor ~ arma(time = time7c, gr = coupleid, p = 1, q = 0, cov = FALSE)
## Data: ch8 (Number of observations: 2100)
## Draws: 4 chains, each with iter = 8000; warmup = 4000; thin = 1;
## total post-warmup draws = 16000
##
## Correlation Structures:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## ar_freldis[1] -0.00 0.02 -0.05 0.05 1.00 13673 11705
## ar_mreldis[1] 0.02 0.02 -0.03 0.06 1.00 15772 11841
##
## Group-Level Effects:
## ~coupleid (Number of levels: 100)
## Estimate Est.Error l-95% CI
## sd(freldis_Intercept) 0.98 0.08 0.85
## sd(freldis_time7c) 0.07 0.05 0.00
## sd(freldis_f_wrkstrscw) 0.12 0.04 0.04
## sd(mreldis_Intercept) 1.04 0.08 0.90
## sd(mreldis_time7c) 0.04 0.03 0.00
## sd(mreldis_m_wrkstrscw) 0.17 0.03 0.11
## cor(freldis_Intercept,freldis_time7c) -0.05 0.31 -0.64
## cor(freldis_Intercept,freldis_f_wrkstrscw) 0.39 0.19 0.00
## cor(freldis_time7c,freldis_f_wrkstrscw) -0.18 0.35 -0.77
## cor(freldis_Intercept,mreldis_Intercept) 0.25 0.10 0.05
## cor(freldis_time7c,mreldis_Intercept) 0.13 0.30 -0.49
## cor(freldis_f_wrkstrscw,mreldis_Intercept) -0.00 0.20 -0.40
## cor(freldis_Intercept,mreldis_time7c) -0.04 0.33 -0.68
## cor(freldis_time7c,mreldis_time7c) -0.07 0.38 -0.75
## cor(freldis_f_wrkstrscw,mreldis_time7c) 0.10 0.37 -0.63
## cor(mreldis_Intercept,mreldis_time7c) -0.04 0.33 -0.66
## cor(freldis_Intercept,mreldis_m_wrkstrscw) -0.03 0.15 -0.32
## cor(freldis_time7c,mreldis_m_wrkstrscw) 0.11 0.33 -0.60
## cor(freldis_f_wrkstrscw,mreldis_m_wrkstrscw) 0.30 0.25 -0.24
## cor(mreldis_Intercept,mreldis_m_wrkstrscw) 0.16 0.15 -0.13
## cor(mreldis_time7c,mreldis_m_wrkstrscw) 0.04 0.36 -0.66
## u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(freldis_Intercept) 1.14 1.00 2010 3961
## sd(freldis_time7c) 0.17 1.00 1757 4890
## sd(freldis_f_wrkstrscw) 0.19 1.00 3206 3371
## sd(mreldis_Intercept) 1.21 1.00 4970 8237
## sd(mreldis_time7c) 0.12 1.00 4200 6970
## sd(mreldis_m_wrkstrscw) 0.23 1.00 6375 7027
## cor(freldis_Intercept,freldis_time7c) 0.58 1.00 17964 9147
## cor(freldis_Intercept,freldis_f_wrkstrscw) 0.74 1.00 8650 8469
## cor(freldis_time7c,freldis_f_wrkstrscw) 0.57 1.00 1988 4350
## cor(freldis_Intercept,mreldis_Intercept) 0.44 1.00 4298 7248
## cor(freldis_time7c,mreldis_Intercept) 0.69 1.02 333 545
## cor(freldis_f_wrkstrscw,mreldis_Intercept) 0.39 1.00 934 1985
## cor(freldis_Intercept,mreldis_time7c) 0.61 1.00 20721 11166
## cor(freldis_time7c,mreldis_time7c) 0.68 1.00 9351 11188
## cor(freldis_f_wrkstrscw,mreldis_time7c) 0.75 1.00 13586 12355
## cor(mreldis_Intercept,mreldis_time7c) 0.62 1.00 19812 11199
## cor(freldis_Intercept,mreldis_m_wrkstrscw) 0.28 1.00 12212 11845
## cor(freldis_time7c,mreldis_m_wrkstrscw) 0.70 1.00 1194 1903
## cor(freldis_f_wrkstrscw,mreldis_m_wrkstrscw) 0.74 1.00 2829 3619
## cor(mreldis_Intercept,mreldis_m_wrkstrscw) 0.44 1.00 14675 13864
## cor(mreldis_time7c,mreldis_m_wrkstrscw) 0.70 1.00 2327 6231
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## freldis_Intercept 4.65 0.10 4.44 4.85 1.00 1309 2570
## mreldis_Intercept 5.09 0.11 4.88 5.30 1.00 4016 6597
## freldis_time7c -0.02 0.03 -0.08 0.03 1.00 19847 12168
## freldis_f_wrkstrscw 0.16 0.03 0.11 0.21 1.00 12405 10671
## freldis_f_wrkstrscb 0.58 0.45 -0.34 1.43 1.00 1716 2906
## mreldis_time7c 0.01 0.02 -0.03 0.06 1.00 20004 12199
## mreldis_m_wrkstrscw 0.11 0.03 0.06 0.16 1.00 13575 12142
## mreldis_m_wrkstrscb -0.14 0.45 -1.03 0.74 1.00 4315 7184
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_freldis 1.00 0.02 0.97 1.03 1.00 12065 11320
## sigma_mreldis 0.87 0.01 0.85 0.90 1.00 15533 11944
##
## Residual Correlations:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## rescor(freldis,mreldis) 0.07 0.02 0.03 0.12 1.00 19601
## Tail_ESS
## rescor(freldis,mreldis) 11617
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
As seen from the model output, coefficients with the prefix
freldis_
correspond to effects on the female partner’s
outcome, and those with the prefix mreldis_
correspond to
effects on the male partner’s outcomes. Normally, effects for female and
male partners would be indicated through the use of dummy variables, but
because of brms can estimate multivariate models, with two (or
more), outcomes, we can get these separate effects automatically from
our model set-up.
We also see that we have estimates of model parameters that are noteworthy for dyadic intensive longitudinal models, such as AR(1) errors, separate residual standard deviations, and a residual correlation between dyad members’ outcomes. Of note, other programs, such as SAS, that also allow one to model these parameters typically generate one estimate of autocorrelation that is pooled across partners. So an advantage of brms is that we can get separate AR(1) estimates for each partner.
We also see that, thanks to the |p|
part of our code in
the bf()
objects, we also get correlations among random
effects across partners. For instance,
cor(freldis_f_wrkstrscw,mreldis_m_wrkstrscw)
gives us the
correlation between the female partner’s effect of work stressors on her
relationship dissatisfaction and the male partner’s effect of work
stressors on his relationship dissatisfaction.
#Fit Model with Maximum Likelihood (nlme
) to compare How
do our results compare to what we get using traditional maximum
likelihood estimation?
Here is the model using the stacked data approach, with dummy
variables, in the nlme
package (code drawn from this excellent tutorial from the QuantDev group at Penn
State). We can get separate error terms for each partner and a residual
correlation, but not an AR(1) error structure.
Another difference is that the model did not converge with random slopes for time included, so they have been omitted here.
library(nlme)
fit_nlme <- lme(fixed=reldis ~ male + female + male:time7c + female:time7c +
male:wrkstrscw + female:wrkstrscw +
male:wrkstrscb + female:wrkstrscb + -1,
control=list(maxIter=1000), data=c,
random=~male + female + male:wrkstrscw + female:wrkstrscw + -1| coupleid,
weights=varIdent(form = ~1 | gender),
corr=corCompSymm(form = ~1 | coupleid/time))
summary(fit_nlme)
## Linear mixed-effects model fit by REML
## Data: c
## AIC BIC logLik
## 12096.16 12229.31 -6027.078
##
## Random effects:
## Formula: ~male + female + male:wrkstrscw + female:wrkstrscw + -1 | coupleid
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## male 1.0157904 male female ml:wrk
## female 0.9600378 0.263
## male:wrkstrscw 0.1678950 0.178 -0.019
## female:wrkstrscw 0.1250147 0.007 0.485 0.502
## Residual 0.9981039
##
## Correlation Structure: Compound symmetry
## Formula: ~1 | coupleid/time
## Parameter estimate(s):
## Rho
## 0.07320727
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | gender
## Parameter estimates:
## 1 0
## 1.000000 0.873824
## Fixed effects: reldis ~ male + female + male:time7c + female:time7c + male:wrkstrscw + female:wrkstrscw + male:wrkstrscb + female:wrkstrscb + -1
## Value Std.Error DF t-value p-value
## male 5.085720 0.1038242 4093 48.98397 0.0000
## female 4.647561 0.0989240 4093 46.98114 0.0000
## male:time7c 0.011615 0.0222292 4093 0.52251 0.6013
## female:time7c -0.024921 0.0252926 4093 -0.98532 0.3245
## male:wrkstrscw 0.109398 0.0255077 4093 4.28885 0.0000
## female:wrkstrscw 0.159484 0.0251352 4093 6.34507 0.0000
## male:wrkstrscb -0.141904 0.4388417 4093 -0.32336 0.7464
## female:wrkstrscb 0.623305 0.4276930 4093 1.45737 0.1451
## Correlation:
## male female ml:tm7 fml:t7 ml:wrkstrscw fml:wrkstrscw
## female 0.253
## male:time7c 0.015 0.001
## female:time7c 0.001 0.018 0.071
## male:wrkstrscw 0.115 -0.012 -0.004 -0.001
## female:wrkstrscw 0.003 0.234 0.001 -0.014 0.163
## male:wrkstrscb -0.095 -0.003 0.002 0.001 -0.004 0.001
## female:wrkstrscb 0.003 0.097 0.000 -0.002 0.001 -0.004
## ml:wrkstrscb
## female
## male:time7c
## female:time7c
## male:wrkstrscw
## female:wrkstrscw
## male:wrkstrscb
## female:wrkstrscb -0.029
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.3784814878 -0.6500702762 0.0004838121 0.6501003315 3.3326568390
##
## Number of Observations: 4200
## Number of Groups: 100
How do our results using nlme
compare to those we got
using brms
? We generally see that the fixed effects are
largely the same across the two analyses. The residual standard
deviations and residual correlation are also very similar. As noted
above, random effects for time could not be included in the
nlme
version due to convergence issues, and we don’t have
the AR(1) error structure. We could have specified the AR(1) error
structure but would have had to sacrifice the residual correlation
component in order to do so.
A trade-off is that Bayesian models can take a very long time to run (i.e., several hours, or for very complex models, even a whole day). But in my opinion, the added benefits of the Bayesian analysis make it worth the extra time and effort, and this is the route I would most likely go myself for analyzing dyadic intensive longitudindal data.
View .Rmd source code
updated June 14, 2020