Predictive modeling @ JLU Giessen: regression

Author

Affiliations

Stefan Th. Gries

UC Santa Barbara

JLU Giessen

Published

03 Aug 2025 12-34-56

1 Linear modeling

1.1 Introduction

rm(list=ls(all=TRUE)); invisible(gc())
set.seed(sum(utf8ToInt("Gummibärchen")))
invisible(sapply(c("car", "effects", "emmeans", "magrittr", "multcomp"),
   library, character.only=TRUE, logical.return=TRUE,
   quietly=TRUE, verbose=FALSE, warn.conflicts=TRUE))
invisible(sapply(dir("_helpers", full.names=TRUE), source))

Imagine you have a numeric variable such as the reaction times to words presented on a computer screen; the data we will work with are in _input/rts.csv, you can find information about the variables/columns in _input/rts.r, and our response variable is called RT:

str(d <- read.delim(       # the structure of d, the result of loading
   file="_input/rts.csv",  # this file
   stringsAsFactors=TRUE)) # change categorical variables into factors

'data.frame':   8000 obs. of  9 variables:
 $ CASE    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ RT      : int  748 408 483 1266 636 395 765 1297 849 315 ...
 $ LENGTH  : int  5 5 6 4 6 6 8 4 5 6 ...
 $ LANGUAGE: Factor w/ 2 levels "english","spanish": 2 2 2 2 2 2 2 2 2 2 ...
 $ GROUP   : Factor w/ 2 levels "english","heritage": 2 2 2 2 2 2 2 2 2 2 ...
 $ CORRECT : Factor w/ 2 levels "correct","incorrect": 1 1 1 1 1 1 1 1 1 1 ...
 $ FREQPMW : int  19 78 233 133 14 67 34 114 2 20 ...
 $ ZIPFFREQ: num  4.28 4.89 5.37 5.12 4.15 ...
 $ SITE    : Factor w/ 3 levels "a","b","c": 2 2 2 2 2 2 3 2 2 1 ...

d$LANGUAGE <- factor(d$LANGUAGE,   # for didactic reasons only, I am
   levels=levels(d$LANGUAGE)[2:1]) # changing the order of the levels
summary(d)

      CASE            RT             LENGTH         LANGUAGE         GROUP     
 Min.   :   1   Min.   : 271.0   Min.   :3.000   spanish:3977   english :3961  
 1st Qu.:2150   1st Qu.: 505.0   1st Qu.:4.000   english:4023   heritage:4039  
 Median :4310   Median : 595.0   Median :5.000                                 
 Mean   :4303   Mean   : 661.5   Mean   :5.198                                 
 3rd Qu.:6450   3rd Qu.: 732.0   3rd Qu.:6.000                                 
 Max.   :8610   Max.   :4130.0   Max.   :9.000                                 
                NA's   :248                                                    
      CORRECT        FREQPMW           ZIPFFREQ     SITE    
 correct  :7749   Min.   :   1.00   Min.   :3.000   a:2403  
 incorrect: 251   1st Qu.:  17.00   1st Qu.:4.230   b:2815  
                  Median :  42.00   Median :4.623   c:2782  
                  Mean   :  81.14   Mean   :4.591           
                  3rd Qu.: 101.00   3rd Qu.:5.004           
                  Max.   :1152.00   Max.   :6.061

There are 7752 reaction times in the data. Your job is to choose one number that, if it is used to predict every single reaction time, results in the best overall prediction for all the data. What would you pick?

possible_picks <- c(300, 600, 700, 900, 1400, 2000)

Let’s see how good each of these is, where we determine goodness on the basis of the sum of the squared differences – why? Because (i) that makes all differences positive and (ii) that penalizes bad predictions more:

(possible_squared_deviations <- setNames(c(
   (d$RT - possible_picks[1])^2 %>% sum(na.rm=TRUE),
   (d$RT - possible_picks[2])^2 %>% sum(na.rm=TRUE),
   (d$RT - possible_picks[3])^2 %>% sum(na.rm=TRUE),
   (d$RT - possible_picks[4])^2 %>% sum(na.rm=TRUE),
   (d$RT - possible_picks[5])^2 %>% sum(na.rm=TRUE),
   (d$RT - possible_picks[6])^2 %>% sum(na.rm=TRUE)),
   possible_picks))
# sapply(possible_picks, \(af) sum((d$RT-af)^2, na.rm=TRUE))

        300         600         700         900        1400        2000 
 1549343984   565761584   547980784   977539184  4764635184 14425470384

Here are the corresponding results for my guess …

(stgs_squared_deviation <- (d$RT - 661.46852425180600221)^2 %>% sum(na.rm=TRUE))

[1] 536471586

… which is better than all other results above:

stgs_squared_deviation < possible_squared_deviations

 300  600  700  900 1400 2000 
TRUE TRUE TRUE TRUE TRUE TRUE

Why, what did I guess? What everyone should have guessed: a measure of central tendency, the statistic you would provide for a variable of interest, a response variable, if you are only allowed to provide a single value. How can we compute this here?

mean(d$RT, na.rm=TRUE)

[1] 661.4685

And then, if we evaluate how well this does along the lines from above, we get the result we already saw:

sum((d$RT - mean(d$RT, na.rm=TRUE))^2, na.rm=TRUE)

[1] 536471586

A different way to have R answer that question is with a linear model. But since right now we’re only looking at the response variable RT right now – we have no predictor(s)! – this model is what is called the null model:

summary(m_00 <- lm(       # make/summarize the linear model m_00:
   RT ~ 1,                # RT ~ an overall intercept (1) & no predictor
   data=d,                # those variables are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT ~ 1, data = d, na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max 
-390.5 -156.5  -66.5   70.5 3468.5 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  661.469      2.988   221.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 263.1 on 7751 degrees of freedom
  (248 observations deleted due to missingness)

This is the same as the sum of the squared residuals – what is that sum?

sum((d$RT - predict(m_00))^2, na.rm=TRUE)

[1] 536471586

sum(residuals(m_00)^2, na.rm=TRUE)

[1] 536471586

Another way to get this more easily is this:

deviance(m_00) # the 'badness of fit-'

[1] 536471586

How/where does this null model guess/predict the overall mean?

coef(m_00)

(Intercept) 
   661.4685

What do the other columns mean? The null model is the same as if you use a one-sample t-test to test whether the mean of reaction times is 0:

# the standard error
sd(d$RT, na.rm=TRUE) / sqrt(sum(!is.na(d$RT)))

[1] 2.988048

# the t- and the p-value
t.test(d$RT)


    One Sample t-test

data:  d$RT
t = 221.37, df = 7751, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 655.6111 667.3259
sample estimates:
mean of x 
 661.4685

Since our above deviance value is the overall amount of variability of the response variable and, at the same time, an expression of how bad a model with no predictors is at making predictions, we are now of course hoping that our predictors can reduce that number …

1.2 A binary predictor

1.2.1 Modeling & numerical interpretation

Let us now apply this and linear modeling to a scenario where we have a binary predictor. Imagine we’re asking, does the reaction time to a word (in ms) vary as a function of which language that word was presented in (Spanish vs. English)?

Given what we said so far, what is going to be the best strategy to make the best guesses, i.e. reduce your deviance? You should guess the mean for each level of the response:

mean(d$RT[d$LANGUAGE=="spanish"], na.rm=TRUE)       # 'regular' R

[1] 731.5528

d$RT %>% "["(d$LANGUAGE=="english") %>% na.omit %>% mean # piping

[1] 594.9439

How do we do this better?

tapply(              # apply to
   d$RT,             # this variable
   d$LANGUAGE,       # a grouping by this one
   mean, na.rm=TRUE) # return means for each group

 spanish  english 
731.5528 594.9439

Note, there are two ways to represent this:

we get a mean for each group (like above);
we get one mean (e.g., the one for LANGUAGE: spanish) and the info how, from that mean, we can compute the other (i.e. spanish→english).

What R does by default is option 2:

summary(m_01 <- lm(       # make/summarize the linear model m_01:
   RT ~ 1 +               # RT ~ an overall intercept (1)
   LANGUAGE,              # & the predictor LANGUAGE
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT ~ 1 + LANGUAGE, data = d, na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max 
-428.6 -146.6  -58.9   69.1 3398.4 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      731.553      4.135  176.90   <2e-16 ***
LANGUAGEenglish -136.609      5.774  -23.66   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 254.1 on 7750 degrees of freedom
  (248 observations deleted due to missingness)
Multiple R-squared:  0.06737,   Adjusted R-squared:  0.06725 
F-statistic: 559.8 on 1 and 7750 DF,  p-value: < 2.2e-16

First, is the predictor LANGUAGE significant? This is easy here because

the model has just this one predictor so the p-value of the whole model is the p-value of its only predictor;
the predictor has 1 df (because it is binary), which, means that the p-value in the coefficients table for LANGUAGEenglish is also the p-value of the predictor.

This predictor is actually like using a binary logical (or numeric 0-1 coded) vector as a predictor that answers the question ‘is LANGUAGE english?’:

d$is_LANGUAGE_english <- d$LANGUAGE=="english"
summary(lm(               # make/summarize the linear model:
   RT ~ 1 +               # RT ~ an overall intercept (1)
   is_LANGUAGE_english,   # & the predictor LANGUAGE
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT ~ 1 + is_LANGUAGE_english, data = d, na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max 
-428.6 -146.6  -58.9   69.1 3398.4 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)              731.553      4.135  176.90   <2e-16 ***
is_LANGUAGE_englishTRUE -136.609      5.774  -23.66   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 254.1 on 7750 degrees of freedom
  (248 observations deleted due to missingness)
Multiple R-squared:  0.06737,   Adjusted R-squared:  0.06725 
F-statistic: 559.8 on 1 and 7750 DF,  p-value: < 2.2e-16

We can see this in particular if we consider two simple but important functions: model.frame and model.matrix. The model frame is all the data (rows and columns) used in the model:

model.frame(m_01) %>% head

    RT LANGUAGE
1  748  spanish
2  408  spanish
3  483  spanish
4 1266  spanish
5  636  spanish
6  395  spanish

model.frame(m_01) %>% dim # i.e., w/out the missing data!

[1] 7752    2

The model matrix is how the model internally represents the data and from what you compute predictions:

d[1:3, c("RT", "LANGUAGE")]

   RT LANGUAGE
1 748  spanish
2 408  spanish
3 483  spanish

model.matrix(m_01) %>% head(3)

  (Intercept) LANGUAGEenglish
1           1               0
2           1               0
3           1               0

d[7998:8000, c("RT", "LANGUAGE")]

      RT LANGUAGE
7998 947  english
7999 660  english
8000 437  english

model.matrix(m_01) %>% tail(3)

     (Intercept) LANGUAGEenglish
7998           1               1
7999           1               1
8000           1               1

In model.matrix(m_01), the 2nd column is called LANGUAGEenglish and it is what we computed above as d$is_LANGUAGE_english <- d$LANGUAGE=="english".

So what do the coefficients mean or represent?

coef(m_01)

    (Intercept) LANGUAGEenglish 
       731.5528       -136.6089

The coefficients are used like this: For each case x, you

multiply the coefficients with row x from the model matrix;
sum up the result,

and that is the prediction for that case:

# prediction for the first case:
"*"(
   coef(m_01),
   model.matrix(m_01)[1,]) %>% sum

[1] 731.5528

# prediction for the last case:
"*"(
   coef(m_01),
   model.matrix(m_01)[7752,]) %>% sum

[1] 594.9439

That also means

the intercept is the prediction when the predictor is its first level (spanish);
the 2nd coefficient is how you get from the intercept to the prediction for when the predictor is its 2nd level (english), and that difference is significant.

But there is an easier way to get the predictions – we use predict – and it’s usually a good idea to add them to the data frame:

d$PRED <- predict(m_01)
head(d,3)

  CASE  RT LENGTH LANGUAGE    GROUP CORRECT FREQPMW ZIPFFREQ SITE
1    1 748      5  spanish heritage correct      19 4.278754    b
2    2 408      5  spanish heritage correct      78 4.892095    b
3    3 483      6  spanish heritage correct     233 5.367356    b
  is_LANGUAGE_english     PRED
1               FALSE 731.5528
2               FALSE 731.5528
3               FALSE 731.5528

tail(d,3)

     CASE  RT LENGTH LANGUAGE   GROUP CORRECT FREQPMW ZIPFFREQ SITE
7998 8608 947      5  english english correct      80 4.903090    a
7999 8609 660      4  english english correct     746 5.872739    a
8000 8610 437      5  english english correct      54 4.732394    c
     is_LANGUAGE_english     PRED
7998                TRUE 594.9439
7999                TRUE 594.9439
8000                TRUE 594.9439

How good is this model? For that we need the deviance again:

sum((d$RT - d$PRED)^2, na.rm=TRUE)

[1] 500329194

deviance(m_01)

[1] 500329194

How much is this an improvement in % of the original deviance?

"-"(
   deviance(m_00),
   deviance(m_01)) / deviance(m_00)

[1] 0.06737056

What is that? R-squared!

Let’s compute the confidence intervals of our coefficients real quick – how?

Confint(m_01) # from the package car

                 Estimate     2.5 %    97.5 %
(Intercept)      731.5528  723.4463  739.6594
LANGUAGEenglish -136.6089 -147.9268 -125.2911

Here, interpreting the coefficients and making the predictions is very easy, but here is a useful way to do this especially for when models become more complex: We compute the effects predictions of the model:

(lan_d <- data.frame( # make lan_d a data frame of
   lan <- effect(     # the effect
      "LANGUAGE",     # of LANGUAGE
      m_01)))         # in m_01

  LANGUAGE      fit       se    lower    upper
1  spanish 731.5528 4.135410 723.4463 739.6594
2  english 594.9439 4.029019 587.0460 602.8419

# or
(lan_d <- data.frame( # make lan_d a data frame of
   lan <- predictorEffect( # the effect
      "LANGUAGE",          # of LANGUAGE
      m_01)))              # in m_01

  LANGUAGE      fit       se    lower    upper
1  spanish 731.5528 4.135410 723.4463 739.6594
2  english 594.9439 4.029019 587.0460 602.8419

1.2.2 Visual interpretation

Let’s visualize the nature of the effect of LANGUAGE based on the predictions from effects:

plot(lan,                        # plot the effect of LANGUAGE in m_01
   xlab="Language",              # w/ this x-axis label
   ylab="RT in ms",              # w/ this y-axis label
   ylim=range(d$RT, na.rm=TRUE), # w/ these y-axis limits
   grid=TRUE)                    # & w/ a grid
plot(lan,                        # plot the effect of LANGUAGE
   xlab="Language",              # w/ this x-axis label
   ylab="RT in ms (middle 90%)", # w/ this y-axis label
   ylim=c(417, 1137.45),         # w/ these y-axis limits (middle 90%)
   grid=TRUE)                    # & w/ a grid

Figure 1: The effect of LANGUAGE in m_01

Figure 2: The effect of LANGUAGE in m_01

And here a version with everything: observed values, predicted values, and the confidence intervals of the predictions:

Figure 3: The effect of LANGUAGE in m_01

1.2.3 Write-up

To determine whether the reaction time to a word (in ms) varies as a function of the language the word was presented in (english vs. spanish), a linear model was fit with RT as the response variable and LANGUAGE as a predictor. The model was highly significant (F=559.8, df₁=1, df₂=7750, p<0.0001) but explains very little of the response variable (mult. R²=0.067, adj. R²=0.067). The coefficients indicate that reaction times are longer for Spanish words, as shown in the summary table and the figure shown here. [Show a plot]

	Estimate	95%-CI	se	t	p_2-tailed
Intercept (LANGUAGE=spanish)	731.55	[723.45, 739.66]	4.14	176.9	<0.001
LANGUAGE_{spanish→english}	-136.61	[125.29, 147.93]	5.77	-23.66	<0.001

Some housekeeping:

str(d <- d[,1:9])

'data.frame':   8000 obs. of  9 variables:
 $ CASE    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ RT      : int  748 408 483 1266 636 395 765 1297 849 315 ...
 $ LENGTH  : int  5 5 6 4 6 6 8 4 5 6 ...
 $ LANGUAGE: Factor w/ 2 levels "spanish","english": 1 1 1 1 1 1 1 1 1 1 ...
 $ GROUP   : Factor w/ 2 levels "english","heritage": 2 2 2 2 2 2 2 2 2 2 ...
 $ CORRECT : Factor w/ 2 levels "correct","incorrect": 1 1 1 1 1 1 1 1 1 1 ...
 $ FREQPMW : int  19 78 233 133 14 67 34 114 2 20 ...
 $ ZIPFFREQ: num  4.28 4.89 5.37 5.12 4.15 ...
 $ SITE    : Factor w/ 3 levels "a","b","c": 2 2 2 2 2 2 3 2 2 1 ...

Now, why is this model actually crap? That’s the topic of the next section.

1.2.4 Diagnostics 1

Just like most statistical tests, linear models require certain assumptions to be met in order for their results to be trustworthy. The first assumption is concerned with the number of data points we need. For linear models like ours, one rule of thumb (!) is that the number of data points in the model (!) must be greater than the product of the number of values estimated in the model and 15 or, better even, 20.

How many data points do we have? It’s not nrow(d)! – it’s this:

nrow(model.frame(m_01))

[1] 7752

How many values are estimated in the model? The best way to get that number is this:

m_01 %>% logLik %>% attr("df")

[1] 3

So I think we’re just about ok …

c("we have enough data points"=">"(
   nrow(model.frame(m_01)),
   20 * (m_01 %>% logLik %>% attr("df"))))

we have enough data points 
                      TRUE

Note: one should minimally also check that one has enough data points for all (combinations of) levels of predictors and their interactions!

Another important one set of assumptions is that the residuals are

approximately normally distributed;
homoscedastic (i.e., here, exhibit a fairly even spread across the range of predicted values).

Let’s look at this here:

par(mfrow=c(2, 2)) # make the plotting window have 2 rows & 2 columns
hist(d$RT, breaks="FD") # a histogram of the response
hist(residuals(m_01), main="Residuals") # a histogram of the residuals
plot(m_01, which=1:2)   # plot model-diagnostic plots 1 & 2
par(mfrow=c(1, 1)) # reset to default

Is this good or bad? It’s bad … How might one fix this? With a transformation, for example:

par(mfrow=c(1, 2)) # make the plotting window have 1 row & 2 columns
hist(log2(d$RT))              # a histogram of ...
hist(log2(d$RT), breaks="FD") # the logged response
par(mfrow=c(1, 1)) # reset to default

Much better, we will work with this from now on:

d$RT_log <- log2(d$RT)

How does the new response look like with the predictor?

# the predictor(s) w/ the response
boxplot(main="RT in ms (logged) ~ stimulus language", varwidth=TRUE,
   d$RT_log ~ d$LANGUAGE, notch=TRUE,
   xlab="Stimulus language",
   ylab="RT in ms (logged)", ylim=c(8, 12)); grid()

Figure 6: RT (logged) as a function of LANGUAGE

Does that improve the diagnostics?

m_02 <- lm(              # make the linear model m_02:
   RT_log ~ 1 +          # RT_log ~ an overall intercept (1)
   LANGUAGE,             # & the predictor LANGUAGE
   data=d,               # those vars are in d
   na.action=na.exclude) # skip cases with NA/missing data (good habit)
par(mfrow=c(2, 2)) # make the plotting window have 2 rows & 2 columns
hist(d$RT_log, breaks="FD") # a histogram of the response
hist(residuals(m_02), main="Residuals") # a histogram of the residuals
plot(m_02, which=1:2)   # plot model-diagnostic plots 1 & 2
par(mfrow=c(1, 1)) # reset to default

Yes, so we’ll use this response variable from now on. Also, we will reduce the data set to the complete cases:

# delete NAs & reorder columns
str(d <- d[complete.cases(d), c(1,10,5,9,3,8,4,2,7,6)])

'data.frame':   7752 obs. of  10 variables:
 $ CASE    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ RT_log  : num  9.55 8.67 8.92 10.31 9.31 ...
 $ GROUP   : Factor w/ 2 levels "english","heritage": 2 2 2 2 2 2 2 2 2 2 ...
 $ SITE    : Factor w/ 3 levels "a","b","c": 2 2 2 2 2 2 3 2 2 1 ...
 $ LENGTH  : int  5 5 6 4 6 6 8 4 5 6 ...
 $ ZIPFFREQ: num  4.28 4.89 5.37 5.12 4.15 ...
 $ LANGUAGE: Factor w/ 2 levels "spanish","english": 1 1 1 1 1 1 1 1 1 1 ...
 $ RT      : int  748 408 483 1266 636 395 765 1297 849 315 ...
 $ FREQPMW : int  19 78 233 133 14 67 34 114 2 20 ...
 $ CORRECT : Factor w/ 2 levels "correct","incorrect": 1 1 1 1 1 1 1 1 1 1 ...

Anything else we need to do? We need a new null model because our response has changed:

m_00 <- lm(RT_log ~ 1, data=d, na.action=na.exclude)

1.3 A categorical predictor

Let us now extend this to a scenario where we have a categorical predictor. Imagine we’re asking, does the (now logged) reaction time to a word (in ms) vary as a function of in which of three experimental sites the data were collected (a vs. b vs. c)?

This time we’re smarter than before and begin with some exploration:

# the predictor(s)/response on its/their own
table(d$SITE)


   a    b    c 
2403 2815 2534

par(mfrow=c(1,2))
hist(d$RT_log, breaks="FD")
boxplot(main="RT in ms (logged) ~  experimental site", varwidth=TRUE,
   d$RT_log ~ d$SITE, notch=TRUE,
   xlab="Experimental site",
   ylab="RT in ms", ylim=c(8, 12)); grid()
par(mfrow=c(1,1))

Figure 8: RT (logged) as a function of SITE

It is noteworthy that site c differs considerably from the others in the outlier distribution; this would definitely require some attention!

1.3.1 Modeling & numerical interpretation

Given what we said so far, what is going to be the best strategy to make the best guesses, i.e. reduce your deviance? You should guess the mean for each level of the response:

tapply(              # apply to
   d$RT_log,         # this variable
   d$SITE,           # a grouping by this one
   mean, na.rm=TRUE) # return means for each group

       a        b        c 
9.191055 9.311756 9.357216

Note, there are three ways to represent this:

we get a mean for each group (like above);
we get one mean (e.g., the one for SITE: a) and the info how, from that mean, we can compute each of the other two (i.e. a→b and a→c);
we get one mean (e.g., the one for SITE: a) and the info how we can compute each of the other two successively (i.e. a→b and b→c).

What R does by default is option 2:

summary(m_01 <- lm(       # make/summarize the linear model m_01:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   SITE,                  # & the predictor SITE
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + SITE, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.10891 -0.30613 -0.08053  0.23524  2.65471 

Coefficients:
            Estimate Std. Error  t value Pr(>|t|)    
(Intercept) 9.191055   0.009111 1008.810   <2e-16 ***
SITEb       0.120700   0.012404    9.731   <2e-16 ***
SITEc       0.166161   0.012717   13.066   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4466 on 7749 degrees of freedom
Multiple R-squared:  0.02295,   Adjusted R-squared:  0.0227 
F-statistic:    91 on 2 and 7749 DF,  p-value: < 2.2e-16

Confint(m_01)

             Estimate      2.5 %    97.5 %
(Intercept) 9.1910552 9.17319560 9.2089148
SITEb       0.1207003 0.09638472 0.1450159
SITEc       0.1661612 0.14123245 0.1910899

We can see this in particular of we return to model.matrix:

picky <- c(10, 1, 7)
d[picky, c("RT_log", "SITE")]

     RT_log SITE
10 8.299208    a
1  9.546894    b
7  9.579316    c

model.matrix(m_01) %>% "["(picky,)

   (Intercept) SITEb SITEc
10           1     0     0
1            1     1     0
7            1     0     1

In model.matrix(m_01),

the 2nd column is what we would compute as d$SITE=="b";
the 3rd column is what we would compute as d$SITE=="c".

But again, is the predictor SITE significant? This is easy here because the model has just this one predictor so the p-value of the whole model is the p-value of its only predictor. However, the predictor has >1 df (because it is categorical with >2 levels), which, means that the p-values in the coefficients table does not tell us what the p-value for ‘all of SITE’ is. So, then, what do the coefficients mean or represent?

coef(m_01)

(Intercept)       SITEb       SITEc 
  9.1910552   0.1207003   0.1661612

The coefficients are used like they we just said above: For each case x, you

multiply the coefficients with row x from the model matrix;
sum up the result,

and that is the prediction for that case:

"*"( # prediction for case 10:
   coef(m_01),
   model.matrix(m_01)[10,]) %>% sum

[1] 9.191055

"*"( # prediction for case 1:
   coef(m_01),
   model.matrix(m_01)[ 1,]) %>% sum

[1] 9.311756

"*"( # prediction for case 7:
   coef(m_01),
   model.matrix(m_01)[ 7,]) %>% sum

[1] 9.357216

In other words,

the intercept is the prediction when the predictor is its first level (a);
the 2nd coefficient is how you get from the intercept to the prediction when the predictor is its 2nd level (b), and that difference is significant;
the 3rd coefficient is how you get from the intercept to the prediction for when the predictor is its 2nd level (c), and that difference is significant;

Thus, this is what this model predicts and we again add it to the data frame d:

d$PRED <- predict(m_01)
d[picky,]

   CASE   RT_log    GROUP SITE LENGTH ZIPFFREQ LANGUAGE  RT FREQPMW CORRECT
10   10 8.299208 heritage    a      6 4.301030  spanish 315      20 correct
1     1 9.546894 heritage    b      5 4.278754  spanish 748      19 correct
7     7 9.579316 heritage    c      8 4.531479  spanish 765      34 correct
       PRED
10 9.191055
1  9.311756
7  9.357216

Is this an acceptable model (in terms of diagnostics)? It’s not that bad …

par(mfrow=c(2, 2)) # make the plotting window have 2 rows & 2 columns
hist(d$RT_log, breaks="FD") # a histogram of the response
hist(residuals(m_01), main="Residuals") # a histogram of the residuals
plot(m_01, which=1:2)   # plot model-diagnostic plot 1
par(mfrow=c(1, 1)) # reset to default

How good is this model? For that we need the deviance again:

sum((d$RT - d$PRED)^2, na.rm=TRUE)

[1] 3833645052

deviance(m_01)

[1] 1545.65

How much is this an improvement in % of the original deviance?

"-"(
   deviance(m_00),
   deviance(m_01)) / deviance(m_00)

[1] 0.02294739

What is that again? R-squared.

Again, interpreting the coefficients and making the predictions is very easy, but we again use the effects package for getting all relevant predictions for our predictor:

(sit_d <- data.frame( # make sit_d a data frame of
   sit <- effect(     # the effect
      "SITE",         # of SITE
      m_01)))         # in m_01

  SITE      fit          se    lower    upper
1    a 9.191055 0.009110786 9.173196 9.208915
2    b 9.311756 0.008417702 9.295255 9.328256
3    c 9.357216 0.008872161 9.339825 9.374608

(sit_d <- data.frame(      # make sit_d a data frame of
   sit <- predictorEffect( # the effect
      "SITE",              # of SITE
      m_01)))              # in m_01

  SITE      fit          se    lower    upper
1    a 9.191055 0.009110786 9.173196 9.208915
2    b 9.311756 0.008417702 9.295255 9.328256
3    c 9.357216 0.008872161 9.339825 9.374608

1.3.2 Visual interpretation

Let’s visualize the nature of the effect of SITE based on the predictions from effects:

plot(sit,           # plot the effect of SITE in m_01
   xlab="Site",     # w/ this x-axis label
   ylab="RT in ms", # w/ this y-axis label
   ylim=range(d$RT_log, na.rm=TRUE), # w/ these y-axis limits
   grid=TRUE)       # & w/ a grid

And here a version with everything: observed values, predicted values, and the confidence intervals of the predictions:

Excursus: This is interesting because only the very last plot shows something that we haven’t noticed before, namely the fact that both sites a and c have a surprising gap of RT values around 600 ms (specifically, no values in the interval [590, 602]); we can see this much more clearly in what is one of my favorite exploratory plots anyway, an ecdf plot:

Figure 12: The ecdf of RT (logged) by SITE

This example allows us to pursue a variety of important notions and follow-up questions. Look at the model summary again:

m_01 %>% summary


Call:
lm(formula = RT_log ~ 1 + SITE, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.10891 -0.30613 -0.08053  0.23524  2.65471 

Coefficients:
            Estimate Std. Error  t value Pr(>|t|)    
(Intercept) 9.191055   0.009111 1008.810   <2e-16 ***
SITEb       0.120700   0.012404    9.731   <2e-16 ***
SITEc       0.166161   0.012717   13.066   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4466 on 7749 degrees of freedom
Multiple R-squared:  0.02295,   Adjusted R-squared:  0.0227 
F-statistic:    91 on 2 and 7749 DF,  p-value: < 2.2e-16

Here are some questions:

what is the difference between the reaction times that the model predicts for SITE: b and for SITE: c?
is that difference significant – how do we find that out?
what might we need to pay attention to when we look at that?

1.3.3 Excursus 1: the difference

What is the difference between the reaction times that the model predicts for SITE: b and for SITE: c?

"-"( # difference between the two predictions:
   coef(m_01)[1]+coef(m_01)[2],
   coef(m_01)[1]+coef(m_01)[3]) %>% unname # aka

[1] -0.04546087

# difference between the two coefficients:
0.120700 - 0.166161

[1] -0.045461

unname(coef(m_01)[2]-coef(m_01)[3])        # aka

[1] -0.04546087

1.3.4 Excursus 2: is it significant?

1.3.4.1 Conflation via model comparison

In cases where one has multiple levels of a categorical predictor, sometimes the question arises as to whether (all) levels of that predictor are in fact significantly different from each other. In this particular case, the effects plot for SITE very clearly suggests that they are (because the confidence intervals of all predicted means do not overlap), but let’s discuss briefly how this can be tested with a method called model comparison: If you have two models that were fit to the same data set and where one model is a sub-model of the other model – typically such that they differ only by one predictor/term – then you can use the function anova to compare the two models to see whether the more complex model can justify its higher degree of complexity by being significantly better than the less complex model. In this case, 2 of the 3 possible comparisons of SITE are already in the summary table:

summary(m_01)$coefficients

             Estimate  Std. Error     t value     Pr(>|t|)
(Intercept) 9.1910552 0.009110786 1008.810364 0.000000e+00
SITEb       0.1207003 0.012404198    9.730601 2.992632e-22
SITEc       0.1661612 0.012716983   13.066084 1.313862e-38

From this, we could already compute the predictions for all three sites and we already know that SITE: a is significantly different from SITE: b and SITE: c, but we don’t know whether the 0.04546087 difference between SITE: b and SITE: c is significant as well. To find that out, we can compare m_01 to a model in which SITE: b and SITE: c are not distinguished, i.e. a model that is missing exactly the difference we’re interested in. For that, we first, we create a version of SITE that conflates levels as desired:

d$SITE_confl <- d$SITE # create a copy of SITE
levels(d$SITE_confl) <- # overwrite the copy's levels such that
   c("a",               # "a" remains "a"
     "b/c", "b/c")      # "b" & "c" become "b/c"
table(SITE=d$SITE, SITE_confl=d$SITE_confl) # check you did it right

    SITE_confl
SITE    a  b/c
   a 2403    0
   b    0 2815
   c    0 2534

head(d)

  CASE    RT_log    GROUP SITE LENGTH ZIPFFREQ LANGUAGE   RT FREQPMW CORRECT
1    1  9.546894 heritage    b      5 4.278754  spanish  748      19 correct
2    2  8.672425 heritage    b      5 4.892095  spanish  408      78 correct
3    3  8.915879 heritage    b      6 5.367356  spanish  483     233 correct
4    4 10.306062 heritage    b      4 5.123852  spanish 1266     133 correct
5    5  9.312883 heritage    b      6 4.146128  spanish  636      14 correct
6    6  8.625709 heritage    b      6 4.826075  spanish  395      67 correct
      PRED SITE_confl
1 9.311756        b/c
2 9.311756        b/c
3 9.311756        b/c
4 9.311756        b/c
5 9.311756        b/c
6 9.311756        b/c

We already have the model with SITE as the only predictor so we only need to fit a new model with SITE_confl as the only predictor:

summary(m_01_siteconfl <- lm( # make/summarize the linear model m_01_siteconfl:
   RT_log ~ 1 + SITE_confl,   # RT_log ~ an overall intercept (1) & SITE_confl
   data=d, na.action=na.exclude)) # the other arguments


Call:
lm(formula = RT_log ~ 1 + SITE_confl, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.10891 -0.30841 -0.08063  0.23555  2.67863 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.191055   0.009118 1007.98   <2e-16 ***
SITE_conflb/c 0.142237   0.010977   12.96   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.447 on 7750 degrees of freedom
Multiple R-squared:  0.02121,   Adjusted R-squared:  0.02108 
F-statistic: 167.9 on 1 and 7750 DF,  p-value: < 2.2e-16

… and then we do a comparison with anova:

anova(m_01,        # compare the model w/ the 3-level predictor SITE
   m_01_siteconfl, # & the model w/ the 2-level predictor SITE_confl
   test="F") %>% data.frame # using an F-test

  Res.Df      RSS Df Sum.of.Sq        F       Pr..F.
1   7749 1545.650 NA        NA       NA           NA
2   7750 1548.406 -1 -2.756055 13.81728 0.0002028949

The difference between the two models is highly significant: The finer resolution of SITE compared to SITE_confl makes the model significantly better (p<0.0003). Thus, we now know not only the size of the 3rd comparison (SITE: b vs. c, which was computable from summary(m_01)), but also its p-value.

1.3.4.2 Post-hoc comparisons

A for now final way of getting such results is available from emmeans::emmeans:

summary(m_01)$coefficients

             Estimate  Std. Error     t value     Pr(>|t|)
(Intercept) 9.1910552 0.009110786 1008.810364 0.000000e+00
SITEb       0.1207003 0.012404198    9.730601 2.992632e-22
SITEc       0.1661612 0.012716983   13.066084 1.313862e-38

# the following is pretty much the same as sit_d
data.frame(qwe <- emmeans(   # compute estimated marginal means
   object=m_01,    # for the model m_01
   specs= ~ SITE)) # namely all contrasts of this predictor

  SITE   emmean          SE   df lower.CL upper.CL
1    a 9.191055 0.009110786 7749 9.173196 9.208915
2    b 9.311756 0.008417702 7749 9.295255 9.328256
3    c 9.357216 0.008872161 7749 9.339825 9.374608

From this object qwe with all its means, we can compute all pairwise comparisons:

pairs(qwe,            # pairwise comparisons of means
   adjust="none") %>% # careful w/ this!
   data.frame         # show as a data frame

  contrast    estimate         SE   df    t.ratio      p.value
1    a - b -0.12070030 0.01240420 7749  -9.730601 2.992632e-22
2    a - c -0.16616117 0.01271698 7749 -13.066084 1.313862e-38
3    b - c -0.04546087 0.01223000 7749  -3.717160 2.028949e-04

In our case,

lines 1 and 2 of the pairs(emmeans, ...) output correspond to lines 2 and 3 of the model’s summary table;
line 3 of the pairs(emmeans, ...) output corresponds to the result we obtained with either the model conflation or the sdif approach above.

Now, why does it it say adjust="none") # careful w/ this! above ? Because of how larger number of levels can lead to many many pairwise comparisons:

(numbers_of_levels_of_cat_predictors <- 3:10)

[1]  3  4  5  6  7  8  9 10

(how_many_in_sum <- numbers_of_levels_of_cat_predictors -1)

[1] 2 3 4 5 6 7 8 9

(possible_post_tests <- "*"(
   numbers_of_levels_of_cat_predictors,
   numbers_of_levels_of_cat_predictors-1) / 2)

[1]  3  6 10 15 21 28 36 45

par(mfrow=c(1,2))
plot(pch=16, type="b",
   x=numbers_of_levels_of_cat_predictors,
   y=possible_post_tests, ylim=c(0, 50)); grid()
plot(ylim=c(0,1), pch=16, type="b",
   x=numbers_of_levels_of_cat_predictors,
   y=how_many_in_sum/possible_post_tests); grid()
par(mfrow=c(1,1))

Figure 13: The relation of levels to possible tests

1.3.4.3 Multiple post-hoc tests

But what does that mean for our p-values? The work worse and worse …

If we do 1 significance test with a significance level of 0.05, what is the chance we make the right decision when we adopt H₁?

1-0.05

[1] 0.95

If we do 2 significance tests with a significance level of 0.05, what is the chance we always make the right decision when we adopt H₁?

0.95*0.95

[1] 0.9025

If we do 3 significance tests with a significance level of 0.05, what is the chance we always make the right decision when we adopt H₁?

0.95^3

[1] 0.857375

If we do 13 significance tests with a significance level of 0.05, what is the chance we always make the right decision when we adopt H₁?

0.95^13

[1] 0.5133421

Thus, if we test the living hell out of a predictor by testing all possible comparisons, we need to protect ourselves against this – how?

# Bonferroni shortcut for 3 tests:
0.05/3 # the new significance level

[1] 0.01666667

# Duncan for 3 tests:
1-0.95^(1/3)

[1] 0.01695243

# Bonferroni shortcut for 8 tests:
0.05/8 # the new significance level

[1] 0.00625

# Duncan for 8 tests:
1-0.95^(1/8)

[1] 0.006391151

But many would say these are too conservative. My advice, use p.adjust:

some_pvalues <- c(0.048, 0.043, 0.039, 0.011, 0.009)
list(
   "Bonferroni versions"=some_pvalues * length(some_pvalues),
   "Holm's method"=p.adjust(some_pvalues, method="holm"))

$`Bonferroni versions`
[1] 0.240 0.215 0.195 0.055 0.045

$`Holm's method`
[1] 0.117 0.117 0.117 0.045 0.045

1.3.5 Write-up

To determine whether the reaction time to a word (in ms) varies as a function of the site where the data were collected (a vs. b vs. c), a linear model was fit with RT as the response variable and SITE as a predictor. The model was highly significant (F=112, df₁=2, df₂=7749, p<0.0001) but explains very little of the response variable (mult. R²=0.028, adj. R²=0.028). The coefficients indicate that reaction times are longer for Spanish words, as shown in the summary table and the figure shown here. [Show a plot] The model is potentially somewhat problematic because its residuals are not normally distributed and come with a very long right tail (esp. for SITE: c).

	Estimate	95%-CI	se	t	p_2-tailed
Intercept (SITE=a)	9.191	[9.173, 9.209]	0.009	1008.81	<0.001
SITE_a→b	0.121	[0.096, 0.145]	0.12	9.73	<0.001
SITE_a→c	0.166	[0.144, 0.191]	0.13	13.07	<0.001

A post-hoc comparison of the above model with one in which sites b and c were conflated indicates that sites b and c are also significantly different from each other (difference between means: 0.04546, F=13.817/t=3.717, p<0.0003).

Some housekeeping:

str(d <- d[,1:10])

'data.frame':   7752 obs. of  10 variables:
 $ CASE    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ RT_log  : num  9.55 8.67 8.92 10.31 9.31 ...
 $ GROUP   : Factor w/ 2 levels "english","heritage": 2 2 2 2 2 2 2 2 2 2 ...
 $ SITE    : Factor w/ 3 levels "a","b","c": 2 2 2 2 2 2 3 2 2 1 ...
 $ LENGTH  : int  5 5 6 4 6 6 8 4 5 6 ...
 $ ZIPFFREQ: num  4.28 4.89 5.37 5.12 4.15 ...
 $ LANGUAGE: Factor w/ 2 levels "spanish","english": 1 1 1 1 1 1 1 1 1 1 ...
 $ RT      : int  748 408 483 1266 636 395 765 1297 849 315 ...
 $ FREQPMW : int  19 78 233 133 14 67 34 114 2 20 ...
 $ CORRECT : Factor w/ 2 levels "correct","incorrect": 1 1 1 1 1 1 1 1 1 1 ...

1.4 A numeric predictor

Let us now extend this to a scenario where we have a numeric predictor. Imagine we’re asking, does the reaction time to a word (in ms) vary as a function of the reaction time to a word (in ms) vary as a function of the frequency of that word (normalized to per million words)?

This time we’re smarter than before and begin with exploration:

# hist(d$RT_log); hist(d$RT_log, breaks="FD")
par(mfrow=c(1,2))
hist(d$FREQPMW); hist(d$FREQPMW, breaks="FD")
par(mfrow=c(1,1))

Thoughts? That looks terrible again …

Let’s also look at the pairwise relationship:

#| fig-show: hold
# the predictor(s) w/ the response
plot(pch=16, col="#00000020",
   x=d$FREQPMW,
   y=d$RT_log)
abline(v=quantiles(d$FREQPMW), lty=3, col="#00000040")
abline(h=quantiles(d$RT_log) , lty=3, col="#00000040")

Figure 15: RT (logged) as a function of FREQPMW

Thoughts? Obviously, we can’t work with this – let’s use ZIPFFREQ instead:

par(mfrow=c(1,3))
# the predictor(s)/response on its/their own
hist(d$ZIPFFREQ); hist(d$ZIPFFREQ, breaks="FD")
# the predictor(s) w/ the response
plot(pch=16, col="#00000020",
   x=d$ZIPFFREQ,
   y=d$RT_log)
abline(v=quantiles(d$ZIPFFREQ), lty=3, col="#00000040")
abline(h=quantiles(d$RT_log)  , lty=3, col="#00000040")
par(mfrow=c(1,1))

Figure 16: RT (logged) as a function of ZIPFFREQ

1.4.1 Modeling & numerical interpretation

How would we have dealt with this without a linear model? We would have computed a Pearson product-moment correlation coefficient r (or Spearman’s ρ, either with cor.test). Thus, we might have done something like this (for Pearson’s r) …

cor.test(d$RT_log, d$ZIPFFREQ)


    Pearson's product-moment correlation

data:  d$RT_log and d$ZIPFFREQ
t = -12.939, df = 7750, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1671408 -0.1235589
sample estimates:
       cor 
-0.1454204

But we want to fit a linear model so here goes:

summary(m_01 <- lm(       # make/summarize the linear model m_01:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   ZIPFFREQ,              # & the predictor ZIPFFREQ (which involves logging)
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + ZIPFFREQ, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.21421 -0.30499 -0.07338  0.22546  2.75795 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.84212    0.04303  228.72   <2e-16 ***
ZIPFFREQ    -0.11995    0.00927  -12.94   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.447 on 7750 degrees of freedom
Multiple R-squared:  0.02115,   Adjusted R-squared:  0.02102 
F-statistic: 167.4 on 1 and 7750 DF,  p-value: < 2.2e-16

Confint(m_01)

              Estimate      2.5 %     97.5 %
(Intercept)  9.8421186  9.7577652  9.9264721
ZIPFFREQ    -0.1199524 -0.1381246 -0.1017802

Is this an acceptable model (in terms of diagnostics)? It’s not that bad, I suppose …

par(mfrow=c(2, 2)) # make the plotting window have 2 rows & 2 columns
hist(d$RT_log, breaks="FD") # a histogram of the response
hist(residuals(m_01), main="Residuals") # a histogram of the residuals
plot(m_01, which=1:2)   # plot model-diagnostic plot 1
par(mfrow=c(1, 1)) # reset to default

Now, first again, is the predictor ZIPFFREQ significant? This is easy here because the model has just this one predictor so the p-value of the whole model is the p-value of its only predictor. Also, the predictor has 1 df (because it is numeric), which, means that the p-value in the coefficients table also tells us what the p-value for ‘all of ZIPFFREQ’ is.

Second, note the equivalence of the classical Pearson’s r and the linear model we fit:

the t-statistic from Pearson’s r (-12.9394984) squared is m_01’s F-value (167.4306191);
the t-statistic from Pearson’s r (-12.9394984) is also the t-statistic of ZIPFFREQ (-12.9394984);
the df of Pearson’s r (7750) is also the df₂ of m_01 (7750);
the value of Pearson’s r (-0.1454204) squared is m_01’s Multiple R-squared (0.0211471).

In this case, there aren’t means to estimate for levels of the predictor simply because the predictor has no levels: it is numeric. But we can informally think about this as a continuous extension of how we dealt with predictors so far, especially the binary case. There, one level (the first one, spanish) was considered 0/FALSE and mapped onto the intercept. the other (english) was considered 1/TRUE. But here, it’s a tiny bit different – look at the model.matrix:

picky <- c(215, 35)
d[picky, c("RT_log", "ZIPFFREQ")]

      RT_log ZIPFFREQ
216 9.147205        3
35  9.228819        4

model.matrix(m_01) %>% "["(picky,)

    (Intercept) ZIPFFREQ
216           1        3
35            1        4

In model.matrix(m_01), the 2nd column is now just the value of the numeric predictor and otherwise the logic from above is the same. The coefficients …

coef(m_01)

(Intercept)    ZIPFFREQ 
  9.8421186  -0.1199524

… are used like this: For each case x, you

multiply the coefficients with row x from the model matrix;
sum up the result,

and that is the prediction for that case:

"*"( # prediction for case 215:
   coef(m_01),
   model.matrix(m_01)[215,]) %>% sum

[1] 9.482261

"*"( # prediction for case 35:
   coef(m_01),
   model.matrix(m_01)[35,]) %>% sum

[1] 9.362309

In other words,

we use the intercept for when the numeric predictor is 0;
we use the coefficient to represent how much the prediction changes when the value of the numeric predictor changes from from 0/FALSE to 1/TRUE, which makes it a slope, and this slope is significantly different from 0.

Thus, this is what this model predicts:

d$PRED <- predict(m_01)
d[picky,]

    CASE   RT_log    GROUP SITE LENGTH ZIPFFREQ LANGUAGE  RT FREQPMW CORRECT
216  234 9.147205 heritage    c      5        3  spanish 567       1 correct
35    36 9.228819 heritage    b      3        4  spanish 600      10 correct
        PRED
216 9.482261
35  9.362309

How good is this model? For that we need the deviance again:

sum((d$RT_log - d$PRED)^2, na.rm=TRUE)

[1] 1548.498

deviance(m_01)

[1] 1548.498

How much is this an improvement in % of the original deviance?

"-"(
   deviance(m_00),
   deviance(m_01)) / deviance(m_00)

[1] 0.02114709

What is that again? R-squared.

Let’s again compute effects predictions for this predictor, but we do so in a way that can be didactically useful: we do not let the effects package guess a few values, we define a bunch of ‘relevant’ values ourselves:

(relevants_z <- c(
   0:1, # these help us understand the slope
   3:6, # these cover the range of the actual values
   mean(d$ZIPFFREQ)) %>% sort) # the central tendency

[1] 0.000000 1.000000 3.000000 4.000000 4.609476 5.000000 6.000000

Then, we compute the effects predictions of the model for these relevant values:

(zpf_d <- data.frame( # make zpf_d a data frame of
   zpf <- effect(     # the effect
      "ZIPFFREQ",     # of ZIPFFREQ
      m_01,           # in m_01
      # but predictions for our relevant values:
      xlevels=list(ZIPFFREQ=relevants_z))))

  ZIPFFREQ      fit          se    lower    upper
1 0.000000 9.842119 0.043031556 9.757765 9.926472
2 1.000000 9.722166 0.033843723 9.655823 9.788509
3 3.000000 9.482261 0.015760356 9.451367 9.513156
4 4.000000 9.362309 0.007595874 9.347419 9.377199
5 4.609476 9.289201 0.005076888 9.279249 9.299153
6 5.000000 9.242356 0.006235465 9.230133 9.254580
7 6.000000 9.122404 0.013854241 9.095246 9.149562

# or this:
# (zpf_d <- data.frame( # make zpf_d a data frame of
#    zpf <- predictorEffect( # the effect
#       "ZIPFFREQ",          # of ZIPFFREQ
#       m_01,                # in m_01
#       # but predictions for our relevant values:
#       focal.levels=relevants_z)))

1.4.2 Visual interpretation

Let’s visualize the nature of the effect of ZIPFFREQ based on the predictions from effects; note, this one is a bit annoying now because we used the didactically useful vector relevants_z, which now results in an x-axis that covers the interpretationally useless value range [0, 3):

plot(zpf,                            # plot the effect of ZIPFFREQ in m_02
   xlab="Zipf frequency",            # w/ this x-axis label
   ylab="RT in ms (logged)",         # w/ this y-axis label
   ylim=range(d$RT_log, na.rm=TRUE), # w/ these y-axis limits
   grid=TRUE)                        # & w/ a grid

Figure 18: The effect of ZIPFFREQ in m_01

So here is a version with everything: observed values, predicted values, and the (barely visible) confidence band of the predictions/regression line:

Figure 19: The effect of ZIPFFREQ in m_01

1.4.3 Write-up

To determine whether the reaction time to a word (in ms) varies as a function of the frequency of that word (normalized to per million words), a linear model was fit with RT_log as the response variable and ZIPFFREQ as a predictor (because exploration of FREQPMW indicated a very skewed distribution of FREQPMW, which is why the log-transformed version of this predictor was used in its place). The model was highly significant (F=167.4, df₁=1, df₂=7750, p<0.0001) but explains very little of the response variable (mult. R²=0.021, adj. R²=0.021). The coefficients indicate that ZIPFFREQ is negatively correlated with RT_log, as shown in the summary table and the figure shown here. [Show a plot]

	Estimate	95%-CI	se	t	p_2-tailed
Intercept (ZIPFFREQ=0)	9.842	[9.758, 9.926]	0.043	228.72	<0.001
ZIPFFREQ_0→1	-0.12	[-0.138, -0.102]	0.009	-12.94	<0.001

Some housekeeping:

str(d <- d[,1:10])

'data.frame':   7752 obs. of  10 variables:
 $ CASE    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ RT_log  : num  9.55 8.67 8.92 10.31 9.31 ...
 $ GROUP   : Factor w/ 2 levels "english","heritage": 2 2 2 2 2 2 2 2 2 2 ...
 $ SITE    : Factor w/ 3 levels "a","b","c": 2 2 2 2 2 2 3 2 2 1 ...
 $ LENGTH  : int  5 5 6 4 6 6 8 4 5 6 ...
 $ ZIPFFREQ: num  4.28 4.89 5.37 5.12 4.15 ...
 $ LANGUAGE: Factor w/ 2 levels "spanish","english": 1 1 1 1 1 1 1 1 1 1 ...
 $ RT      : int  748 408 483 1266 636 395 765 1297 849 315 ...
 $ FREQPMW : int  19 78 233 133 14 67 34 114 2 20 ...
 $ CORRECT : Factor w/ 2 levels "correct","incorrect": 1 1 1 1 1 1 1 1 1 1 ...

1.4.4 Just to really drive this home …

This is what one would have done with the unprepared response and predictor:

par(mfrow=c(1,2))
plot(
   main="RT in ms ~ freq pmw",
   pch=16, col="#00000030",
   xlab="Frequency pmw", xlim=c(  0, 1200)   , x=d$FREQPMW,
   ylab="RT in ms"     , ylim=c(417, 1137.45), y=d$RT)
abline(lm(RT ~ FREQPMW, data=d), col="red", lwd=2)
abline(h=quantiles(d$RT)     , lty=3, col="#00FF0080")
abline(v=quantiles(d$FREQPMW), lty=3, col="#00FF0080")
hist(residuals(lm(RT ~ FREQPMW, data=d)), breaks="FD", main="", xlab="Residuals")
par(mfrow=c(1,1))

Figure 20: What the effect of/diagnostics for FREQPMW would have been

… and this is the better thing we did:

par(mfrow=c(1,2))
plot(
   main="RT in ms (logged) ~  of Zipf scale",
   pch=16, col="#00000030",
   xlab="Zipf scaled"       , xlim=c(3,  6), x=d$ZIPFFREQ,
   ylab="RT in ms (log2ged)", ylim=c(8, 12), y=d$RT_log)
abline(lm(RT_log ~ ZIPFFREQ, data=d), col="red", lwd=2)
abline(h=quantiles(d$RT_log)  , lty=3, col="#00FF0080")
abline(v=quantiles(d$ZIPFFREQ), lty=3, col="#00FF0080")
hist(residuals(lm(RT_log ~ ZIPFFREQ, data=d)), breaks="FD", main="", xlab="Residuals")
par(mfrow=c(1,1))

1.5 Two categorical predictors

Let us now extend this to a scenario where we have two categorical predictors. Imagine we’re asking, does the logged reaction time to a word (in ms) vary as a function of

the language that the stimulus word was presented in: LANGUAGE: spanish vs. english;
the site where the data were collected: SITE: a vs. b vs. c.

table(d$LANGUAGE, d$SITE)

         
             a    b    c
  spanish 1094 1401 1280
  english 1309 1414 1254

by(
   d$RT_log,
   list(d$LANGUAGE, d$SITE),
   summary)

: spanish
: a
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.243   9.056   9.301   9.305   9.564  10.051 
------------------------------------------------------------ 
: english
: a
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.082   8.873   9.061   9.095   9.331  10.050 
------------------------------------------------------------ 
: spanish
: b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.539   9.140   9.331   9.427   9.697  10.359 
------------------------------------------------------------ 
: english
: b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.520   8.928   9.150   9.197   9.385  10.353 
------------------------------------------------------------ 
: spanish
: c
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.539   9.103   9.401   9.527   9.739  12.012 
------------------------------------------------------------ 
: english
: c
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.520   8.899   9.095   9.184   9.371  11.341

Let’s also visualize:

# the predictor(s) w/ the response
boxplot(main="RT in ms (logged) ~  LANGUAGE + SITE", varwidth=TRUE,
   d$RT_log ~ d$LANGUAGE+d$SITE, notch=TRUE,
   xlab="Predictor combinations",
   ylab="RT in ms (logged)"     , ylim=c(8, 12)); grid()

Figure 22: RT (logged) as a function of LANGUAGE and SITE

1.5.1 Modeling & numerical interpretation

Let’s fit a model:

summary(m_01 <- lm(       # make/summarize the linear model m_01:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   LANGUAGE + SITE,       # & the predictors
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + LANGUAGE + SITE, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.09009 -0.29309 -0.06378  0.22331  2.52552 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      9.333261   0.010196 915.406   <2e-16 ***
LANGUAGEenglish -0.261055   0.009716 -26.869   <2e-16 ***
SITEb            0.109625   0.011872   9.234   <2e-16 ***
SITEc            0.153143   0.012173  12.580   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4272 on 7748 degrees of freedom
Multiple R-squared:  0.1062,    Adjusted R-squared:  0.1059 
F-statistic: 306.9 on 3 and 7748 DF,  p-value: < 2.2e-16

Confint(m_01)

                  Estimate       2.5 %     97.5 %
(Intercept)      9.3332611  9.31327465  9.3532475
LANGUAGEenglish -0.2610548 -0.28010081 -0.2420088
SITEb            0.1096246  0.08635282  0.1328964
SITEc            0.1531434  0.12928028  0.1770066

As before: are all 2 predictors significant? Here, this is more difficult: LANGUAGE has 1 df so its p-value is clear from the coefficients table, but SITE has >2 df so its p-value is not clear, and we cannot glean in from the model’s overall p-value, because this one includes whatever LANGUAGE contributes. What can we do? The long answer goes back to model comparison: Above we said this:

a method called **model comparison**: If you have two models that were fit to
the same data set and where one model is a sub-model of the other model --
typically such that they differ only by one predictor/term -- then you can use
the function `anova` to compare the two models to see whether the more complex
model can justify its higher degree of complexity by being significantly better
than the less complex model.

So we use anova again:

summary(m_02l <- lm(      # make/summarize the linear model m_02l:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   LANGUAGE       ,       # & only the predictor LANGUAGE
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + LANGUAGE, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.18270 -0.28696 -0.06671  0.21618  2.58605 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      9.425872   0.007027 1341.44   <2e-16 ***
LANGUAGEenglish -0.266402   0.009810  -27.16   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4317 on 7750 degrees of freedom
Multiple R-squared:  0.08688,   Adjusted R-squared:  0.08677 
F-statistic: 737.4 on 1 and 7750 DF,  p-value: < 2.2e-16

anova(                      # compare
   m_01,                    # this model
   m_02l,                   # to this one
   test="F") %>% data.frame # with an F-test

  Res.Df      RSS Df Sum.of.Sq        F       Pr..F.
1   7748 1413.909 NA        NA       NA           NA
2   7750 1444.505 -2 -30.59622 83.83123 9.569434e-37

# better:
summary(m_02l <- update( # make m_02l an update
   m_01, .~.             # of m_01, all of it
   - SITE))              # but then not SITE


Call:
lm(formula = RT_log ~ LANGUAGE, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.18270 -0.28696 -0.06671  0.21618  2.58605 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      9.425872   0.007027 1341.44   <2e-16 ***
LANGUAGEenglish -0.266402   0.009810  -27.16   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4317 on 7750 degrees of freedom
Multiple R-squared:  0.08688,   Adjusted R-squared:  0.08677 
F-statistic: 737.4 on 1 and 7750 DF,  p-value: < 2.2e-16

Clearly, SITE makes a hugely significant contribution to m_01. Let’s do this for LANGUAGE as well, just for practice and completeness:

summary(m_02s <- update( # make m_02s an update
   m_01, .~.             # of m_01, all of it
   - LANGUAGE))          # but then not LANGUAGE


Call:
lm(formula = RT_log ~ SITE, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.10891 -0.30613 -0.08053  0.23524  2.65471 

Coefficients:
            Estimate Std. Error  t value Pr(>|t|)    
(Intercept) 9.191055   0.009111 1008.810   <2e-16 ***
SITEb       0.120700   0.012404    9.731   <2e-16 ***
SITEc       0.166161   0.012717   13.066   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4466 on 7749 degrees of freedom
Multiple R-squared:  0.02295,   Adjusted R-squared:  0.0227 
F-statistic:    91 on 2 and 7749 DF,  p-value: < 2.2e-16

anova(                      # compare
   m_01,                    # this model
   m_02s,                   # to this one
   test="F") %>% data.frame # with an F-test

  Res.Df      RSS Df Sum.of.Sq        F        Pr..F.
1   7748 1413.909 NA        NA       NA            NA
2   7749 1545.650 -1 -131.7407 721.9181 4.054422e-152

Thankfully, there is a shorter way to do check the significance of droppable predictors: drop1 returns what we had to create with anova stepwise:

drop1(m_01, test="F")

Single term deletions

Model:
RT_log ~ 1 + LANGUAGE + SITE
         Df Sum of Sq    RSS    AIC F value    Pr(>F)    
<none>                1413.9 -13183                      
LANGUAGE  1   131.741 1545.7 -12494 721.918 < 2.2e-16 ***
SITE      2    30.596 1444.5 -13021  83.831 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# in the future, we will make the output of drop1 data frames to reduce the output

So, what do the coefficients mean, for which we return to the data and the model matrix:

picky <- c(10,9,7,101,103,93)
d[picky,c("RT_log", "LANGUAGE", "SITE")]

      RT_log LANGUAGE SITE
10  8.299208  spanish    a
9   9.729621  spanish    b
7   9.579316  spanish    c
101 9.066089  english    a
103 9.100662  english    b
93  8.836050  english    c

model.matrix(m_01) %>% "["(picky,)

    (Intercept) LANGUAGEenglish SITEb SITEc
10            1               0     0     0
9             1               0     1     0
7             1               0     0     1
101           1               1     0     0
103           1               1     1     0
93            1               1     0     1

The model matrix shows that we basically now have a combination of what we saw for each categorical predictor above and we can see that

the intercept is the prediction for when
- LANGUAGE is spanish and
- SITE is a (see case 10);
coefficient 2 is how we get from the intercept to the prediction for when
- LANGUAGE changes from spanish to english, but
- SITE isn’t mentioned, it remains the intercept value a (see case 101);
coefficient 3 is how we get from the intercept to the prediction for when
- LANGUAGE isn’t mentioned, it remains the intercept value spanish , but
- SITE changes from a to b (see case 9);
coefficient 4 is how we get from the intercept to the prediction for when
- LANGUAGE isn’t mentioned, it remains the intercept value spanish , but
- SITE changes from a to c (see case 7).

But with these 4 coefficients, we can also make predictions for when both LANGUAGE changes to english AND SITE changes to b or c; see case 103 and 93 respectively, where we essentially just add up the separate effects of LANGUAGE and SITE. The coefficients are therefore used like above:

"*"( # prediction for case 103 (LANGUAGE: english, SITE: b):
   coef(m_01),
   model.matrix(m_01)[103,]) %>% sum

[1] 9.181831

"*"( # prediction for case 93 (LANGUAGE: english, SITE: c):
   coef(m_01),
   model.matrix(m_01)[93,]) %>% sum

[1] 9.22535

Thus, this is what this model predicts and we again add it to the data frame d:

d$PRED <- predict(m_01)
d[picky,]

    CASE   RT_log    GROUP SITE LENGTH ZIPFFREQ LANGUAGE  RT FREQPMW CORRECT
10    10 8.299208 heritage    a      6 4.301030  spanish 315      20 correct
9      9 9.729621 heritage    b      5 3.301030  spanish 849       2 correct
7      7 9.579316 heritage    c      8 4.531479  spanish 765      34 correct
101  110 9.066089 heritage    a      6 3.477121  english 536       3 correct
103  112 9.100662 heritage    b      4 4.579784  english 549      38 correct
93   101 8.836050 heritage    c      6 4.869232  english 457      74 correct
        PRED
10  9.333261
9   9.442886
7   9.486405
101 9.072206
103 9.181831
93  9.225350

How good is this model? Same as above:

deviance(m_01) # sum((d$RT - d$PRED)^2, na.rm=TRUE)

[1] 1413.909

"-"(
   deviance(m_00),
   deviance(m_01)) / deviance(m_00)

[1] 0.1062247

Let’s compute emmeans and effects results again: We begin with what emmeans does if you force it to produce all six means at the same time:

data.frame(qwe <- emmeans( # compute estimated marginal means
   object=m_01,              # for the model m_01
   specs= ~ LANGUAGE:SITE)) # namely all combinations of the interaction

  LANGUAGE SITE   emmean          SE   df lower.CL upper.CL
1  spanish    a 9.333261 0.010195760 7748 9.313275 9.353248
2  english    a 9.072206 0.009772786 7748 9.053049 9.091364
3  spanish    b 9.442886 0.009415164 7748 9.424429 9.461342
4  english    b 9.181831 0.009391984 7748 9.163420 9.200242
5  spanish    c 9.486405 0.009753654 7748 9.467285 9.505524
6  english    c 9.225350 0.009803181 7748 9.206133 9.244567

But let’s now look at the effects of the predictors individually. First with emmeans again:

emmeans(m_01, specs= ~ LANGUAGE) %>% data.frame

  LANGUAGE   emmean          SE   df lower.CL upper.CL
1  spanish 9.420850 0.006969945 7748 9.407187 9.434513
2  english 9.159796 0.006778074 7748 9.146509 9.173082

emmeans(m_01, specs= ~ SITE)     %>% data.frame

  SITE   emmean          SE   df lower.CL upper.CL
1    a 9.202734 0.008725265 7748 9.185630 9.219838
2    b 9.312358 0.008051530 7748 9.296575 9.328141
3    c 9.355877 0.008486334 7748 9.339242 9.372513

What are those? Simple: The predictions emmeans gives you for one predictor are the averages of the predictions for that predictor over the levels of the other:

# the results of emmeans(m_01, specs= ~ LANGUAGE)
data.frame(qwe)[c(1,3,5),"emmean"] %>% mean

[1] 9.42085

data.frame(qwe)[c(2,4,6),"emmean"] %>% mean

[1] 9.159796

# the results of emmeans(m_01, specs= ~ SITE)
data.frame(qwe)[1:2,"emmean"] %>% mean

[1] 9.202734

data.frame(qwe)[3:4,"emmean"] %>% mean

[1] 9.312358

data.frame(qwe)[5:6,"emmean"] %>% mean

[1] 9.355877

But what is more appropriate is the pairwise differences for each predictor:

emmeans(m_01, specs= ~ LANGUAGE) %>% pairs(adjust="none")

 contrast          estimate      SE   df t.ratio p.value
 spanish - english    0.261 0.00972 7748  26.869  <.0001

Results are averaged over the levels of: SITE

emmeans(m_01, specs= ~ SITE)     %>% pairs(adjust="none")

 contrast estimate     SE   df t.ratio p.value
 a - b     -0.1096 0.0119 7748  -9.234  <.0001
 a - c     -0.1531 0.0122 7748 -12.580  <.0001
 b - c     -0.0435 0.0117 7748  -3.720  0.0002

Results are averaged over the levels of: LANGUAGE

Ok, now the effects predictions, which will be more interesting …:

emmeans(m_01, specs= ~ LANGUAGE) # reminder of what emmeans said:

 LANGUAGE emmean      SE   df lower.CL upper.CL
 spanish   9.421 0.00697 7748    9.407    9.435
 english   9.160 0.00678 7748    9.147    9.173

Results are averaged over the levels of: SITE 
Confidence level used: 0.95

(lan_d <- data.frame( # make lan_d a data frame of
   lan <- effect(     # the effect
      "LANGUAGE",     # of LANGUAGE
      m_01)))         # in m_01

  LANGUAGE      fit          se    lower    upper
1  spanish 9.423129 0.006956061 9.409494 9.436765
2  english 9.162075 0.006776940 9.148790 9.175359

emmeans(m_01, specs= ~ SITE) # reminder of what emmeans said:

 SITE emmean      SE   df lower.CL upper.CL
 a     9.203 0.00873 7748    9.186    9.220
 b     9.312 0.00805 7748    9.297    9.328
 c     9.356 0.00849 7748    9.339    9.373

Results are averaged over the levels of: LANGUAGE 
Confidence level used: 0.95

(sit_d <- data.frame(      # make sit_d a data frame of
   sit <- predictorEffect( # the effect
      "SITE",              # of SITE
      m_01)))              # in m_01

  SITE      fit          se    lower    upper
1    a 9.199332 0.008719875 9.182239 9.216426
2    b 9.308957 0.008052173 9.293173 9.324741
3    c 9.352476 0.008488022 9.335837 9.369115

?! How do these effects results relate to the values from emmeans? So, there are three different ways in which R reports results and you have to know what you want:

the coefficients in the summary output are based on everything categorical being its first level (and everything numeric being 0) and then you get the differences between means (and slopes);
emmeans default predictions are based on averages for the predictions for PREDICTOR over the levels of another predictor (and everything numeric being the mean; see below);
effects default predictions are based on averages for the predictions for PREDICTOR over the levels of another predictor but weighted by the frequencies of these other levels (and everything numeric being the mean; see below)!

Thus, this should work:

lan_d

  LANGUAGE      fit          se    lower    upper
1  spanish 9.423129 0.006956061 9.409494 9.436765
2  english 9.162075 0.006776940 9.148790 9.175359

data.frame(qwe)[c(1,3,5),"emmean"] %>% weighted.mean(table(model.frame(m_01)$SITE))

[1] 9.423129

data.frame(qwe)[c(2,4,6),"emmean"] %>% weighted.mean(table(model.frame(m_01)$SITE))

[1] 9.162075

And then the same for SITE:

sit_d

  SITE      fit          se    lower    upper
1    a 9.199332 0.008719875 9.182239 9.216426
2    b 9.308957 0.008052173 9.293173 9.324741
3    c 9.352476 0.008488022 9.335837 9.369115

data.frame(qwe)[1:2,"emmean"] %>% weighted.mean(table(model.frame(m_01)$LANGUAGE))

[1] 9.199332

data.frame(qwe)[3:4,"emmean"] %>% weighted.mean(table(model.frame(m_01)$LANGUAGE))

[1] 9.308957

data.frame(qwe)[5:6,"emmean"] %>% weighted.mean(table(model.frame(m_01)$LANGUAGE))

[1] 9.352476

Very cool …

Note: you can get effects kind of results with the emmeans package, too, if you give emmeans an additional argument weights="outer" (because its default is weights="equal"). And you can get emmeans results with the effects package, too, if you give effects an additional argument fixed.predictors=list(given.values="equal") (because its default is fixed.predictors=list(given.values="default")).)

1.5.2 Visual interpretation

Let’s visualize each predictor’s effect:

plot(sit,                    # plot the effect of SITE
   xlab="Site",              # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=c(8, 12),            # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid

plot(lan,                    # plot the effect of LANGUAGE
   xlab="Language",          # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=c(8, 12),            # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid

Figure 24: The effect of LANGUAGE in m_01

1.5.3 Modeling with interactions

Let’s look at the predictions of the model again:

qwe

 LANGUAGE SITE emmean      SE   df lower.CL upper.CL
 spanish  a     9.333 0.01020 7748    9.313    9.353
 english  a     9.072 0.00977 7748    9.053    9.091
 spanish  b     9.443 0.00942 7748    9.424    9.461
 english  b     9.182 0.00939 7748    9.163    9.200
 spanish  c     9.486 0.00975 7748    9.467    9.506
 english  c     9.225 0.00980 7748    9.206    9.245

Confidence level used: 0.95

Let’s display this a little more conveniently for a moment:

# with the data frame version of qwe
with(data.frame(qwe),
   # apply to the predictions in this column
   tapply(emmean,
      # a grouping by the combinations of these two columns
      list(SITE, LANGUAGE),
      # and list the 1 value you get there
      c))

   spanish  english
a 9.333261 9.072206
b 9.442886 9.181831
c 9.486405 9.225350

The critical thing to realize here is that

the effect of LANGUAGE: spanish→english is the same everywhere, i.e. regardless of whether SITE is a, b, or c;
the effect of SITE: a→b is the same everywhere, i.e. regardless of whether LANGUAGE is spanish or english;
the effect of SITE: b→c is the same everywhere, i.e. regardless of whether LANGUAGE is spanish or english.

For pretty obvious reasons, this is called additive behavior. But what if additivity is not a realist or accurate assumption? Maybe the effect of LANGUAGE (or SITE) is not the same regardless of the value of SITE (or LANGUAGE). If that’s the case, we call this an interaction: something doesn’t do the same everywhere/always, one of the most important notions you can learn.

How do we check for this in R? Like this:

summary(m_02 <- lm(       # make/summarize the linear model m_01:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   LANGUAGE + SITE +      # & the predictors
   LANGUAGE:SITE,         # & their interaction <----------------------------------------------
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + LANGUAGE + SITE + LANGUAGE:SITE, data = d, 
    na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.06224 -0.28150 -0.06249  0.22645  2.48487 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)            9.30541    0.01289 722.057  < 2e-16 ***
LANGUAGEenglish       -0.20993    0.01746 -12.023  < 2e-16 ***
SITEb                  0.12208    0.01720   7.098 1.37e-12 ***
SITEc                  0.22164    0.01755  12.629  < 2e-16 ***
LANGUAGEenglish:SITEb -0.02048    0.02373  -0.863    0.388    
LANGUAGEenglish:SITEc -0.13327    0.02433  -5.479 4.42e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4263 on 7746 degrees of freedom
Multiple R-squared:  0.1103,    Adjusted R-squared:  0.1098 
F-statistic: 192.1 on 5 and 7746 DF,  p-value: < 2.2e-16

Confint(m_02)

                         Estimate       2.5 %      97.5 %
(Intercept)            9.30541146  9.28014876  9.33067417
LANGUAGEenglish       -0.20992980 -0.24415822 -0.17570138
SITEb                  0.12208024  0.08836735  0.15579314
SITEc                  0.22164497  0.18724048  0.25604947
LANGUAGEenglish:SITEb -0.02047856 -0.06699429  0.02603716
LANGUAGEenglish:SITEc -0.13327174 -0.18095638 -0.08558711

Is the interaction significant? We can test this again the (slightly) long(er) or the shorter way:

anova(       # compare          this
   m_01,     # this model       is the
   m_02,     # to this one      long way
   test="F") # with an F-test

Analysis of Variance Table

Model 1: RT_log ~ 1 + LANGUAGE + SITE
Model 2: RT_log ~ 1 + LANGUAGE + SITE + LANGUAGE:SITE
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1   7748 1413.9                                  
2   7746 1407.4  2    6.4931 17.868 1.811e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# in the future, we will make the output of anova data frames to reduce the output
drop1(m_02, test="F") %>% data.frame # the short way

              Df Sum.of.Sq      RSS       AIC  F.value       Pr..F.
<none>        NA        NA 1407.416 -13214.43       NA           NA
LANGUAGE:SITE  2  6.493064 1413.909 -13182.75 17.86795 1.810901e-08

In other words, these predictions from m_01 are significantly worse than what we could report. But how do we get new predictions now? As always: from the coefficients and the model matrix:

d[picky,c("RT_log", "LANGUAGE", "SITE")]

      RT_log LANGUAGE SITE
10  8.299208  spanish    a
9   9.729621  spanish    b
7   9.579316  spanish    c
101 9.066089  english    a
103 9.100662  english    b
93  8.836050  english    c

model.matrix(m_02) %>% "["(picky,)

    (Intercept) LANGUAGEenglish SITEb SITEc LANGUAGEenglish:SITEb
10            1               0     0     0                     0
9             1               0     1     0                     0
7             1               0     0     1                     0
101           1               1     0     0                     0
103           1               1     1     0                     1
93            1               1     0     1                     0
    LANGUAGEenglish:SITEc
10                      0
9                       0
7                       0
101                     0
103                     0
93                      1

The model matrix shows that we basically now have a combination of what we saw for each categorical predictor above and we can see that

the intercept is the prediction for when
- LANGUAGE is spanish and
- SITE is a (see case 10);
coefficient 2 is how we get from the intercept to the prediction for when
- LANGUAGE changes from spanish to english, but
- SITE isn’t mentioned, it remains the intercept value a (see case 101);
coefficient 3 is how we get from the intercept to the prediction for when
- LANGUAGE isn’t mentioned, it remains the intercept value spanish, but
- SITE changes from a to b (see case 9);
coefficient 4 is how we get from the intercept to the prediction for when
- LANGUAGE isn’t mentioned, it remains the intercept value spanish, but
- SITE changes from a to c (see case 7).

But now we do not just have additive effects, which means the predictions for LANGUAGE: english and SITE: b as well as for LANGUAGE: english and SITE: c are different in how they require an additional addition, the interaction term:

coefficient 5 is what you add to the intercept, the main effect of LANGUAGE: spanish→english, and the main effect of SITE: a→b when these predictor values coincide (see case 103);
coefficient 6 is what you add to the intercept, the main effect of LANGUAGE: spanish→english, and the main effect of SITE: a→c when these predictor values coincide (see case 93).

The coefficients are therefore used like above:

"*"( # prediction for case 101 (LANGUAGE: english, SITE: a):
   coef(m_02),
   model.matrix(m_02)[101,]) %>% sum

[1] 9.095482

# same as 9.30541 + -0.20993
"*"( # prediction for case 103 (LANGUAGE: english, SITE: b):
   coef(m_02),
   model.matrix(m_02)[103,]) %>% sum

[1] 9.197083

# same as 9.30541 + -0.20993 + 0.12208 + -0.02048

So let’s generate all predictions again and also add them to the data frame d:

d$PRED <- predict(m_02)

In a case like this – the interaction includes all predictors – all the results from emmeans …

emmeans(m_02, specs= ~ LANGUAGE | SITE) %>% data.frame

  LANGUAGE SITE   emmean         SE   df lower.CL upper.CL
1  spanish    a 9.305411 0.01288736 7746 9.280149 9.330674
2  english    a 9.095482 0.01178156 7746 9.072387 9.118577
3  spanish    b 9.427492 0.01138816 7746 9.405168 9.449816
4  english    b 9.197083 0.01133569 7746 9.174862 9.219304
5  spanish    c 9.527056 0.01191427 7746 9.503701 9.550412
6  english    c 9.183855 0.01203715 7746 9.160259 9.207451

emmeans(m_02, specs= ~ SITE | LANGUAGE) %>% data.frame

  SITE LANGUAGE   emmean         SE   df lower.CL upper.CL
1    a  spanish 9.305411 0.01288736 7746 9.280149 9.330674
2    b  spanish 9.427492 0.01138816 7746 9.405168 9.449816
3    c  spanish 9.527056 0.01191427 7746 9.503701 9.550412
4    a  english 9.095482 0.01178156 7746 9.072387 9.118577
5    b  english 9.197083 0.01133569 7746 9.174862 9.219304
6    c  english 9.183855 0.01203715 7746 9.160259 9.207451

emmeans(m_02, specs= ~ LANGUAGE:SITE) %>% data.frame

  LANGUAGE SITE   emmean         SE   df lower.CL upper.CL
1  spanish    a 9.305411 0.01288736 7746 9.280149 9.330674
2  english    a 9.095482 0.01178156 7746 9.072387 9.118577
3  spanish    b 9.427492 0.01138816 7746 9.405168 9.449816
4  english    b 9.197083 0.01133569 7746 9.174862 9.219304
5  spanish    c 9.527056 0.01191427 7746 9.503701 9.550412
6  english    c 9.183855 0.01203715 7746 9.160259 9.207451

and effects are the same:

(lasi_d <- data.frame( # make lasi_d a data frame of
   lasi <- effect(     # the effect
      "LANGUAGE:SITE", # of LANGUAGE:SITE
      m_02)))          # in m_02

  LANGUAGE SITE      fit         se    lower    upper
1  spanish    a 9.305411 0.01288736 9.280149 9.330674
2  english    a 9.095482 0.01178156 9.072387 9.118577
3  spanish    b 9.427492 0.01138816 9.405168 9.449816
4  english    b 9.197083 0.01133569 9.174862 9.219304
5  spanish    c 9.527056 0.01191427 9.503701 9.550412
6  english    c 9.183855 0.01203715 9.160259 9.207451

But note of course that emmeans can give you nice post-hoc tests:

emmeans(m_02, specs= ~ LANGUAGE | SITE) %>% pairs(adjust="none") %>% data.frame

           contrast SITE  estimate         SE   df  t.ratio      p.value
1 spanish - english    a 0.2099298 0.01746107 7746 12.02273 5.302321e-33
2 spanish - english    b 0.2304084 0.01606823 7746 14.33938 4.808368e-46
3 spanish - english    c 0.3432015 0.01693644 7746 20.26409 5.255320e-89

emmeans(m_02, specs= ~ SITE | LANGUAGE) %>% pairs(adjust="none") %>% data.frame

  contrast LANGUAGE    estimate         SE   df     t.ratio      p.value
1    a - b  spanish -0.12208024 0.01719808 7746  -7.0984794 1.373008e-12
2    a - c  spanish -0.22164497 0.01755089 7746 -12.6287005 3.330301e-36
3    b - c  spanish -0.09956473 0.01648151 7746  -6.0409975 1.602418e-09
4    a - b  english -0.10160168 0.01634940 7746  -6.2143982 5.418688e-10
5    a - c  english -0.08837323 0.01684334 7746  -5.2467759 1.588849e-07
6    b - c  english  0.01322845 0.01653453 7746   0.8000496 4.237066e-01

emmeans(m_02, specs= ~ LANGUAGE:SITE)   %>% pairs(adjust="none") %>% data.frame # the desperation !

                contrast    estimate         SE   df     t.ratio       p.value
1  spanish a - english a  0.20992980 0.01746107 7746  12.0227322  5.302321e-33
2  spanish a - spanish b -0.12208024 0.01719808 7746  -7.0984794  1.373008e-12
3  spanish a - english b  0.10832812 0.01716338 7746   6.3115829  2.913980e-10
4  spanish a - spanish c -0.22164497 0.01755089 7746 -12.6287005  3.330301e-36
5  spanish a - english c  0.12155657 0.01763454 7746   6.8930955  5.887823e-12
6  english a - spanish b -0.33201004 0.01638582 7746 -20.2620299  5.468649e-89
7  english a - english b -0.10160168 0.01634940 7746  -6.2143982  5.418688e-10
8  english a - spanish c -0.43157477 0.01675574 7746 -25.7568246 1.939396e-140
9  english a - english c -0.08837323 0.01684334 7746  -5.2467759  1.588849e-07
10 spanish b - english b  0.23040836 0.01606823 7746  14.3393775  4.808368e-46
11 spanish b - spanish c -0.09956473 0.01648151 7746  -6.0409975  1.602418e-09
12 spanish b - english c  0.24363681 0.01657055 7746  14.7030002  2.748620e-48
13 english b - spanish c -0.32997309 0.01644529 7746 -20.0648958  2.410363e-87
14 english b - english c  0.01322845 0.01653453 7746   0.8000496  4.237066e-01
15 spanish c - english c  0.34320154 0.01693644 7746  20.2640927  5.255320e-89

1.5.4 Visual interpretation

Let’s visualize the nature of the interaction of the two predictors based on the predictions from effects:

plot(lasi,                   # plot the effect of LANGUAGE:SITE in m_02
   xlab="Site",              # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=range(d$RT_log, na.rm=TRUE), # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid
plot(lasi, x.var="LANGUAGE", # plot the effect of LANGUAGE:SITE in m_02
   xlab="Language",          # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=range(d$RT_log, na.rm=TRUE), # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid

Figure 25: The effect of LANGUAGE:SITE in m_01

Figure 26: The effect of LANGUAGE:SITE in m_01

Of the standard effects plots, I think this one is best:

plot(lasi,                     # plot the effect of LANGUAGE:SITE in m_02
   xlab="Site",                # w/ this x-axis label
   ylab="RT in ms (logged)",   # w/ this y-axis label
   ylim=range(d$RT_log, na.rm=TRUE), # w/ these y-axis limits
   multiline=TRUE,             # all data into one panel
   confint=list(style="auto"), # but still confidence intervals
   grid=TRUE)                  # & w/ a grid

Figure 27: The effect of LANGUAGE:SITE in m_01

But this is how I would plot this:

Figure 28: The effect of LANGUAGE:SITE in m_01

1.5.5 Write-up

To determine whether the reaction time to a word (in ms) varies as a function of the language in which the stimulus word was presented and the site at which the experiment was conducted, a linear model was fit with RT_log as the response variable and LANGUAGE as well as SITE and their interaction as a predictor. The model was highly significant (F=192.1, df₁=5, df₂=7746, p<0.0001) but explains only little of the response variable (mult. R²=0.11, adj. R²=0.1098). The coefficients and predictions indicate that

at each site, the (logged) RTs to Spanish words were slower than the RTs to English words:
- by a difference of 0.2 in site a;
- by a difference of 0.23 in site b;
- by a difference of 0.34 in site c;
all combinations of SITE and LANGUAGE but one are significantly different from each other, only RTs to English words do not differ significantly between sites b and c. [Show a plot]

	Estimate	95%-CI	se	t	p_2-tailed
Intercept (LANG=span, SITE=a)	9.305	[9.28, 9.331]	0.013	722.06	<0.001
LANG_span→eng	-0.21	[-0.244, -0.176]	0.017	-12.02	<0.001
SITE_a→b	0.122	[0.088, 0.156]	0.017	7.1	<0.001
SITE_a→c	0.222	[0.187, 0.256]	0.018	12.63	<0.001
LANG_span→eng, SITE_a→b	-0.02	[-0.067, 0.026]	0.024	-0.86	0.388
LANG_span→eng, SITE_a→c	-0.133	[-0.181, -0.086]	0.024	-5.48	<0.001

Some housekeeping:

str(d <- d[,1:10])

'data.frame':   7752 obs. of  10 variables:
 $ CASE    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ RT_log  : num  9.55 8.67 8.92 10.31 9.31 ...
 $ GROUP   : Factor w/ 2 levels "english","heritage": 2 2 2 2 2 2 2 2 2 2 ...
 $ SITE    : Factor w/ 3 levels "a","b","c": 2 2 2 2 2 2 3 2 2 1 ...
 $ LENGTH  : int  5 5 6 4 6 6 8 4 5 6 ...
 $ ZIPFFREQ: num  4.28 4.89 5.37 5.12 4.15 ...
 $ LANGUAGE: Factor w/ 2 levels "spanish","english": 1 1 1 1 1 1 1 1 1 1 ...
 $ RT      : int  748 408 483 1266 636 395 765 1297 849 315 ...
 $ FREQPMW : int  19 78 233 133 14 67 34 114 2 20 ...
 $ CORRECT : Factor w/ 2 levels "correct","incorrect": 1 1 1 1 1 1 1 1 1 1 ...

1.6 A numeric and a categorical predictor

Now we turn to a scenario where we are asking, does the logged reaction time to a word (in ms) vary as a function of

the language that the stimulus word was presented in: LANGUAGE: spanish vs. english;
the length of the word?

First, we explore a bit again:

table(d$LANGUAGE)


spanish english 
   3775    3977

boxplot(main="RT in ms (logged) ~  LANGUAGE", varwidth=TRUE,
   d$RT_log ~ d$LANGUAGE, notch=TRUE,
   xlab="Language",
   ylab="RT in ms (logged)", ylim=c(8, 12)); grid()

Figure 29: LANGUAGE and RT_log as a function of LANGUAGE

par(mfrow=c(1,2))
hist(d$LENGTH) # log2 or sqrt doesn't help much
plot(pch=16, col="#00000020",
   x=jitter(d$LENGTH),
   y=d$RT_log)
abline(v=quantiles(d$LENGTH), lty=3, col="#FF000040")
abline(h=quantiles(d$RT_log), lty=3, col="#FF000040")
par(mfrow=c(1,1))

Figure 30: LENGTH and RT_log as a function of LENGTH

1.6.1 Modeling & numerical interpretation

Let’s fit a model:

summary(m_01 <- lm(       # make/summarize the linear model m_01:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   LANGUAGE + LENGTH,     # & the predictors
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + LANGUAGE + LENGTH, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.16687 -0.29052 -0.06813  0.21366  2.56698 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      9.235541   0.026947 342.736  < 2e-16 ***
LANGUAGEenglish -0.249161   0.010057 -24.774  < 2e-16 ***
LENGTH           0.034901   0.004771   7.315 2.84e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4303 on 7749 degrees of freedom
Multiple R-squared:  0.09315,   Adjusted R-squared:  0.09291 
F-statistic:   398 on 2 and 7749 DF,  p-value: < 2.2e-16

Confint(m_01)

                   Estimate       2.5 %      97.5 %
(Intercept)      9.23554072  9.18271822  9.28836322
LANGUAGEenglish -0.24916086 -0.26887582 -0.22944590
LENGTH           0.03490076  0.02554762  0.04425391

As before: are all 2 predictors significant?

drop1(m_01, test="F") %>% data.frame # the short version of

         Df  Sum.of.Sq      RSS       AIC   F.value        Pr..F.
<none>   NA         NA 1434.600 -13072.13        NA            NA
LANGUAGE  1 113.627901 1548.228 -12483.23 613.76174 1.828443e-130
LENGTH    1   9.905381 1444.505 -13020.79  53.50397  2.838880e-13

# anova(m_01, update(m_01, .~. -LANGUAGE), test="F") %>% data.frame
# anova(m_01, update(m_01, .~. -LENGTH)  , test="F") %>% data.frame

So, what do the coefficients mean, for which we return to the data and the model matrix:

picky <- c(35, 4, 178, 96)
d[picky,c("RT_log", "LANGUAGE", "LENGTH")]

       RT_log LANGUAGE LENGTH
35   9.228819  spanish      3
4   10.306062  spanish      4
178  9.233620  english      3
96   8.882643  english      4

model.matrix(m_01) %>% "["(picky,)

    (Intercept) LANGUAGEenglish LENGTH
35            1               0      3
4             1               0      4
178           1               1      3
96            1               1      4

The model matrix shows that we basically now have a combination of what we saw for each categorical predictor above and we can see that

the intercept is the prediction for when
- LANGUAGE is spanish and
- LENGTH is 0 (because then the LENGTH values are multiplied by 0 and don’t change the result);
coefficient 2 is how we get from the intercept to the prediction for when
- LANGUAGE changes from spanish to english, but
- LENGTH isn’t mentioned, it remains the intercept value 0;
coefficient 3 is how we get from the intercept to the prediction for when
- LANGUAGE isn’t mentioned, it remains the intercept value spanish, but
- LENGTH changes from 0 to 1.

The coefficients are therefore used like above:

"*"( # prediction for case 178 (LANGUAGE: english, LENGTH: 3):
   coef(m_01),
   model.matrix(m_01)[178,]) %>% sum

[1] 9.091082

"*"( # prediction for case 96 (LANGUAGE: english, LENGTH: 4):
   coef(m_01),
   model.matrix(m_01)[96,]) %>% sum

[1] 9.125983

Thus, this is what this model predicts and we again add it to the data frame d:

d$PRED <- predict(m_01)
d[picky,]

    CASE    RT_log    GROUP SITE LENGTH ZIPFFREQ LANGUAGE   RT FREQPMW CORRECT
35    36  9.228819 heritage    b      3 4.000000  spanish  600      10 correct
4      4 10.306062 heritage    b      4 5.123852  spanish 1266     133 correct
178  192  9.233620 heritage    b      3 4.653213  english  602      45 correct
96   105  8.882643 heritage    a      4 5.252853  english  472     179 correct
        PRED
35  9.340243
4   9.375144
178 9.091082
96  9.125983

How good is this model? Same as always:

deviance(m_01) # sum((d$RT - d$PRED)^2, na.rm=TRUE)

[1] 1434.6

"-"(
   deviance(m_00),
   deviance(m_01)) / deviance(m_00)

[1] 0.09314538

Let’s compute emmeans and effects predictions again; we begin with the emmeans results for LANGUAGE:

emmeans(m_01, specs= ~ LANGUAGE) %>% data.frame

  LANGUAGE   emmean          SE   df lower.CL upper.CL
1  spanish 9.417027 0.007106631 7749 9.403096 9.430958
2  english 9.167867 0.006918707 7749 9.154304 9.181429

What is that? From what we said above, we can infer that emmeans gives us predictions for each level of LANGUAGE while LENGTH is held constant at the mean (and we will confirm that in a second using effects).

What about the emmeans results for LENGTH, what is that? Here, we get the prediction when LENGTH is held at its mean:

emmeans(m_01, specs= ~ LENGTH) %>% data.frame

    LENGTH   emmean          SE   df lower.CL upper.CL
1 5.200077 9.292447 0.004888682 7749 9.282864  9.30203

However, since LENGTH is a numeric predictor, we do kinda want to see its slope. Here’s a way we can get this: the following code returns the slope of LENGTH for each level of LANGUAGE in the column called LENGTH.trend and they are the same because right now we are not letting the slope of LENGTH vary by level of LANGUAGE (because we have no interaction LANGUAGE:LENGTH in the model):

emtrends(m_01, var="LENGTH") %>% data.frame

  LANGUAGE   LENGTH LENGTH.trend          SE   df
1  spanish 5.200077   0.03490076 0.004771357 7749
2  english 5.200077   0.03490076 0.004771357 7749

What does effects return for LANGUAGE? The same as emmeans: the predictions for the two levels of LANGUAGE while LENGTH is its mean:

(lan_d <- data.frame(      # make lan_d a data frame of
   lan <- predictorEffect( # the effect
      "LANGUAGE",          # of LANGUAGE
      m_01)))              # in m_01

  LANGUAGE      fit          se    lower    upper
1  spanish 9.417027 0.007106631 9.403096 9.430958
2  english 9.167867 0.006918707 9.154304 9.181429

What does effects return for LENGTH (when we again ask for predictions for relevant values)?

(relevants_l <- c(
   0:1,        # these help us understand the slope
   c(3,5,7,9), # these cover the range of the actual values
   mean(d$LENGTH)) %>% sort) # the central tendency

[1] 0.000000 1.000000 3.000000 5.000000 5.200077 7.000000 9.000000

(len_d <- data.frame( # make len_d a data frame of
   len <- effect(     # the effect
      "LENGTH",       # of LENGTH
      m_01,           # in m_01 for the following values
      xlevels=list(LENGTH=relevants_l))))

    LENGTH      fit          se    lower    upper
1 0.000000 9.107714 0.025288119 9.058142 9.157286
2 1.000000 9.142615 0.020627323 9.102180 9.183050
3 3.000000 9.212416 0.011579141 9.189718 9.235115
4 5.000000 9.282218 0.004979295 9.272457 9.291979
5 5.200077 9.289201 0.004886925 9.279621 9.298780
6 7.000000 9.352019 0.009881146 9.332650 9.371389
7 9.000000 9.421821 0.018777847 9.385011 9.458631

Now it’s not the same as emmeans – 9.292447 (from emmeans) is not the same as 9.289201 (from effects) – because, remember, effects gives you predictions that involve weighted averages, with the weights being based on the frequencies of the levels of the so-called non-focal predictors:

# emmeans:
c(
   coef(m_01)[1] + mean(d$LENGTH) * coef(m_01)[3],
   coef(m_01)[1] + mean(d$LENGTH) * coef(m_01)[3] + coef(m_01)[2]) %>% mean

[1] 9.292447

# effects:
c(
   coef(m_01)[1] + mean(d$LENGTH) * coef(m_01)[3],
   coef(m_01)[1] + mean(d$LENGTH) * coef(m_01)[3] + coef(m_01)[2]) %>% weighted.mean(table(d$LANGUAGE))

[1] 9.289201

1.6.2 Visual interpretation

Let’s visualize each predictor’s effect:

plot(lan,                    # plot the effect of LANGUAGE
   xlab="Language",          # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=c(8, 12),            # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid

Figure 31: The effect of LANGUAGE in m_01

plot(len,                    # plot the effect of SITE
   xlab="Length",            # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=c(8, 12),            # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid

1.6.3 Modeling with interactions

As before, the critical thing to realize here is that the effect of LENGTH is the same everywhere, i.e. regardless of whether LANGUAGE is spanish or english. For the same obvious reasons, this is again additive behavior. But it might again not be appropriate or accurate: Maybe the effect of LENGTH is not the same regardless of the value of LANGUAGE. If that’s the case, this is again an interaction: something doesn’t do the same everywhere/always.

How do we check for this in R? Like this:

summary(m_02 <- lm(       # make/summarize the linear model m_01:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   LANGUAGE + LENGTH +    # & the predictors
   LANGUAGE:LENGTH,       # & their interaction <----------------------------------------------
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + LANGUAGE + LENGTH + LANGUAGE:LENGTH, 
    data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.16406 -0.29017 -0.06899  0.21528  2.56360 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)             9.201792   0.032005 287.515  < 2e-16 ***
LANGUAGEenglish        -0.145808   0.053852  -2.708  0.00679 ** 
LENGTH                  0.041089   0.005726   7.175 7.88e-13 ***
LANGUAGEenglish:LENGTH -0.020223   0.010352  -1.954  0.05079 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4302 on 7748 degrees of freedom
Multiple R-squared:  0.09359,   Adjusted R-squared:  0.09324 
F-statistic: 266.7 on 3 and 7748 DF,  p-value: < 2.2e-16

Is the interaction significant? It is an interaction of two predictors with 1 df so we already see that it’s just about not (unless, that is, you had good reason to expect an interaction term of LANGUAGEenglish:LENGTH with a negative sign, in which case you could divide the p-value of 0.05079 by 2 and score a significant result).

drop1(m_02, test="F") %>% data.frame

                Df Sum.of.Sq      RSS       AIC F.value     Pr..F.
<none>          NA        NA 1433.894 -13073.94      NA         NA
LANGUAGE:LENGTH  1 0.7062722 1434.600 -13072.13 3.81632 0.05079154

anova(m_01, m_02, test="F") %>% data.frame

  Res.Df      RSS Df Sum.of.Sq       F     Pr..F.
1   7749 1434.600 NA        NA      NA         NA
2   7748 1433.894  1 0.7062722 3.81632 0.05079154

Just about not so m_01 is actually the model to go with. For didactic reasons, we will still look at m_02 tho.

First, the coefficients and the model matrix:

d[picky,c("RT_log", "LANGUAGE", "LENGTH")]

       RT_log LANGUAGE LENGTH
35   9.228819  spanish      3
4   10.306062  spanish      4
178  9.233620  english      3
96   8.882643  english      4

model.matrix(m_02) %>% "["(picky,)

    (Intercept) LANGUAGEenglish LENGTH LANGUAGEenglish:LENGTH
35            1               0      3                      0
4             1               0      4                      0
178           1               1      3                      3
96            1               1      4                      4

The model matrix shows that we basically now have a combination of what we saw for each predictor above and we can see that

the intercept is the prediction for when
- LANGUAGE is spanish and
- LENGTH is 0 (because then the LENGTH values are multiplied by 0 and don’t change the result);
coefficient 2 is how we get from the intercept to the prediction for when
- LANGUAGE changes from spanish to english, but
- LENGTH isn’t mentioned, it remains the intercept value 0;
coefficient 3 is how we get from the intercept to the prediction for when
- LANGUAGE isn’t mentioned, it remains the intercept value spanish, but
- LENGTH changes from 0 to 1.

But now we do not just have additive effects, which means the predictions for LANGUAGE: english and LENGTH: 1 are different in how they require an additional addition, the interaction term: coefficient 4 is what you add to the intercept, the main effect of LANGUAGE: spanish→english, and the main effect of LENGTH: 0→1 (see cases 178 and 96). The coefficients are therefore used like above:

"*"( # prediction for case 178 (LANGUAGE: english, LENGTH: 3):
   coef(m_01),
   model.matrix(m_01)[178,]) %>% sum

[1] 9.091082

"*"( # prediction for case 96 (LANGUAGE: english, LENGTH: 4):
   coef(m_01),
   model.matrix(m_01)[96,]) %>% sum

[1] 9.125983

Thus, this is what this model predicts and we again add all predictions to the data frame d:

d$PRED <- predict(m_01)
d[picky,]

    CASE    RT_log    GROUP SITE LENGTH ZIPFFREQ LANGUAGE   RT FREQPMW CORRECT
35    36  9.228819 heritage    b      3 4.000000  spanish  600      10 correct
4      4 10.306062 heritage    b      4 5.123852  spanish 1266     133 correct
178  192  9.233620 heritage    b      3 4.653213  english  602      45 correct
96   105  8.882643 heritage    a      4 5.252853  english  472     179 correct
        PRED
35  9.340243
4   9.375144
178 9.091082
96  9.125983

Let’s compute emmeans and effects predictions again. What does emmeans return here for LANGUAGE, when we consider that LENGTH is also in the model?

emmeans(m_02, specs= ~ LANGUAGE | LENGTH) %>% data.frame

  LANGUAGE   LENGTH   emmean          SE   df lower.CL upper.CL
1  spanish 5.200077 9.415459 0.007150550 7748 9.401442 9.429476
2  english 5.200077 9.164490 0.007130065 7748 9.150514 9.178467

These are predictions for each level of LANGUAGE while LENGTH is held constant at the mean:

For LENGTH, we would want to know the slopes of LENGTH for each level of LANGUAGE so we use emtrends again:

emtrends(m_02, var="LENGTH", specs= ~ LANGUAGE) %>% data.frame

  LANGUAGE LENGTH.trend          SE   df    lower.CL   upper.CL
1  spanish   0.04108912 0.005726456 7748 0.029863715 0.05231452
2  english   0.02086630 0.008623755 7748 0.003961409 0.03777119

# emtrends(m_02, var="LENGTH") %>% data.frame # would also work

Now that we have the interaction in the model, we do get a different slope of LENGTH for each level of LANGUAGE.

What does effects return? Predictions that are perfectly compatible (check rows 9-10 of the output of the first bit of code and rows 5 and 12 of the output of the second bit of code):

(lale_d <- data.frame(            # make lale_d a data frame of
   lale <- effect(                # the effect (or predictorEffect)
      "LANGUAGE:LENGTH",          # of LANGUAGE:LENGTH
      m_02,                       # in m_02 for the following values
      xlevels=list(LENGTH=relevants_l))))

   LANGUAGE   LENGTH      fit          se    lower    upper
1   spanish 0.000000 9.201792 0.032004567 9.139055 9.264530
2   english 0.000000 9.055984 0.043310253 8.971084 9.140884
3   spanish 1.000000 9.242882 0.026446520 9.191039 9.294724
4   english 1.000000 9.076850 0.034820641 9.008592 9.145108
5   spanish 3.000000 9.325060 0.015697909 9.294288 9.355832
6   english 3.000000 9.118583 0.018223335 9.082860 9.154306
7   spanish 5.000000 9.407238 0.007467839 9.392599 9.421877
8   english 5.000000 9.160316 0.006830522 9.146926 9.173705
9   spanish 5.200077 9.415459 0.007150550 9.401442 9.429476
10  english 5.200077 9.164490 0.007130065 9.150514 9.178467
11  spanish 7.000000 9.489416 0.011289432 9.467286 9.511547
12  english 7.000000 9.202048 0.018872607 9.165053 9.239044
13  spanish 9.000000 9.571595 0.021481906 9.529484 9.613705
14  english 9.000000 9.243781 0.035505602 9.174180 9.313381

# (lale_d <- data.frame(lale <- predictorEffect(
#    "LENGTH", m_02,focal.levels=relevants_l)))

1.6.4 Visual interpretation

Here’s an effects plot of the interaction:

plot(lale,                     # plot the effect of LANGUAGE:LENGTH in m_02
   xlab="Length",              # w/ this x-axis label
   ylab="RT in ms (logged)",   # w/ this y-axis label
   ylim=range(d$RT_log, na.rm=TRUE), # w/ these y-axis limits
   multiline=TRUE,             # all data into one panel
   confint=list(style="auto"), # but still confidence intervals
   grid=TRUE)                  # & w/ a grid

Figure 33: The effect of LANGUAGE:LENGTH in m_02

This is how I would plot this:

Figure 34: The effect of LANGUAGE:LENGTH in m_02

1.6.5 Write-up

[I will now discuss m_02 because it has the interaction – in a real study you would discuss m_01 because the interaction was ns!]

To determine whether the reaction time to a word (in ms) varies as a function of the language in which the stimulus word was presented and the length of the word, a linear model was fit with RT_log as the response variable and LANGUAGE as well as LENGTH and their interaction as predictors. The model was highly significant (F=266.7, df₁=3, df₂=7748, p<0.0001) but explains only little of the response variable (mult. R²=0.094, adj. R²=0.0.093). The coefficients and predictions indicate that the (positive) slope of LENGTH for Spanish is significantly higher than the (still positive) one for English words. [Show a plot]

	Estimate	95%-CI	se	t	p_2-tailed
Intercept (LANG=span, LENGTH=0)	9.202	[9.139, 9.265]	0.032	287.52	<0.001
LANG_span→eng	-0.146	[-0.251, -0.04]	0.054	-2.71	<0.007
LENGTH_0→1	0.041	[-0.03, 0.052]	0.006	7.18	<0.001
LANG_span→eng, LENGTH_0→1	-0.02	[-0.041, 0]	0.011	-1.95	0.051

Quiz question: We ‘know’ that the slope of LENGTH for LANGUAGE: english is significantly less than the slope of LENGTH for LANGUAGE: spanish (from the summary output). But is the slope of LENGTH for LANGUAGE: english significantly different from 0? How do we find that out? # replace this with your answers/code

1.7 Two numeric predictors

Last one: we are asking, does the logged reaction time to a word (in ms) vary as a function of

the (Zipf) frequency of the word;
the length of the word?

We’ve looked at the predictors before so we’ll keep this short here:

par(mfrow=c(1, 2)) # make the plotting window have 1 row & 2 columns
plot(pch=16, col="#00000020",
   x=d$ZIPFFREQ,
   y=d$RT_log)
abline(v=quantiles(d$ZIPFFREQ), lty=2, col="#0000FF40")
abline(h=quantiles(d$RT_log)  , lty=2, col="#0000FF40")
plot(pch=16, col="#00000020",
   x=jitter(d$LENGTH),
   y=d$RT_log)
abline(v=quantiles(d$LENGTH), lty=2, col="#0000FF40")
abline(h=quantiles(d$RT_log), lty=2, col="#0000FF40")
par(mfrow=c(1, 1)) # reset to default

Figure 35: RT (logged), LENGTH, and ZIPFFREQ

1.7.1 Modeling & numerical interpretation

Let’s fit a model:

summary(m_01 <- lm(       # make/summarize the linear model m_01:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   ZIPFFREQ + LENGTH,     # & the predictors
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + ZIPFFREQ + LENGTH, data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.24899 -0.30279 -0.07074  0.22393  2.70915 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.496032   0.051630  183.93   <2e-16 ***
ZIPFFREQ    -0.109251   0.009231  -11.84   <2e-16 ***
LENGTH       0.057068   0.004799   11.89   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.443 on 7749 degrees of freedom
Multiple R-squared:  0.03869,   Adjusted R-squared:  0.03845 
F-statistic:   156 on 2 and 7749 DF,  p-value: < 2.2e-16

Confint(m_01)

               Estimate       2.5 %      97.5 %
(Intercept)  9.49603232  9.39482453  9.59724010
ZIPFFREQ    -0.10925086 -0.12734680 -0.09115492
LENGTH       0.05706792  0.04766146  0.06647439

As before: are all 2 predictors significant?

drop1(m_01, test="F") %>% data.frame # the short version of

         Df Sum.of.Sq      RSS       AIC  F.value       Pr..F.
<none>   NA        NA 1520.741 -12620.10       NA           NA
ZIPFFREQ  1  27.48711 1548.228 -12483.23 140.0617 4.866190e-32
LENGTH    1  27.75700 1548.498 -12481.88 141.4370 2.465044e-32

# anova(m_01, update(m_01, .~. -ZIPFFREQ), test="F") %>% data.frame
# anova(m_01, update(m_01, .~. -LENGTH)  , test="F") %>% data.frame

So, what do the coefficients mean, for which we return to the data and the model matrix:

model.matrix(m_01) %>% head

  (Intercept) ZIPFFREQ LENGTH
1           1 4.278754      5
2           1 4.892095      5
3           1 5.367356      6
4           1 5.123852      4
5           1 4.146128      6
6           1 4.826075      6

The model matrix shows that we basically now have a combination of what we saw for each categorical predictor above and we can see that

the intercept is the prediction for when both ZIPFFREQ is 0 and LENGTH is 0 (because then the multiplications would result in zeros to be added to the intercept);
coefficient 2 is how we get from the intercept to the prediction for when
- ZIPFFREQ changes from 0 to 1, but
- LENGTH isn’t mentioned, it remains the intercept value 0;
coefficient 3 is how we get from the intercept to the prediction for when
- ZIPFFREQ isn’t mentioned, it remains the intercept value 0, but
- LENGTH changes from 0 to 1.

The coefficients are therefore used like above: For the predictions, we multiply each coefficient with each row in the model matrix:

apply( # apply to the head of the model matrix:
   head(model.matrix(m_01)),
   1,  # row by row the multiplication & summation:
   \(af) sum(af*coef(m_01)))

       1        2        3        4        5        6 
9.313914 9.246906 9.252052 9.164519 9.385472 9.311187

d$PRED <- predict(m_01)
head(d)

  CASE    RT_log    GROUP SITE LENGTH ZIPFFREQ LANGUAGE   RT FREQPMW CORRECT
1    1  9.546894 heritage    b      5 4.278754  spanish  748      19 correct
2    2  8.672425 heritage    b      5 4.892095  spanish  408      78 correct
3    3  8.915879 heritage    b      6 5.367356  spanish  483     233 correct
4    4 10.306062 heritage    b      4 5.123852  spanish 1266     133 correct
5    5  9.312883 heritage    b      6 4.146128  spanish  636      14 correct
6    6  8.625709 heritage    b      6 4.826075  spanish  395      67 correct
      PRED
1 9.313914
2 9.246906
3 9.252052
4 9.164519
5 9.385472
6 9.311187

We see the model is not great from the R-squared, which I am not computing again from the deviance.

Let’s compute emmeans and effects predictions again. What does emmeans return? The slope of each predictor while the other is at its mean:

emtrends(m_01, var="ZIPFFREQ", specs= ~ LENGTH) %>% data.frame

    LENGTH ZIPFFREQ.trend          SE   df   lower.CL    upper.CL
1 5.200077     -0.1092509 0.009231348 7749 -0.1273468 -0.09115492

emtrends(m_01, var="LENGTH", specs= ~ ZIPFFREQ) %>% data.frame

  ZIPFFREQ LENGTH.trend          SE   df   lower.CL   upper.CL
1 4.609476   0.05706792 0.004798556 7749 0.04766146 0.06647439

What does effects return? Predictions that are perfectly compatible (check rows 1-2 of both outputs):

(zip_d <- data.frame( # make zip_d a data frame of
   zip <- effect(     # the effect
      "ZIPFFREQ",     # of ZIPFFREQ
      m_01,           # in m_01 for the following values
      xlevels=list(ZIPFFREQ=relevants_z))))

  ZIPFFREQ      fit          se    lower    upper
1 0.000000 9.792790 0.042848122 9.708796 9.876784
2 1.000000 9.683539 0.033698080 9.617482 9.749596
3 3.000000 9.465037 0.015686472 9.434288 9.495787
4 4.000000 9.355786 0.007547924 9.340991 9.370582
5 4.609476 9.289201 0.005031505 9.279338 9.299064
6 5.000000 9.246536 0.006189709 9.234402 9.258669
7 6.000000 9.137285 0.013787292 9.110258 9.164312

(len_d <- data.frame( # make len_d a data frame of
   len <- effect(     # the effect
      "LENGTH",       # of LENGTH
      m_01,           # in m_01 for the following values
      xlevels=list(LENGTH=relevants_l))))

    LENGTH      fit          se    lower    upper
1 0.000000 8.992443 0.025455085 8.942544 9.042342
2 1.000000 9.049511 0.020772869 9.008791 9.090231
3 3.000000 9.163647 0.011694887 9.140722 9.186572
4 5.000000 9.277783 0.005122285 9.267742 9.287824
5 5.200077 9.289201 0.005031505 9.279338 9.299064
6 7.000000 9.391919 0.009995715 9.372324 9.411513
7 9.000000 9.506054 0.018915600 9.468975 9.543134

1.7.2 Visual interpretation

Let’s visualize each predictor’s effect:

plot(zip,                    # plot the effect of ZIPFFREQ
   xlab="Zipf frequency",    # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=c(8, 12),            # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid

Figure 36: The effect of ZIPFFREQ in m_01

plot(len,                    # plot the effect of LENGTH
   xlab="Length",            # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=c(8, 12),            # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid

1.7.3 Modeling with interactions

As before, the critical thing to realize here is that the effect of LENGTH is the same everywhere, i.e. regardless of what ZIPFFREQ is, and of course the other way round. But it might again not be appropriate or accurate: Maybe the effect of LENGTH is not the same regardless of the value of ZIPFFREQ (and vice versa). If that’s the case, this is again an interaction: something doesn’t do the same everywhere/always.

How do we check for this in R? By now we know, same way as always:

summary(m_02 <- lm(       # make/summarize the linear model m_01:
   RT_log ~ 1 +           # RT_log ~ an overall intercept (1)
   ZIPFFREQ + LENGTH +    # & the predictors
   ZIPFFREQ:LENGTH,       # & their interaction <----------------------------------------------
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
lm(formula = RT_log ~ 1 + ZIPFFREQ + LENGTH + ZIPFFREQ:LENGTH, 
    data = d, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.24632 -0.30144 -0.07187  0.22311  2.71514 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      9.120698   0.199441  45.731  < 2e-16 ***
ZIPFFREQ        -0.026128   0.043651  -0.599 0.549484    
LENGTH           0.129044   0.037253   3.464 0.000535 ***
ZIPFFREQ:LENGTH -0.015979   0.008201  -1.948 0.051413 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4429 on 7748 degrees of freedom
Multiple R-squared:  0.03916,   Adjusted R-squared:  0.03879 
F-statistic: 105.3 on 3 and 7748 DF,  p-value: < 2.2e-16

Is the interaction significant? It is an interaction of two predictors with 1 df so we already see that it’s not.

drop1(m_02, test="F") %>% data.frame # the short version of

                Df Sum.of.Sq      RSS       AIC  F.value     Pr..F.
<none>          NA        NA 1519.996 -12621.89       NA         NA
ZIPFFREQ:LENGTH  1  0.744684 1520.741 -12620.10 3.795938 0.05141312

anova(m_01, m_02, test="F") %>% data.frame

  Res.Df      RSS Df Sum.of.Sq        F     Pr..F.
1   7749 1520.741 NA        NA       NA         NA
2   7748 1519.996  1  0.744684 3.795938 0.05141312

Just about not so m_01 is actually the model to go with. For didactic reasons, we will again look at m_02 though. So, what do the coefficients mean, for which we return to the data and the model matrix:

model.matrix(m_02) %>% head

  (Intercept) ZIPFFREQ LENGTH ZIPFFREQ:LENGTH
1           1 4.278754      5        21.39377
2           1 4.892095      5        24.46047
3           1 5.367356      6        32.20414
4           1 5.123852      4        20.49541
5           1 4.146128      6        24.87677
6           1 4.826075      6        28.95645

The model matrix shows that we basically now have a combination of what we saw for each categorical predictor above and we can see that

the intercept is the prediction for when both ZIPFFREQ is 0 and LENGTH is 0 (because then the multiplications would result in zeros to be added to the intercept);
coefficient 2 is how we get from the intercept to the prediction for when
- ZIPFFREQ changes from 0 to 1, but
- LENGTH isn’t mentioned, it remains the intercept value 0;
coefficient 3 is how we get from the intercept to the prediction for when
- ZIPFFREQ isn’t mentioned, it remains the intercept value 0, but
- LENGTH changes from 0 to 1.

But now we do not just have additive effects, which means the prediction for ZIPFFREQ: 1 and LENGTH: 1 is different in how it requires an additional addition, the interaction term: coefficient 4 is what you add to the intercept, the main effect of ZIPFFREQ: 0→1, and the main effect of LENGTH: 0→1.

apply( # apply to the head of the model matrix:
   head(model.matrix(m_02)),
   1,  # row by row the multiplication & summation:
   \(af) sum(af*coef(m_02)))

       1        2        3        4        5        6 
9.312283 9.247256 9.240150 9.175513 9.389139 9.306186

d$PRED <- predict(m_02)
head(d)

  CASE    RT_log    GROUP SITE LENGTH ZIPFFREQ LANGUAGE   RT FREQPMW CORRECT
1    1  9.546894 heritage    b      5 4.278754  spanish  748      19 correct
2    2  8.672425 heritage    b      5 4.892095  spanish  408      78 correct
3    3  8.915879 heritage    b      6 5.367356  spanish  483     233 correct
4    4 10.306062 heritage    b      4 5.123852  spanish 1266     133 correct
5    5  9.312883 heritage    b      6 4.146128  spanish  636      14 correct
6    6  8.625709 heritage    b      6 4.826075  spanish  395      67 correct
      PRED
1 9.312283
2 9.247256
3 9.240150
4 9.175513
5 9.389139
6 9.306186

We see the model is not great from the R-squared, which I am not computing again from the deviance.

Let’s compute emmeans and effects predictions again. What does emmeans return here for each predictor when we consider that the corresponding other one is also in the model? The slope of each predictor while the other is at its mean:

emtrends(m_02, var="ZIPFFREQ", specs= ~ LENGTH) %>% data.frame

    LENGTH ZIPFFREQ.trend          SE   df   lower.CL    upper.CL
1 5.200077     -0.1092176 0.009229699 7748 -0.1273103 -0.09112488

emtrends(m_02, var="LENGTH", specs= ~ ZIPFFREQ) %>% data.frame

  ZIPFFREQ LENGTH.trend          SE   df   lower.CL   upper.CL
1 4.609476   0.05539128 0.004874258 7748 0.04583642 0.06494614

What does effects return? Predictions that are perfectly compatible (check rows 29 and 30 for ZIPFFREQ as well as 5 and 12 for LENGTH of this output):

(zile_d <- data.frame(   # make zile_d a data frame of
   zile <- effect(       # the effect
      "ZIPFFREQ:LENGTH", # of ZIPFFREQ:LENGTH
      m_02,              # in m_02
      xlevels=list(      # for these values of the predictors
         ZIPFFREQ=relevants_z,
         LENGTH=relevants_l))))

   ZIPFFREQ   LENGTH       fit          se    lower     upper
1  0.000000 0.000000  9.120698 0.199441378 8.729739  9.511657
2  1.000000 0.000000  9.094570 0.156249733 8.788279  9.400862
3  3.000000 0.000000  9.042315 0.071558871 8.902040  9.182590
4  4.000000 0.000000  9.016187 0.034499982 8.948558  9.083817
5  4.609476 0.000000  9.000263 0.025765045 8.949757  9.050770
6  5.000000 0.000000  8.990060 0.032720725 8.925918  9.054201
7  6.000000 0.000000  8.963932 0.069006254 8.828661  9.099203
8  0.000000 1.000000  9.249742 0.163253667 8.929721  9.569764
9  1.000000 1.000000  9.207636 0.127949737 8.956820  9.458452
10 3.000000 1.000000  9.123423 0.058712317 9.008331  9.238515
11 4.000000 1.000000  9.081317 0.028350693 9.025742  9.136892
12 4.609476 1.000000  9.055654 0.021007114 9.014475  9.096834
13 5.000000 1.000000  9.039211 0.026573248 8.987120  9.091302
14 6.000000 1.000000  8.997105 0.056162828 8.887010  9.107199
15 0.000000 3.000000  9.507831 0.093425243 9.324692  9.690969
16 1.000000 3.000000  9.433767 0.073348608 9.289984  9.577550
17 3.000000 3.000000  9.285640 0.033936620 9.219115  9.352165
18 4.000000 3.000000  9.211577 0.016481134 9.179269  9.243884
19 4.609476 3.000000  9.166437 0.011780144 9.143345  9.189529
20 5.000000 3.000000  9.137513 0.014638709 9.108818  9.166209
21 6.000000 3.000000  9.063450 0.031299940 9.002094  9.124806
22 0.000000 5.000000  9.765919 0.043670235 9.680314  9.851525
23 1.000000 5.000000  9.659898 0.034375324 9.592514  9.727283
24 3.000000 5.000000  9.447857 0.016075601 9.416345  9.479370
25 4.000000 5.000000  9.341837 0.007785475 9.326575  9.357098
26 4.609476 5.000000  9.277219 0.005129513 9.267164  9.287275
27 5.000000 5.000000  9.235816 0.006218844 9.223625  9.248007
28 6.000000 5.000000  9.129795 0.013878263 9.102590  9.157000
29 0.000000 5.200077  9.791738 0.042843797 9.707752  9.875723
30 1.000000 5.200077  9.682520 0.033696060 9.616467  9.748574
31 3.000000 5.200077  9.464085 0.015691256 9.433326  9.494844
32 4.000000 5.200077  9.354868 0.007561289 9.340045  9.369690
33 4.609476 5.200077  9.288302 0.005051700 9.278399  9.298205
34 5.000000 5.200077  9.245650 0.006205266 9.233486  9.257814
35 6.000000 5.200077  9.136432 0.013791746 9.109397  9.163468
36 0.000000 7.000000 10.024007 0.078665115 9.869803 10.178212
37 1.000000 7.000000  9.886030 0.061449727 9.765572 10.006488
38 3.000000 7.000000  9.610074 0.027700695 9.555773  9.664375
39 4.000000 7.000000  9.472096 0.013065961 9.446483  9.497709
40 4.609476 7.000000  9.388002 0.010194076 9.368019  9.407985
41 5.000000 7.000000  9.334118 0.013389705 9.307871  9.360366
42 6.000000 7.000000  9.196141 0.028160666 9.140938  9.251343
43 0.000000 9.000000 10.282096 0.146872763 9.994185 10.570006
44 1.000000 9.000000 10.112161 0.114731149 9.887257 10.337065
45 3.000000 9.000000  9.772291 0.051753307 9.670840  9.873741
46 4.000000 9.000000  9.602356 0.024534566 9.554262  9.650450
47 4.609476 9.000000  9.498785 0.019276764 9.460997  9.536572
48 5.000000 9.000000  9.432421 0.025221465 9.382980  9.481862
49 6.000000 9.000000  9.262486 0.052734590 9.159112  9.365860

1.7.4 Visual interpretation

Let’s visualize each predictor’s effect:

plot(zile,                   # plot the effect of ZIPFFREQ
   xlab="Zipf frequency",    # w/ this x-axis label
   ylab="RT in ms (logged)", # w/ this y-axis label
   ylim=c(8, 12),            # w/ these y-axis limits
   grid=TRUE)                # & w/ a grid

Figure 38: The effect of ZIPFFREQ:LENGTH in m_02

Not great. Two options, but for both we first increase resolution in zile_d:

zile_d <- data.frame(   # make zile_d a data frame of
   zile <- effect(       # the effect
      "ZIPFFREQ:LENGTH", # of ZIPFFREQ:LENGTH
      m_02,              # in m_02
      xlevels=list(      # for these values of the predictors
         ZIPFFREQ=seq(min(d$ZIPFFREQ), max(d$ZIPFFREQ), length.out=15),   # <------------------
         LENGTH  =seq(min(d$LENGTH)  , max(d$LENGTH)  , length.out=15)))) # <------------------

Option 1 is a 3-D plot for websites and your own reports:

library(plotly)
fig <- plot_ly(zile_d,
   x=~ZIPFFREQ, y=~LENGTH, z=~fit, color=~fit, colors=c("#FFFFFF20", "#00000020")) %>% 
   add_markers        %>% 
   layout(scene=list(
      xaxis=list(title = "Zipf frequency"),
      yaxis=list(title = "Length"),
      zaxis=list(title = "Predicted RT")))
fig

Figure 39: The effect of ZIPFFREQ:LENGTH in m_02

detach(package:plotly)

Option 2 is a numeric heatmap for publications:

zile_d$fitcat <- as.numeric( # create a numeric column fitcat in zile_d
   cut(zile_d$fit,           # by cutting the predicted values up
       breaks=seq(           # at the values ranging from
          min(zile_d$fit),   # the smallest pred RT to   <-------------------------------------
          max(zile_d$fit),   # the largest pred RT       <-------------------------------------
          length.out=10),    # into 9 groups
       include.lowest=TRUE,  # include the lowest, and
       labels=1:9)) # use the numbers 1:9 as labels
plot(type="n",                                       # plot nothing, just
   xlab="Zipf frequency", x=zile_d$ZIPFFREQ,         # get the coordinate
   ylab="LENGTH"        , y=zile_d$LENGTH, ); grid() # system set up
# add the predictions
text(x=zile_d$ZIPFFREQ, y=zile_d$LENGTH, # plot text at these coordinates, namely
   labels=zile_d$fitcat,           # the 'categorical/ordinal' predictions
   # make the physical size of the numbers dependent on the values
   cex=0.5+zile_d$fitcat/7,
   # make the darkness of the grey reflect the predicted RT
   col=grey(level=zile_d$fitcat/10)) # see ?grey

Figure 40: The effect of ZIPFFREQ:LENGTH in m_02

But why is this specific one potentially misleading? This is because this procedure can exaggerate the size of the effects that the two predictors have, namely when the predicted RTs in zile_d$fit cover only a small range of the range of the actually observed RTs. Then our binning of the predictions into 9 bins make the lowest and the highest predicted bin – 1 and 9 – seem widely apart when in fact they are fairly close to each other when considered against the background of the range of the actually observed RTs. To avoid something like that, we above always defined y-axis limits such that they cover the whole range of reaction times. The equivalent to that here would be to take care that we anchor the range of bin values by (i) the smallest RT value from both the observed and the predicted data and (ii) the largest RT value from both the observed and the predicted data. That will make maximally sure the plot is not misleading. Here’s how can do this:

# define the minimum of the range to be used for binning
min_of_obs_n_pred <- min(c(
   d$RT_log,
   zile_d$fit))
# define the maximum of the range to be used for binning
max_of_obs_n_pred <- max(c(
   d$RT_log,
   zile_d$fit))

And then we use cut to bin like before, just with the new breaks setting:

zile_d$fitcat <- as.numeric( # create a numeric column fitcat in zile_d
   cut(zile_d$fit,           # by cutting the predicted values up
       breaks=seq(           # at the values ranging from
          min_of_obs_n_pred, # the smallest obs/pred RT to   <---------------------------------
          max_of_obs_n_pred, # the largest obs/pred RT       <---------------------------------
          length.out=10),    # into 9 groups
       include.lowest=TRUE,  # include the lowest, and
       labels=1:9)) # use the numbers 1:9 as labels
plot(type="n",                                       # plot nothing, just
   xlab="Zipf frequency", x=zile_d$ZIPFFREQ,         # get the coordinate
   ylab="Length"        , y=zile_d$LENGTH, ); grid() # system set up
# add the predictions
text(x=zile_d$ZIPFFREQ, y=zile_d$LENGTH, # plot text at these coordinates, namely
   labels=zile_d$fitcat,           # the 'categorical/ordinal' predictions
   # make the physical size of the numbers dependent on the values
   cex=0.5+zile_d$fitcat/7,
   # make the darkness of the grey reflect the predicted RT
   col=grey(level=zile_d$fitcat/10)) # see ?grey

Figure 41: The effect of ZIPFFREQ:LENGTH in m_02

Much more realistic and much more sobering…

1.7.5 Write-up

[I will now discuss m_02 because it has the interaction – in a real study you would discuss m_01 because the interaction was ns!] To determine whether the reaction time to a word (in ms) varies as a function of the language in which the stimulus word was presented and the length of the word, a linear model was fit with RT_log as the response variable and LANGUAGE as well as LENGTH and their interaction as predictors. The model was highly significant (F=266.7, df₁=3, df₂=7748, p<0.0001) but explains only little of the response variable (mult. R²=0.094, adj. R²=0.0.093). The coefficients and predictions indicate that LENGTH is positively correlated with (logged) RT whereas ZIPFFREQ is negatively correlated with (logged) RT. However, the two variables interact weakly such that

when ZIPFFREQ is low, LENGTH has a stronger effect than when ZIPFFREQ is high;
when LENGTH is small, ZIPFFREQ has a weaker effect than when LENGTH is high. [Show a plot]

	Estimate	95%-CI	se	t	p_2-tailed
Intercept (ZIPF=0, LENGTH=0)	9.121	[8.73, 9.517]	0.199	47.73	<0.001
ZIPF_0→1	-0.026	[-0.112, 0.059]	0.044	-0.599	0.55
LENGTH_0→1	0.129	[0.056, 0.202]	0.037	3.464	<0.001
ZIPF_0→1, LENGTH_0→1	-0.016	[-0.032, 0]	0.008	-1.95	0.051

2 Binary logistic regression modeling

2.1 Introduction

rm(list=ls(all=TRUE)); invisible(gc())
set.seed(sum(utf8ToInt("Gummibärchen")))
invisible(sapply(c("car", "effects", "emmeans", "magrittr", "multcomp"),
   library, character.only=TRUE, logical.return=TRUE,
   quietly=TRUE, verbose=FALSE, warn.conflicts=TRUE))
invisible(sapply(dir("_helpers", full.names=TRUE), source))

Imagine you have a binary variable such as the choice of a genitive construction; the data are in _input/genitives.csv and you can find information about the variables/columns in _input/genitives.r, and our response variable is called GENITIVE:

summary(d <- read.delim(        # summarize d, the result of loading
   file="_input/genitives.csv", # this file
   stringsAsFactors=TRUE))      # change categorical variables into factors

      CASE      GENITIVE  SPEAKER       MODALITY      POR_LENGTH    
 Min.   :   2   of:2720   nns:2666   spoken :1685   Min.   :  1.00  
 1st Qu.:1006   s : 880   ns : 934   written:1915   1st Qu.:  8.00  
 Median :2018                                       Median : 11.00  
 Mean   :2012                                       Mean   : 14.58  
 3rd Qu.:3017                                       3rd Qu.: 17.00  
 Max.   :4040                                       Max.   :204.00  
   PUM_LENGTH         POR_ANIMACY   POR_FINAL_SIB        POR_DEF    
 Min.   :  2.00   animate   : 920   absent :2721   definite  :2349  
 1st Qu.:  6.00   collective: 607   present: 879   indefinite:1251  
 Median :  9.00   inanimate :1671                                   
 Mean   : 10.35   locative  : 243                                   
 3rd Qu.: 13.00   temporal  : 159                                   
 Max.   :109.00

There are 3600 genitives in the data. Your job is to guess each one of them such that you guess correctly as many as possible. What would you do?

possible_picks <- list(
   sample(c("of", "s"), 3600, replace=TRUE),
   sample(c("of", "s"), 3600, replace=TRUE),
   rep(c("of", "s"), c(2720, 880)),
   sample(c("of", "s"), 3600, prob=table(d$GENITIVE), replace=TRUE))

Let’s see how good each of these is, where we determine goodness on the basis of the number/proportion of correct guesses:

(possible_accuracies <- sapply(
   possible_picks, \(af) mean(af==d$GENITIVE)))

[1] 0.5005556 0.5166667 0.6366667 0.6388889

Here are the corresponding results for my guesses …

stgs_picks <- rep(d$GENITIVE %>% table %>% sort %>% tail(1) %>% names, nrow(d))
(stgs_accuracy <- mean(stgs_picks==d$GENITIVE))

[1] 0.7555556

… which is better than all other results above:

stgs_accuracy > possible_accuracies

[1] TRUE TRUE TRUE TRUE

table(d$GENITIVE) %>% max

[1] 2720

Thus, when I asked you to predict for every single case in the data which genitive would be used, you should have always guessed the most frequent level of the response, the mode, which leads to a baseline called no-information rate (because we are not using any information other than the actual response variable itself). If you always, invariably predict the most frequent level of the response, what is the baseline accuracy you would get? The result we already saw:

# baseline := highest proportion in the table of the response
(baseline <- d$GENITIVE %>% table %>% prop.table %>% max)

[1] 0.7555556

As for deviance, in the case of lm, we defined deviance as the sum of squared residuals and we said we want to reduce the deviance of our model relative to a null model by adding (hopefully good) predictors. What is deviance here in the case of a glm? To understand that, we first compute a null model:

summary(m_00 <- glm(      # make/summarize the gen. linear model m_00:
   GENITIVE ~ 1,          # GENITIVE ~ an overall intercept (1) & no predictor
   family=binomial,       # the response is binary
   data=d,                # those variables are in d
   na.action=na.exclude)) # skip cases with NA/missing data (good habit)


Call:
glm(formula = GENITIVE ~ 1, family = binomial, data = d, na.action = na.exclude)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.12847    0.03878   -29.1   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4004.3  on 3599  degrees of freedom
Residual deviance: 4004.3  on 3599  degrees of freedom
AIC: 4006.3

Number of Fisher Scoring iterations: 4

In a way that will become clearer further below, this model allows us to compute the no-information rate / baseline from above …

mean(ifelse(predict(m_00)>0, "s", "of")==d$GENITIVE)

[1] 0.7555556

… but also the deviance:

deviance(m_00)

[1] 4004.273

sum(residuals(m_00)^2)

[1] 4004.273

We see that the deviance of m_00 is still the sum of this model’s squared residuals and the model’s “null deviance” (and, if a model is a null model, also what is called “residual deviance”). But what are squared residuals here? We might discuss that in more detail later, if we have time. As in the book, we will first discuss some didactically motivated monofactorial models before we go over a ‘proper’ multifactorial modeling approach.

2.2 A binary predictor

Does the choice of a genitive construction vary as a function of the modality in which the construction was produced (spoken vs. written)?

Some exploration of the relevant variables:

# the predictor(s)/response on its/their own
table(d$GENITIVE)


  of    s 
2720  880

table(d$MODALITY)


 spoken written 
   1685    1915

# the predictor(s) w/ the response
(mod_x_gen <- table(d$MODALITY, d$GENITIVE))

         
            of    s
  spoken  1232  453
  written 1488  427

How could one approach this ‘descriptively’? We could say what the odds are for s-genitives in the spoken data:

453 / 1232 # odds for s-genitives in the spoken data

[1] 0.3676948

… and we could say what the odds are for s-genitives in the written data:

427 / 1488 # odds for s-genitives in the written data

[1] 0.2869624

But in some sense, odds are annoying because they have this very different value range for ‘I like s-genitives’ vs. ‘I don’t like s-genitives’, here shown for the spoken part:

1675 /   10 # I like s-genitives a lot
## [1] 167.5
1000 /  685 # I like s-genitives somewhat
## [1] 1.459854
 843 /  842 # I like s-genitives pretty much as much as of-genitives
## [1] 1.001188
 685 / 1000 # I dislike s-genitives somewhat
## [1] 0.685
  10 / 1675 # I don't like s-genitives at all
## [1] 0.005970149

But look what happens if we log the odds:

log(1675 /   10) # I like s-genitives a lot
## [1] 5.120983
log(1000 /  685) # I like s-genitives somewhat
## [1] 0.3783364
log( 843 /  842) # I like s-genitives pretty much as much as of-genitives
## [1] 0.001186944
log( 685 / 1000) # I dislike s-genitives somewhat
## [1] -0.3783364
log(  10 / 1675) # I don't like s-genitives at all
## [1] -5.120983

Figure 43: (Logged) odds vs. frequencies

Much better, so let’s do that for our real data:

mod_x_gen # reminder what the real data were
##          
##             of    s
##   spoken  1232  453
##   written 1488  427
log(453 / 1232) # LOGGED odds for s-genitives in the spoken data
## [1] -1.000502
log(427 / 1488) # LOGGED odds for s-genitives in the written data
## [1] -1.248404

Keep those numbers in mind. But most people would probably express this in probabilities:

addmargins(mod_x_gen) # reminder what the real data were
##          
##             of    s  Sum
##   spoken  1232  453 1685
##   written 1488  427 1915
##   Sum     2720  880 3600
453 / 1685 # probability of s-genitives in the spoken data
## [1] 0.2688427
427 / 1915 # probability of s-genitives in the written data
## [1] 0.2229765

We’ll revisit this in a moment …

2.2.1 Modeling & numerical interpretation

How would we have dealt with this without a model? First, we would have computed a chi-squared test like this, where the chi-squared value is computed involving differences of observed and expected values:

chisq.test(mod_x_gen, correct=FALSE)


    Pearson's Chi-squared test

data:  mod_x_gen
X-squared = 10.21, df = 1, p-value = 0.001397

Second, we might have computed the odds ratio and the logged odds ratio for the s-genitives:

(odds.ratio <-
   (mod_x_gen[2,2] / mod_x_gen[2,1]) /
   (mod_x_gen[1,2] / mod_x_gen[1,1]))

[1] 0.7804363

log(odds.ratio)

[1] -0.2479022

How do we do this now? With a binary logistic regression, which returns the G²-statistic, which is computed based on ratios of observed and expected values (not differences like chi-squared) and which also returns the (logged) odds ratio. Let’s fit our regression model,

summary(m_01 <- glm(      # make/summarize the gen. linear model m_01:
   GENITIVE ~ 1 +         # GENITIVE ~ an overall intercept (1)
   MODALITY,              # & the predictor MODALITY
   family=binomial,       # resp = binary
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data


Call:
glm(formula = GENITIVE ~ 1 + MODALITY, family = binomial, data = d, 
    na.action = na.exclude)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -1.00050    0.05495 -18.208  < 2e-16 ***
MODALITYwritten -0.24790    0.07767  -3.192  0.00141 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4004.3  on 3599  degrees of freedom
Residual deviance: 3994.1  on 3598  degrees of freedom
AIC: 3998.1

Number of Fisher Scoring iterations: 4

Confint(m_01) # confidence intervals

                  Estimate      2.5 %      97.5 %
(Intercept)     -1.0005020 -1.1091198 -0.89367488
MODALITYwritten -0.2479022 -0.4002591 -0.09571937

Is the model significant? Of course: its only predictor is. But even if there were more than one predictor, we could compute the significance from the deviance reduction like this:

pchisq(              # compute the area under the chi-squared curve
   q=4004.3-3994.1,  # for this chi-squared value: null - residual deviance
   df=3599-3598,     # at this df: null - residual df
   lower.tail=FALSE) # only using the right/upper tail/side

[1] 0.001404407

We can also use drop1 again, just with a different test argument:

drop1(m_01, # drop each droppable predictor at a time from m_01 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

         Df Deviance      AIC      LRT    Pr..Chi.
<none>   NA 3994.079 3998.079       NA          NA
MODALITY  1 4004.273 4006.273 10.19365 0.001409249

(Note how similar the chi-squared value and the G²-statistic and their p-values are.)

But what are the coefficients?

coef(m_01)

    (Intercept) MODALITYwritten 
     -1.0005020      -0.2479022

Recognize the intercept?

coef(m_01)[1]

(Intercept) 
  -1.000502

mod_x_gen # reminder what the real data were

         
            of    s
  spoken  1232  453
  written 1488  427

log(453 / 1232) # LOGGED odds for s-genitives in the spoken data

[1] -1.000502

Recognize what the MODALITY coefficient is?

"+"(                # add to
   log(453 / 1232), # coef(m_01)[1], the intercept
   coef(m_01)[2])   # coef(m_01)[2], the coef for MODALITY

MODALITYwritten 
      -1.248404

log(427 / 1488) # LOGGED odds for s-genitives in the written data

[1] -1.248404

Let’s make ‘predictions’, which we add to the data: first, we compute the predictions of the model. The function predict, used like here, does the same coef times model.matrix thing as always:

d$PRED_LO_s <- predict( # make d$PRED_LO_s the result of predicting
   m_01)                # log odds of s-genitives from m_01
picky <- c(1, 3600)
sum(coef(m_01) * model.matrix(m_01)[1,])

[1] -1.000502

sum(coef(m_01) * model.matrix(m_01)[3600,])

[1] -1.248404

d[c(1,3600),]

     CASE GENITIVE SPEAKER MODALITY POR_LENGTH PUM_LENGTH POR_ANIMACY
1       2       of     nns   spoken         13          7  collective
3600 4040        s      ns  written          7          6     animate
     POR_FINAL_SIB  POR_DEF PRED_LO_s
1           absent definite -1.000502
3600        absent definite -1.248404

Second, we compute predicted probabilities, note the type="response":

d$PRED_PP_s <- predict( # make d$PRED_PP_s the result of predicting
   m_01,                # from m_01
   type="response")     # predicted probabilities of "s"
d[c(1,3600),]

     CASE GENITIVE SPEAKER MODALITY POR_LENGTH PUM_LENGTH POR_ANIMACY
1       2       of     nns   spoken         13          7  collective
3600 4040        s      ns  written          7          6     animate
     POR_FINAL_SIB  POR_DEF PRED_LO_s PRED_PP_s
1           absent definite -1.000502 0.2688427
3600        absent definite -1.248404 0.2229765

Would we be able to get the predicted probabilities from the (logged) odds?

"/"(                              # divide
   exp(predict(m_01)[picky]),     #    the odds by
   (1+exp(predict(m_01)[picky]))) # 1+ the odds

        1      3600 
0.2688427 0.2229765

plogis(predict(m_01)[picky]) # the short version

        1      3600 
0.2688427 0.2229765

Then, we compute the categorical predictions: if the predicted probability of s-genitives is greater than that of of-genitives, we predict s-genitives – how?

d$PRED_CAT <- factor(         # make d$PRED_CAT the result of checking
   ifelse(d$PRED_PP_s>=0.5,   # if the predicted prob. of "s" is >=0.5
      levels(d$GENITIVE)[2],  # if yes, predict "s"
      levels(d$GENITIVE)[1]), # otherwise, predict "of"
   levels=levels(d$GENITIVE)) # make sure both levels are registered!

Let’s now evaluate the predictive capacity of the model. How well is the model doing?

(c_m <- table(          # confusion matrix: cross-tabulate
   "OBS"  =d$GENITIVE,  # observed genitives in the rows
   "PREDS"=d$PRED_CAT)) # predicted orders in the columns

    PREDS
OBS    of    s
  of 2720    0
  s   880    0

Obviously, very badly. We can now compute a variety of statistics although a confusion matrix with only one prediction is not the greatest vehicle to explain them – but that’s what we have, so … To define these statistics, it is useful to start with two easy distinctions:

positives vs. negatives: positives are the cases where the model predicts the level it was tasked to predict, i.e. here s, negatives are when it predicts the other level. (This terminology comes from applications where the two levels of the response are massively different qualitatively and in terms of impact and often also in terms of frequency. For example, when you do cancer screening, the test is tasked with finding cancer so that would be the ‘positive’ prediction; when credit card companies monitor transactions for being likely fraudulent, then the algorithm saying ‘this use of your credit card is fraudulent’ is the ‘positive’ prediction – one way of putting it is that the positives are the predictions that are important and require (a re)action.)
true vs. false is just whether the prediction of the model is correct or not.

That means, we can define the four possible outcomes like shown here in Table 1:

Table 1: The confusion matrix (for when one predicts the 2nd level of GENITIVE)

	predicted: `GENITIVE`: of	predicted: `GENITIVE`: s
observed: `GENITIVE`: of	2720 (true negatives, tn)	0 (false positives, fp)
observed: `GENITIVE`: s	880 (false negatives, fn)	0 (true positives, tp)

With these notions in place, we can now define some widely used metrics: First the simplest one, accuracy. Accuracy answers the following question: “Out of all the predictions the model made, how many (as a proportion) are correct?”. That means it is the sum of the main diagonal of c_m out of all predictions (the sum of c_m); this value is probably always reported although it’s far from ideal.

\[accuracy = \frac{tp+tn}{tp+tn+fp+fn}=\frac{0+2720}{0+2720+0+880}=0.7555556\]

A second useful metric is precision. Precision (also sometimes called positive predictive value) answers the following question: “Out of all the times the model predicted what it was supposed to predict (here, s), how often (as a proportion) was it right?”. Accordingly, it is computed as follows:

\[precision = \frac{tp}{tp+fp}=\frac{0}{0+0}=\mathrm{not~defined,~set~to~0}\]

The next one is recall. Recall (also sometimes called sensitivity or the true positive rate/TPR) answers the following question: “Out of all the instances the model was supposed to predict, how many (as a proportion) did it predict correctly?”. Therefore, this is how we get it:

\[recall/sensitivity/TPR = \frac{tp}{tp+fn}=\frac{0}{0+880}=0\]

That also means that the recall score for each level of the response variable is the accuracy for that level.

The last metric is specificity. Specificity (also sometimes called true negative rate/TNR) answers the question “Out of all the instances the model was not supposed to predict, how many (as a proportion) did it predict correctly?”. I hope you notice that means that specificity is ‘recall for the not predicted level of the response’ so this is how we get that:

\[specificity/TNR = \frac{tn}{tn+fp}=\frac{2720}{2720+0}=1\]

c( # evaluate the confusion matrix
   "Prec. for s"     =c_m[ "s", "s"] / sum(c_m[    , "s"]),
   "Acc./rec. for s" =c_m[ "s", "s"] / sum(c_m[ "s",    ]),
   "Prec. for of"    =c_m["of","of"] / sum(c_m[    ,"of"]),
   "Acc./rec. for of"=c_m["of","of"] / sum(c_m["of",    ]),
   "Acc. (overall)"  =mean(d$GENITIVE==d$PRED_CAT))

     Prec. for s  Acc./rec. for s     Prec. for of Acc./rec. for of 
             NaN        0.0000000        0.7555556        1.0000000 
  Acc. (overall) 
       0.7555556

c(R2s(m_01),
  "Cohen's kappa"=cohens.kappa(c_m)[[1]],
  "C"=C.score(d$GENITIVE, d$PRED_PP_s))

  McFadden R-squared Nagelkerke R-squared        Cohen's kappa 
         0.002545694          0.004212717          0.000000000 
                   C 
         0.530915775

Here, you also see Cohen’s kappa and the C-score. The former is a measure widely used to quantify interrater reliability that, here, helps contextualize an accuracy value against chance. Specifically, Cohen’s κ is computed like this:

\[\mathrm{Cohen's}~\kappa = \frac{accuracy - chance~accuracy}{1-chance~accuracy}\]

The C-score, following Baayen (2008:204) and Hosmer, Lemeshow, & Sturdivant (2013:177), C-scores of >=0.8 are considered good or “excellent” so we’re not really making the cut here, we’re not even getting what Hosmer et al. call “acceptable” (for which C would need to be ≥0.7) … Again, a terrible model in terms of predictive capability.

Side question: what are the R-squareds? McFadden is easy, because it’s pretty much the same thing as R-squared in linear models: the percentage of deviance reduction by a model (sometimes this is called R²_L by Cohen, e.g. in Wikipedia):

"/"(
   deviance(m_00)-deviance(m_01),
   deviance(m_00))

[1] 0.002545694

Note that McFadden’s R² is often surprisingly small, one needs really confident and correct predictions for this coefficient to become what seems ‘nicely high’. Another R², which seems to be much more widely used (Nagelkerke’s R²), usually paints a much rosier picture (see Wikipedia).

2.2.2 Visual interpretation

Let’s visualize the nature of the effect of MODALITY based on the predictions:

mo_d <- data.frame(       # make mo_d a data frame of
   mo <- predictorEffect( # the effect
      "MODALITY",         # of MODALITY
      m_01))              # in m_01
mo_d$fit_lo <- qlogis(mo_d$fit)
mo_d

  MODALITY       fit          se     lower     upper    fit_lo
1   spoken 0.2688427 0.010800758 0.2482073 0.2905308 -1.000502
2  written 0.2229765 0.009511784 0.2048903 0.2421729 -1.248404

plot(mo,            # plot the effect of MODALITY
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid

Figure 44: The effect of MODALITY in m_01

2.2.3 Diagnostics 2

Binary logistic regression models also have assumptions that need to be met, but I will only discuss another rule of thumb regarding the amount of data one needs here. One rule of thumb (!) is that the number of data points featuring the less frequent level of the response variable in the model must be greater than the product of the number of values estimated in the model and 15 or, better even, 20.

How many data points do we have?

nrow(model.frame(m_01))

[1] 3600

How many values are estimated in the model? The best way to get that number is this:

m_01 %>% logLik %>% attr("df")

[1] 2

So we may be fine …

">"(
   nrow(model.frame(m_01)),
   20 * (m_01 %>% logLik %>% attr("df")))

[1] TRUE

Note: one should again minimally also check that one has enough data points for all (combinations of) levels of predictors and their interactions!

2.2.4 Write-up

To determine whether the choice of a genitive varies as a function of the modality (spoken vs. written), a generalized linear model was fit with GENITIVE as the response variable and MODALITY as a predictor. The model was highly significant (LR=10.19, df=1, p<0.002) but explains very little of the response variable (Nagelkerke R²=0.004, McFadden’s R²=0.003, C=0.53) and has no predictive power (Cohen’s κ=0): the model only ever predicts of-genitives. The coefficients indicate that s-genitives are more likely in spoken data than in written data, as shown in the summary table shown here. [Show a plot]

	Estimate	95%-CI	se	z	p_2-tailed
Intercept (MODALITY=spoken)	-1	[-1.11, -0.89]	0.05	-18.21	<0.001
MODALITY_{spoken→written}	-0.25	[-0.4, -0.1]	0.08	-3.19	<0.002

Some housekeeping:

str(d <- d[,1:9])

'data.frame':   3600 obs. of  9 variables:
 $ CASE         : int  2 3 4 5 6 7 8 9 10 11 ...
 $ GENITIVE     : Factor w/ 2 levels "of","s": 1 1 1 1 2 2 2 2 2 2 ...
 $ SPEAKER      : Factor w/ 2 levels "nns","ns": 1 1 1 1 1 1 1 1 1 1 ...
 $ MODALITY     : Factor w/ 2 levels "spoken","written": 1 1 1 1 1 1 1 1 1 1 ...
 $ POR_LENGTH   : int  13 22 11 26 8 7 6 5 5 5 ...
 $ PUM_LENGTH   : int  7 7 8 4 4 3 36 3 5 5 ...
 $ POR_ANIMACY  : Factor w/ 5 levels "animate","collective",..: 2 1 1 2 1 1 1 1 1 1 ...
 $ POR_FINAL_SIB: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 1 ...
 $ POR_DEF      : Factor w/ 2 levels "definite","indefinite": 1 1 1 1 1 1 1 1 1 1 ...

2.3 A categorical predictor

We now switch to a categorical predictor and ask, does the choice of a genitive construction vary as a function on the degree/kind of animacy of the possessor (animate vs. collective vs. inanimate vs. locative vs. temporal)? How would we explore/describe this? # replace with your own code/answers

table(d$POR_ANIMACY, d$GENITIVE)

            
               of    s
  animate     370  550
  collective  408  199
  inanimate  1638   33
  locative    199   44
  temporal    105   54

mosaicplot(t(table(d$POR_ANIMACY, d$GENITIVE)), shade=TRUE)

2.3.1 Modeling & numerical interpretation

How would we have dealt without a model? Again a chi-squared test, but now we model this right away:

summary(m_01 <- glm(      # make/summarize the gen. linear model m_01:
   GENITIVE ~ 1 +         # GENITIVE ~ an overall intercept (1)
   POR_ANIMACY,           # & the predictor POR_ANIMACY
   family=binomial,       # resp = binary
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data


Call:
glm(formula = GENITIVE ~ 1 + POR_ANIMACY, family = binomial, 
    data = d, na.action = na.exclude)

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)            0.39642    0.06724   5.896 3.73e-09 ***
POR_ANIMACYcollective -1.11438    0.10953 -10.174  < 2e-16 ***
POR_ANIMACYinanimate  -4.30114    0.18823 -22.851  < 2e-16 ***
POR_ANIMACYlocative   -1.90553    0.17965 -10.607  < 2e-16 ***
POR_ANIMACYtemporal   -1.06139    0.18045  -5.882 4.06e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4004.3  on 3599  degrees of freedom
Residual deviance: 2766.0  on 3595  degrees of freedom
AIC: 2776

Number of Fisher Scoring iterations: 6

Confint(m_01) # confidence intervals

                        Estimate      2.5 %     97.5 %
(Intercept)            0.3964153  0.2651511  0.5288144
POR_ANIMACYcollective -1.1143776 -1.3303631 -0.9008787
POR_ANIMACYinanimate  -4.3011390 -4.6879154 -3.9477968
POR_ANIMACYlocative   -1.9055305 -2.2682451 -1.5625579
POR_ANIMACYtemporal   -1.0613916 -1.4207695 -0.7121414

Is the model significant? Of course: the only predictor is. But even if there were more than one predictor, we could compute the significance from the deviance reduction like this:

pchisq(              # compute the area under the chi-squared curve
   q=4004.3-2766,    # for this chi-squared value: null - residual deviance
   df=3599-3595,     # at this df: null - residual df
   lower.tail=FALSE) # only using the right/upper tail/side

[1] 7.926259e-267

We can also use drop1 again, but remember the test argument:

drop1(m_01, # drop each droppable predictor at a time from m_01 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

            Df Deviance      AIC      LRT      Pr..Chi.
<none>      NA 2765.995 2775.995       NA            NA
POR_ANIMACY  4 4004.273 4006.273 1238.278 8.015654e-267

The coefficients have the same function as always:

coef(m_01)

          (Intercept) POR_ANIMACYcollective  POR_ANIMACYinanimate 
            0.3964153            -1.1143776            -4.3011390 
  POR_ANIMACYlocative   POR_ANIMACYtemporal 
           -1.9055305            -1.0613916

Meaning,

the intercept is the prediction when the predictor is its first level (animate);
coefficient 2 is how you get from the intercept (animate) to the prediction when the predictor is its 2nd level (collective), and that difference is significant;
coefficient 3 is how you get from the intercept (animate) to the prediction for when the predictor is its 3rd level (inanimate), and that difference is significant;
coefficient 4 is how you get from the intercept (animate) to the prediction for when the predictor is its 4th level (locative), and that difference is significant;
coefficient 5 is how you get from the intercept (animate) to the prediction for when the predictor is its 5th level (temporal), and that difference is significant:

picky <- c(2,1,17,55,65)
d[picky,c("GENITIVE", "POR_ANIMACY")]

   GENITIVE POR_ANIMACY
2        of     animate
1        of  collective
17       of   inanimate
55       of    locative
65        s    temporal

model.matrix(m_01)[picky,]

   (Intercept) POR_ANIMACYcollective POR_ANIMACYinanimate POR_ANIMACYlocative
2            1                     0                    0                   0
1            1                     1                    0                   0
17           1                     0                    1                   0
55           1                     0                    0                   1
65           1                     0                    0                   0
   POR_ANIMACYtemporal
2                    0
1                    0
17                   0
55                   0
65                   1

apply(model.matrix(m_01)[picky,], 1, \(af) sum(af*coef(m_01)))

         2          1         17         55         65 
 0.3964153 -0.7179623 -3.9047237 -1.5091152 -0.6649763

But it is of course much easier to do that with emmeans again (which gives us predictions on the log odds scale) …

summary(m_01)$coefficients[,1:2]

                        Estimate Std. Error
(Intercept)            0.3964153 0.06723752
POR_ANIMACYcollective -1.1143776 0.10953077
POR_ANIMACYinanimate  -4.3011390 0.18822934
POR_ANIMACYlocative   -1.9055305 0.17964766
POR_ANIMACYtemporal   -1.0613916 0.18045280

data.frame(qwe <- emmeans( # compute estimated marginal means
   object=m_01,            # for the model m_01
   specs= ~ POR_ANIMACY))  # namely all levels of the predictor

  POR_ANIMACY     emmean         SE  df  asymp.LCL  asymp.UCL
1     animate  0.3964153 0.06723752 Inf  0.2646321  0.5281984
2  collective -0.7179623 0.08646448 Inf -0.8874296 -0.5484951
3   inanimate -3.9047237 0.17581069 Inf -4.2493063 -3.5601411
4    locative -1.5091152 0.16659051 Inf -1.8356266 -1.1826038
5    temporal -0.6649763 0.16745844 Inf -0.9931888 -0.3367638

… or with effects (which gives us predictions on the probability scale):

an_d <- data.frame(       # make an_d a data frame of
   an <- predictorEffect( # the effect
      "POR_ANIMACY",      # of POR_ANIMACY
      m_01))              # in m_01
an_d$fit_lo <- qlogis(an_d$fit)
an_d

  POR_ANIMACY        fit          se      lower      upper     fit_lo
1     animate 0.59782609 0.016165922 0.56577463 0.62906282  0.3964153
2  collective 0.32784185 0.019053448 0.29164055 0.36621363 -0.7179623
3   inanimate 0.01974865 0.003403457 0.01407325 0.02764863 -3.9047237
4    locative 0.18106996 0.024702646 0.13756935 0.23458435 -1.5091152
5    temporal 0.33962264 0.037557428 0.27028269 0.41659580 -0.6649763

We can therefore add the predictions to the data frame as usual:

d$PRED_PP_s <- predict( # make d$PRED_PP_s the result of predicting
   m_01,                # from m_01
   type="response")     # predicted probabilities of "s"
d$PRED_LO_s <- predict( # make d$PRED_PP_s the result of predicting
   m_01)                # from m_01
d$PRED_CAT <- factor(         # make d$PRED_CAT the result of checking
   ifelse(d$PRED_PP_s>=0.5,   # if the predicted prob. of "s" is >=0.5
      levels(d$GENITIVE)[2],  # if yes, predict "s"
      levels(d$GENITIVE)[1]), # otherwise, predict "of"
   levels=levels(d$GENITIVE)) # make sure both levels are registered
d[2:1,]

  CASE GENITIVE SPEAKER MODALITY POR_LENGTH PUM_LENGTH POR_ANIMACY
2    3       of     nns   spoken         22          7     animate
1    2       of     nns   spoken         13          7  collective
  POR_FINAL_SIB  POR_DEF PRED_PP_s  PRED_LO_s PRED_CAT
2        absent definite 0.5978261  0.3964153        s
1        absent definite 0.3278418 -0.7179623       of

(I am skipping PRED_PP_obs now, but you shouldn’t!)

But let’s now evaluate the predictive capacity of the model. How well is the model doing?

(c_m <- table(          # confusion matrix: cross-tabulate
   "OBS"  =d$GENITIVE,  # observed genitives in the rows
   "PREDS"=d$PRED_CAT)) # predicted orders in the columns

    PREDS
OBS    of    s
  of 2350  370
  s   330  550

c( # evaluate the confusion matrix
   "Prec. for s"     =c_m[ "s", "s"] / sum(c_m[    , "s"]),
   "Acc./rec. for s" =c_m[ "s", "s"] / sum(c_m[ "s",    ]),
   "Prec. for of"    =c_m["of","of"] / sum(c_m[    ,"of"]),
   "Acc./rec. for of"=c_m["of","of"] / sum(c_m["of",    ]),
   "Acc. (overall)"  =mean(d$GENITIVE==d$PRED_CAT))

     Prec. for s  Acc./rec. for s     Prec. for of Acc./rec. for of 
       0.5978261        0.6250000        0.8768657        0.8639706 
  Acc. (overall) 
       0.8055556

c(R2s(m_01),
  "Cohen's kappa"=cohens.kappa(c_m)[[1]],
  "C"=C.score(d$GENITIVE, d$PRED_PP_s))

  McFadden R-squared Nagelkerke R-squared        Cohen's kappa 
           0.3092390            0.4336234            0.4815668 
                   C 
           0.8472389

This model is obviously doing quite well!

We have a predictor with multiple levels now – what might we want to do? Post-hoc comparisons:

pairs(                # compute pairwise comparisons
    emmeans( # of means computed
      object=m_01,    # for the model m_01
      specs= ~ POR_ANIMACY), # namely all contrasts of this predictor
   adjust="holm")     # now we adjust because we're fishing

 contrast               estimate    SE  df z.ratio p.value
 animate - collective      1.114 0.110 Inf  10.174  <.0001
 animate - inanimate       4.301 0.188 Inf  22.851  <.0001
 animate - locative        1.906 0.180 Inf  10.607  <.0001
 animate - temporal        1.061 0.180 Inf   5.882  <.0001
 collective - inanimate    3.187 0.196 Inf  16.265  <.0001
 collective - locative     0.791 0.188 Inf   4.215  0.0001
 collective - temporal    -0.053 0.188 Inf  -0.281  0.7786
 inanimate - locative     -2.396 0.242 Inf  -9.891  <.0001
 inanimate - temporal     -3.240 0.243 Inf -13.343  <.0001
 locative - temporal      -0.844 0.236 Inf  -3.574  0.0007

Results are given on the log odds ratio (not the response) scale. 
P value adjustment: holm method for 10 tests

Every prediction is significantly different from every other one, with just one exception.

2.3.2 Visual interpretation

Let’s visualize the nature of the effect of POR_ANIMACY based on the predictions; for something we’ll need in a moment, we also add log odds predictions:

plot(an,            # plot the effect of POR_ANIMACY
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid

Figure 45: The effect of POR_ANIMACY in m_01

2.3.3 Excursus: conflation via model comparison

Given a lot of previous literature on the genitive alternation, you might actually have hypothesized that the only distinction one needs is one between animate (and maybe collective) and all inanimate possessors. Let’s use model comparison to check this. First, we create a version of POR_ANIMACY that conflates levels as desired:

d$POR_ANIMACY_confl <- d$POR_ANIMACY # create a copy of POR_ANIMACY
levels(d$POR_ANIMACY_confl) <-       # overwrite the copy's levels such that
   c("anim/coll", "anim/coll",       # "animate" and "collective" become "anim/coll"
      "inanim/loc/temp", "inanim/loc/temp", "inanim/loc/temp") # all others become "inanim/loc/temp"
table(POR_ANIMACY=d$POR_ANIMACY, POR_ANIMACY_confl=d$POR_ANIMACY_confl) # check you did it right

            POR_ANIMACY_confl
POR_ANIMACY  anim/coll inanim/loc/temp
  animate          920               0
  collective       607               0
  inanimate          0            1671
  locative           0             243
  temporal           0             159

d[picky,]

   CASE GENITIVE SPEAKER MODALITY POR_LENGTH PUM_LENGTH POR_ANIMACY
2     3       of     nns   spoken         22          7     animate
1     2       of     nns   spoken         13          7  collective
17   20       of     nns   spoken         17         12   inanimate
55   59       of     nns   spoken          7         11    locative
65   72        s     nns   spoken          5          8    temporal
   POR_FINAL_SIB    POR_DEF  PRED_PP_s  PRED_LO_s PRED_CAT POR_ANIMACY_confl
2         absent   definite 0.59782609  0.3964153        s         anim/coll
1         absent   definite 0.32784185 -0.7179623       of         anim/coll
17        absent indefinite 0.01974865 -3.9047237       of   inanim/loc/temp
55        absent   definite 0.18106996 -1.5091152       of   inanim/loc/temp
65        absent   definite 0.33962264 -0.6649763       of   inanim/loc/temp

We already have the model with POR_ANIMACY as the only predictor so we only need to fit a new model with POR_ANIMACY_confl as the only predictor:

summary(m_01_animconfl <- glm( # make/summarize the gen. linear model m_01_animconfl:
   GENITIVE ~ 1 + POR_ANIMACY_confl, # GENITIVE ~ an overall intercept (1) & POR_ANIMACY_confl
   family=binomial, data=d, na.action=na.exclude)) # the other arguments


Call:
glm(formula = GENITIVE ~ 1 + POR_ANIMACY_confl, family = binomial, 
    data = d, na.action = na.exclude)

Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      -0.03799    0.05119  -0.742    0.458    
POR_ANIMACY_conflinanim/loc/temp -2.65829    0.10377 -25.616   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4004.3  on 3599  degrees of freedom
Residual deviance: 3093.4  on 3598  degrees of freedom
AIC: 3097.4

Number of Fisher Scoring iterations: 5

… and then we do a comparison with anova just like above, the only difference being the argument to test (because we’re now fitting a generalized linear model).

anova(m_01,        # compare the model w/ the 5-level predictor POR_ANIMACY
   m_01_animconfl, # & the model w/ the 2-level predictor POR_ANIMACY_confl
   test="Chisq") %>% data.frame # using a LRT

  Resid..Df Resid..Dev Df  Deviance     Pr..Chi.
1      3595   2765.995 NA        NA           NA
2      3598   3093.390 -3 -327.3942 1.169659e-70

The difference between the two models is hugely significant, we cannot reduce POR_ANIMACY to the only two levels we were considering.

2.3.4 Write-up

To determine whether the choice of a genitive varies as a function of the degree/kind of animacy of the possessor (animate vs. collective vs. inanimate vs. locative vs. temporal), a generalized linear model was fit with GENITIVE as the response variable and POR_ANIMACY as a predictor. The model was highly significant (LR=1238.3, df=4, p<0.001) and explains a decent amount of the response variable (Nagelkerke R²=0.434, McFadden’s R²=0.309, Cohen’s κ=0.482, C=0.85) with a good amount of predictive power: [list precision & accuracy/recall scores for both genitives]. The summary table of the model is shown here [I changed the order of the coefficients for maybe better interpretability].

	Estimate	95%-CI	se	z	p_2-tailed
Intercept (POR_ANIMACY=animate)	0.40	[0.27, 0.53]	0.07	5.9	<0.001
POR_ANIMACY_{animate→temporal}	-1.06	[-1.42, -0.71]	0.18	-5.88	<0.002
POR_ANIMACY_{animate→collective}	-1.11	[-1.33, -0.9]	0.11	-10.17	<0.002
POR_ANIMACY_{animate→locative}	-1.91	[-2.27, -1.56]	0.18	-10.61	<0.002
POR_ANIMACY_{animate→inanimate}	-4.30	[-4.69, -3.95]	0.19	-22.85	<0.002

The model predicts s-genitives for animate possessors but of-genitives for the rest in the order of temporal/collective possessors, then locative possessors, and then very strongly for inanimate possessors. [Show a plot] A post-hoc comparison of the above model with one in which POR_ANIMACY was reduced to 2 levels (animate/collective vs. inanimate/locative/temporal) showed that this reduced predictor is highly significantly worse (LR=327.39, df=3, p<0.0001).

# housekeeping: remove the predictions
str(d <- d[,1:9])

'data.frame':   3600 obs. of  9 variables:
 $ CASE         : int  2 3 4 5 6 7 8 9 10 11 ...
 $ GENITIVE     : Factor w/ 2 levels "of","s": 1 1 1 1 2 2 2 2 2 2 ...
 $ SPEAKER      : Factor w/ 2 levels "nns","ns": 1 1 1 1 1 1 1 1 1 1 ...
 $ MODALITY     : Factor w/ 2 levels "spoken","written": 1 1 1 1 1 1 1 1 1 1 ...
 $ POR_LENGTH   : int  13 22 11 26 8 7 6 5 5 5 ...
 $ PUM_LENGTH   : int  7 7 8 4 4 3 36 3 5 5 ...
 $ POR_ANIMACY  : Factor w/ 5 levels "animate","collective",..: 2 1 1 2 1 1 1 1 1 1 ...
 $ POR_FINAL_SIB: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 1 ...
 $ POR_DEF      : Factor w/ 2 levels "definite","indefinite": 1 1 1 1 1 1 1 1 1 1 ...

2.4 A numeric predictor

We now switch to a numeric predictor and ask, does the choice of a genitive construction vary as a function of the length of the possessor? How would we explore/describe this? # replace with your own code/answers

par(mfrow=c(1,2))
hist(d$POR_LENGTH, breaks="FD")
spineplot(d$GENITIVE ~ d$POR_LENGTH)

par(mfrow=c(1,1))

Maybe not …

d$POR_LENGTH_LOG <- log2(d$POR_LENGTH)
par(mfrow=c(1,2))
hist(d$POR_LENGTH_LOG, breaks="FD")
spineplot(d$GENITIVE ~ d$POR_LENGTH_LOG)
par(mfrow=c(1,1))

Figure 46: GENITIVE as a function of POR_LENGTH (logged)

Now we’re talking …

2.4.1 Modeling & numerical interpretation

Let’s fit our model:

summary(m_01 <- glm(      # make/summarize the gen. linear model m_01
   GENITIVE ~ 1 +         # GENITIVE ~ an overall intercept (1)
   POR_LENGTH_LOG,        # & the predictor POR_LENGTH_LOG
   family=binomial,       # resp = binary
   data=d,                # those vars = in x
   na.action=na.exclude)) # skip cases with NA/missing data


Call:
glm(formula = GENITIVE ~ 1 + POR_LENGTH_LOG, family = binomial, 
    data = d, na.action = na.exclude)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     1.90960    0.17609   10.85   <2e-16 ***
POR_LENGTH_LOG -0.90493    0.05325  -16.99   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4004.3  on 3599  degrees of freedom
Residual deviance: 3643.1  on 3598  degrees of freedom
AIC: 3647.1

Number of Fisher Scoring iterations: 4

Confint(m_01) # confidence intervals

                 Estimate     2.5 %     97.5 %
(Intercept)     1.9096029  1.567704  2.2581724
POR_LENGTH_LOG -0.9049291 -1.010777 -0.8019605

pchisq(              # compute the area under the chi-squared curve
   q=4004.3-3643.1,  # for this chi-squared value: null - residual deviance
   df=3599-3598,     # at this df: null - residual df
   lower.tail=FALSE) # only using the right/upper tail/side

[1] 1.542725e-80

drop1(m_01, # drop each droppable predictor at a time from m_01 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

               Df Deviance      AIC      LRT     Pr..Chi.
<none>         NA 3643.097 3647.097       NA           NA
POR_LENGTH_LOG  1 4004.273 4006.273 361.1758 1.561528e-80

The coefficients mean what? The longer the possessor, the less likely s-genitives become:

the intercept is the prediction when the predictor is 0;
coefficient 2 is how you get from the intercept to the prediction when the predictor increases by 1, and that difference is significant.

Let’s check the predictions:

picky <- c(1133, 873)
d[picky,c("GENITIVE", "POR_LENGTH_LOG", "POR_LENGTH")]

     GENITIVE POR_LENGTH_LOG POR_LENGTH
1133       of              0          1
873         s              1          2

model.matrix(m_01)[picky,]

     (Intercept) POR_LENGTH_LOG
1133           1              0
873            1              1

sum(coef(m_01) * model.matrix(m_01)[1133,])

[1] 1.909603

sum(coef(m_01) * model.matrix(m_01)[873,])

[1] 1.004674

As always, we can compute them more easily with effects:

(relevants_l <- c(
   0:8, # these help us understand the slope
        # and cover the range of the actual values
   mean(d$POR_LENGTH_LOG)) %>% sort) # the central tendency

 [1] 0.000000 1.000000 2.000000 3.000000 3.534649 4.000000 5.000000 6.000000
 [9] 7.000000 8.000000

(le_d <- data.frame(    # make le_d a data frame of
   le <- effect(        # the effect
      "POR_LENGTH_LOG", # of POR_LENGTH_LOG
      m_01,             # in m_01
      # but predictions for our relevant values:
      xlevels=list(POR_LENGTH_LOG=relevants_l))))

   POR_LENGTH_LOG         fit          se       lower       upper
1        0.000000 0.870974529 0.019788298 0.826995055 0.905057520
2        1.000000 0.731976519 0.024503401 0.681327195 0.777208089
3        2.000000 0.524915543 0.019097226 0.487415805 0.562136377
4        3.000000 0.308917647 0.009060026 0.291448049 0.326951196
5        3.534649 0.216021562 0.007488942 0.201704834 0.231060347
6        4.000000 0.153149017 0.007572996 0.138887878 0.168587815
7        5.000000 0.068176570 0.006568341 0.056375854 0.082232176
8        6.000000 0.028749293 0.004293408 0.021429263 0.038471473
9        7.000000 0.011833690 0.002404071 0.007940200 0.017602483
10       8.000000 0.004821536 0.001237943 0.002913467 0.007969237

# or this:
# (le_d <- data.frame( # make le_d a data frame of
#    le <- predictorEffect( # the effect
#       "POR_LENGTH_LOG",   # of POR_LENGTH_LOG
#       m_01,               # in m_01
#       # but predictions for our relevant values:
#       focal.levels=relevants_l)))

And we add our predictions to the data frame as usual …

d$PRED_PP_s <- predict( # make d$PRED_PP_s the result of predicting
   m_01,                # from m_01
   type="response")     # predicted probabilities of "s"
d$PRED_LO_s <- predict( # make d$PRED_PP_s the result of predicting
   m_01)                # from m_01
d$PRED_CAT <- factor(         # make d$PRED_CAT the result of checking
   ifelse(d$PRED_PP_s>=0.5,   # if the predicted prob. of "s" is >=0.5
      levels(d$GENITIVE)[2],  # if yes, predict "s"
      levels(d$GENITIVE)[1]), # otherwise, predict "of"
   levels=levels(d$GENITIVE)) # make sure both levels are registered
d[picky,]

     CASE GENITIVE SPEAKER MODALITY POR_LENGTH PUM_LENGTH POR_ANIMACY
1133 1259       of     nns   spoken          1         23   inanimate
873   973        s     nns   spoken          2         12  collective
     POR_FINAL_SIB    POR_DEF POR_LENGTH_LOG PRED_PP_s PRED_LO_s PRED_CAT
1133        absent indefinite              0 0.8709745  1.909603        s
873         absent   definite              1 0.7319765  1.004674        s

… and evaluate them as usual:

(c_m <- table(          # confusion matrix: cross-tabulate
   "OBS"  =d$GENITIVE,  # observed genitives in the rows
   "PREDS"=d$PRED_CAT)) # predicted orders in the columns

    PREDS
OBS    of    s
  of 2614  106
  s   809   71

c( # evaluate the confusion matrix
   "Prec. for s"     =c_m[ "s", "s"] / sum(c_m[    , "s"]),
   "Acc./rec. for s" =c_m[ "s", "s"] / sum(c_m[ "s",    ]),
   "Prec. for of"    =c_m["of","of"] / sum(c_m[    ,"of"]),
   "Acc./rec. for of"=c_m["of","of"] / sum(c_m["of",    ]),
   "Acc. (overall)"  =mean(d$GENITIVE==d$PRED_CAT))

     Prec. for s  Acc./rec. for s     Prec. for of Acc./rec. for of 
      0.40112994       0.08068182       0.76365761       0.96102941 
  Acc. (overall) 
      0.74583333

c(R2s(m_01),
  "Cohen's kappa"=cohens.kappa(c_m)[[1]],
  "C"=C.score(d$GENITIVE, d$PRED_PP_s))

  McFadden R-squared Nagelkerke R-squared        Cohen's kappa 
          0.09019761           0.14222054           0.05715463 
                   C 
          0.70785741

This model is obviously doing not so well. Let’s see why that is.

2.4.2 Visual interpretation

Here’s a plot:

plot(le,            # plot the effect of POR_LENGTH_LOG
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid

Figure 47: The effect of POR_LENGTH_LOG in m_01

Looks like a nice line – why does it do so badly? In part because there are very few predicted probabilities that exceed 0.5:

Figure 48: The effect of POR_LENGTH_LOG in m_01

2.4.3 Write-up

To determine whether the choice of a genitive varies as a function of the length of the possessor, a generalized linear model was fit with GENITIVE as the response variable and POR_LENGTH as a predictor; however, the initial model resulted in extremely poor fit probably in part because of the very long right tail of the predictor, which was therefore logged to the base of 2. A new model with this predictor was highly significant (LR=361.18, df=1, p<0.001) but explains only a small amount of the response variable (Nagelkerke R²=0.142, McFadden’s R²=0.09, Cohen’s κ=0.057, C=0.708) with a fairly bad amount of predictive power: [list precision & accuracy/recall scores for both genitives]. The summary table of the model is shown here.

	Estimate	95%-CI	se	z	p_2-tailed
Intercept (POR_LENGTH=0)	1.91	[1.57, 2.26]	0.18	10.85	<0.001
POR_LENGTH_0→1	-0.9	[-1.01, -0.8]	0.05	-16.99	<0.001

The model indicates a negative correlation between s-genitives and possessor length: the longer the possessor, the less likely s-genitives become and the more likely of-genitives become. [Show a plot]

2.5 A backwards model selection process

Now, we will deal with this in a multifactorial way. Specifically, we are asking, does the choice of a genitive construction (of vs. s) vary as a function of

a length-based short-before-long effect, but, this time, if we hypothesize a short-before-long effect, maybe we should not just be looking at the length of the possessor (POR_LENGTH), but how the length of the possessor compares to the length of the possessum (PUM_LENGTH); since such a comparison variable doesn’t exist yet in our data set, we need to create it first;
the degree/kind of animacy of the possessor (POR_ANIMACY: animate vs. collective vs. inanimate vs. locative vs. temporal);
whether the speakers are non-native speakers or native speakers of English (SPEAKER: nns vs. ns);
any pairwise interaction of these predictors;
the three-way interaction of these predictors?

Quick reminder:

baseline

[1] 0.7555556

deviance(m_00)

[1] 4004.273

Some exploration of the relevant variables:

# the predictor(s)/response on its/their own
hist(d$POR_LENGTH-d$PUM_LENGTH, breaks="FD", main="")
hist(d$LEN_PORmPUM_LOG <- log2(d$POR_LENGTH)-log2(d$PUM_LENGTH), breaks="FD", main="")
table(d$POR_ANIMACY)


   animate collective  inanimate   locative   temporal 
       920        607       1671        243        159

table(d$SPEAKER)


 nns   ns 
2666  934

# the predictor(s) w/ the response
spineplot(d$GENITIVE ~ d$LEN_PORmPUM_LOG)
table(d$POR_ANIMACY, d$GENITIVE)

            
               of    s
  animate     370  550
  collective  408  199
  inanimate  1638   33
  locative    199   44
  temporal    105   54

table(d$SPEAKER, d$GENITIVE)

     
        of    s
  nns 2024  642
  ns   696  238

ftable(d$POR_ANIMACY, d$SPEAKER, d$GENITIVE)

                  of    s
                         
animate    nns   272  406
           ns     98  144
collective nns   314  138
           ns     94   61
inanimate  nns  1250   27
           ns    388    6
locative   nns   117   34
           ns     82   10
temporal   nns    71   37
           ns     34   17

Figure 49: Exploration for the genitive data

Figure 50: Exploration for the genitive data

Figure 51: Exploration for the genitive data

2.5.1 Modeling & numerical evaluation

Let’s fit our initial regression model:

summary(m_01 <- glm(      # make/summarize the gen. linear model m_01:
   GENITIVE ~ 1 +         # GENITIVE ~ an overall intercept (1)
   LEN_PORmPUM_LOG*POR_ANIMACY*SPEAKER, # & these predictors & their interaction
   family=binomial,       # resp = binary
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data


Call:
glm(formula = GENITIVE ~ 1 + LEN_PORmPUM_LOG * POR_ANIMACY * 
    SPEAKER, family = binomial, data = d, na.action = na.exclude)

Coefficients:
                                                Estimate Std. Error z value
(Intercept)                                      0.65632    0.09135   7.184
LEN_PORmPUM_LOG                                 -0.70601    0.08154  -8.659
POR_ANIMACYcollective                           -1.77888    0.16150 -11.015
POR_ANIMACYinanimate                            -4.32698    0.21686 -19.953
POR_ANIMACYlocative                             -2.30616    0.27227  -8.470
POR_ANIMACYtemporal                             -1.32138    0.22536  -5.863
SPEAKERns                                       -0.08278    0.17204  -0.481
LEN_PORmPUM_LOG:POR_ANIMACYcollective           -0.46680    0.15054  -3.101
LEN_PORmPUM_LOG:POR_ANIMACYinanimate             0.32099    0.17729   1.811
LEN_PORmPUM_LOG:POR_ANIMACYlocative             -0.29249    0.25885  -1.130
LEN_PORmPUM_LOG:POR_ANIMACYtemporal              0.44522    0.18375   2.423
LEN_PORmPUM_LOG:SPEAKERns                        0.14245    0.16001   0.890
POR_ANIMACYcollective:SPEAKERns                  0.79493    0.29353   2.708
POR_ANIMACYinanimate:SPEAKERns                  -0.28854    0.51213  -0.563
POR_ANIMACYlocative:SPEAKERns                   -0.50011    0.49785  -1.005
POR_ANIMACYtemporal:SPEAKERns                    0.16161    0.41275   0.392
LEN_PORmPUM_LOG:POR_ANIMACYcollective:SPEAKERns -0.14283    0.28943  -0.493
LEN_PORmPUM_LOG:POR_ANIMACYinanimate:SPEAKERns  -0.65372    0.43153  -1.515
LEN_PORmPUM_LOG:POR_ANIMACYlocative:SPEAKERns   -0.19485    0.50679  -0.384
LEN_PORmPUM_LOG:POR_ANIMACYtemporal:SPEAKERns   -0.53061    0.38091  -1.393
                                                Pr(>|z|)    
(Intercept)                                     6.75e-13 ***
LEN_PORmPUM_LOG                                  < 2e-16 ***
POR_ANIMACYcollective                            < 2e-16 ***
POR_ANIMACYinanimate                             < 2e-16 ***
POR_ANIMACYlocative                              < 2e-16 ***
POR_ANIMACYtemporal                             4.54e-09 ***
SPEAKERns                                        0.63039    
LEN_PORmPUM_LOG:POR_ANIMACYcollective            0.00193 ** 
LEN_PORmPUM_LOG:POR_ANIMACYinanimate             0.07022 .  
LEN_PORmPUM_LOG:POR_ANIMACYlocative              0.25849    
LEN_PORmPUM_LOG:POR_ANIMACYtemporal              0.01539 *  
LEN_PORmPUM_LOG:SPEAKERns                        0.37331    
POR_ANIMACYcollective:SPEAKERns                  0.00677 ** 
POR_ANIMACYinanimate:SPEAKERns                   0.57315    
POR_ANIMACYlocative:SPEAKERns                    0.31511    
POR_ANIMACYtemporal:SPEAKERns                    0.69538    
LEN_PORmPUM_LOG:POR_ANIMACYcollective:SPEAKERns  0.62166    
LEN_PORmPUM_LOG:POR_ANIMACYinanimate:SPEAKERns   0.12980    
LEN_PORmPUM_LOG:POR_ANIMACYlocative:SPEAKERns    0.70062    
LEN_PORmPUM_LOG:POR_ANIMACYtemporal:SPEAKERns    0.16362    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4004.3  on 3599  degrees of freedom
Residual deviance: 2404.3  on 3580  degrees of freedom
AIC: 2444.3

Number of Fisher Scoring iterations: 7

Now what?

pchisq(              # compute the area under the chi-squared curve
   q =m_01$null.deviance-m_01$deviance, # for this chi-squared value: null - res. dev.
   df=m_01$df.null-m_01$df.residual,    # at this df: null - residual df
   lower.tail=FALSE) # only using the right/upper tail/side

[1] 0

drop1(m_01, # drop each droppable predictor at a time from m_01 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

                                    Df Deviance      AIC      LRT  Pr..Chi.
<none>                              NA 2404.267 2444.267       NA        NA
LEN_PORmPUM_LOG:POR_ANIMACY:SPEAKER  4 2408.005 2440.005 3.738401 0.4425653

Thus, we delete the predictor with the highest non-significant p-value and fit the 2nd model with that 3-way interaction LEN_PORmPUM_LOG:POR_ANIMACY:SPEAKER deleted:

summary(m_02 <- update( # make m_02 an update of
   m_01, .~.            # m_01, namely all of it (.~.), but then
   - LEN_PORmPUM_LOG:POR_ANIMACY:SPEAKER)) # remove this interaction


Call:
glm(formula = GENITIVE ~ LEN_PORmPUM_LOG + POR_ANIMACY + SPEAKER + 
    LEN_PORmPUM_LOG:POR_ANIMACY + LEN_PORmPUM_LOG:SPEAKER + POR_ANIMACY:SPEAKER, 
    family = binomial, data = d, na.action = na.exclude)

Coefficients:
                                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)                            0.63865    0.08956   7.131 9.99e-13 ***
LEN_PORmPUM_LOG                       -0.66568    0.07506  -8.869  < 2e-16 ***
POR_ANIMACYcollective                 -1.75874    0.15934 -11.038  < 2e-16 ***
POR_ANIMACYinanimate                  -4.30116    0.21500 -20.006  < 2e-16 ***
POR_ANIMACYlocative                   -2.29342    0.26298  -8.721  < 2e-16 ***
POR_ANIMACYtemporal                   -1.31543    0.22698  -5.795 6.82e-09 ***
SPEAKERns                             -0.01779    0.16826  -0.106   0.9158    
LEN_PORmPUM_LOG:POR_ANIMACYcollective -0.50145    0.12853  -3.901 9.56e-05 ***
LEN_PORmPUM_LOG:POR_ANIMACYinanimate   0.20466    0.15951   1.283   0.1995    
LEN_PORmPUM_LOG:POR_ANIMACYlocative   -0.34103    0.22263  -1.532   0.1256    
LEN_PORmPUM_LOG:POR_ANIMACYtemporal    0.31318    0.16011   1.956   0.0505 .  
LEN_PORmPUM_LOG:SPEAKERns             -0.02120    0.11639  -0.182   0.8555    
POR_ANIMACYcollective:SPEAKERns        0.72665    0.29266   2.483   0.0130 *  
POR_ANIMACYinanimate:SPEAKERns        -0.30015    0.48555  -0.618   0.5365    
POR_ANIMACYlocative:SPEAKERns         -0.55095    0.45355  -1.215   0.2245    
POR_ANIMACYtemporal:SPEAKERns          0.08259    0.40494   0.204   0.8384    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4004.3  on 3599  degrees of freedom
Residual deviance: 2408.0  on 3584  degrees of freedom
AIC: 2440

Number of Fisher Scoring iterations: 7

anova(m_01, m_02,               # compare m_01 to m_02
   test="Chisq") %>% data.frame # using a LRT

  Resid..Df Resid..Dev Df  Deviance  Pr..Chi.
1      3580   2404.267 NA        NA        NA
2      3584   2408.005 -4 -3.738401 0.4425653

drop1(m_02, # drop each droppable predictor at a time from m_02 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

                            Df Deviance      AIC         LRT     Pr..Chi.
<none>                      NA 2408.005 2440.005          NA           NA
LEN_PORmPUM_LOG:POR_ANIMACY  4 2439.771 2463.771 31.76615364 2.135595e-06
LEN_PORmPUM_LOG:SPEAKER      1 2408.038 2438.038  0.03325705 8.552962e-01
POR_ANIMACY:SPEAKER          4 2418.886 2442.886 10.88037857 2.794180e-02

Let’s fit the 3rd model with that interaction LEN_PORmPUM_LOG:SPEAKER deleted:

summary(m_03 <- update( # make m_03 an update of
   m_02, .~.            # m_02, namely all of it (.~.), but then
   - LEN_PORmPUM_LOG:SPEAKER)) # remove this interaction


Call:
glm(formula = GENITIVE ~ LEN_PORmPUM_LOG + POR_ANIMACY + SPEAKER + 
    LEN_PORmPUM_LOG:POR_ANIMACY + POR_ANIMACY:SPEAKER, family = binomial, 
    data = d, na.action = na.exclude)

Coefficients:
                                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)                            0.64082    0.08887   7.211 5.57e-13 ***
LEN_PORmPUM_LOG                       -0.67066    0.06999  -9.582  < 2e-16 ***
POR_ANIMACYcollective                 -1.76343    0.15747 -11.198  < 2e-16 ***
POR_ANIMACYinanimate                  -4.30316    0.21475 -20.038  < 2e-16 ***
POR_ANIMACYlocative                   -2.29898    0.26155  -8.790  < 2e-16 ***
POR_ANIMACYtemporal                   -1.31845    0.22659  -5.819 5.93e-09 ***
SPEAKERns                             -0.02626    0.16145  -0.163  0.87078    
LEN_PORmPUM_LOG:POR_ANIMACYcollective -0.50226    0.12847  -3.910 9.25e-05 ***
LEN_PORmPUM_LOG:POR_ANIMACYinanimate   0.20638    0.15920   1.296  0.19486    
LEN_PORmPUM_LOG:POR_ANIMACYlocative   -0.34169    0.22268  -1.534  0.12492    
LEN_PORmPUM_LOG:POR_ANIMACYtemporal    0.31237    0.16007   1.951  0.05100 .  
POR_ANIMACYcollective:SPEAKERns        0.73847    0.28489   2.592  0.00954 ** 
POR_ANIMACYinanimate:SPEAKERns        -0.29379    0.48413  -0.607  0.54396    
POR_ANIMACYlocative:SPEAKERns         -0.53295    0.44234  -1.205  0.22827    
POR_ANIMACYtemporal:SPEAKERns          0.08969    0.40281   0.223  0.82380    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4004.3  on 3599  degrees of freedom
Residual deviance: 2408.0  on 3585  degrees of freedom
AIC: 2438

Number of Fisher Scoring iterations: 7

anova(m_02, m_03,               # compare m_02 to m_03
   test="Chisq") %>% data.frame # using a LRT

  Resid..Df Resid..Dev Df    Deviance  Pr..Chi.
1      3584   2408.005 NA          NA        NA
2      3585   2408.038 -1 -0.03325705 0.8552962

drop1(m_03, # drop each droppable predictor at a time from m_03 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

                            Df Deviance      AIC      LRT     Pr..Chi.
<none>                      NA 2408.038 2438.038       NA           NA
LEN_PORmPUM_LOG:POR_ANIMACY  4 2439.968 2461.968 31.92969 1.977450e-06
POR_ANIMACY:SPEAKER          4 2419.104 2441.104 11.06578 2.583461e-02

Confint(m_final <- m_03); rm(m_03); invisible(gc())

                                         Estimate        2.5 %      97.5 %
(Intercept)                            0.64081690  0.468616712  0.81724071
LEN_PORmPUM_LOG                       -0.67065848 -0.810966407 -0.53635908
POR_ANIMACYcollective                 -1.76342664 -2.077305396 -1.45947457
POR_ANIMACYinanimate                  -4.30316052 -4.744994412 -3.90029037
POR_ANIMACYlocative                   -2.29898489 -2.838931650 -1.80923537
POR_ANIMACYtemporal                   -1.31844526 -1.772188001 -0.88148760
SPEAKERns                             -0.02626334 -0.341841030  0.29159142
LEN_PORmPUM_LOG:POR_ANIMACYcollective -0.50225743 -0.759750437 -0.25528062
LEN_PORmPUM_LOG:POR_ANIMACYinanimate   0.20637789 -0.102927878  0.52190369
LEN_PORmPUM_LOG:POR_ANIMACYlocative   -0.34169459 -0.798758018  0.07811273
LEN_PORmPUM_LOG:POR_ANIMACYtemporal    0.31237087 -0.008354092  0.62161919
POR_ANIMACYcollective:SPEAKERns        0.73846679  0.180371066  1.29820145
POR_ANIMACYinanimate:SPEAKERns        -0.29379122 -1.331890328  0.59609089
POR_ANIMACYlocative:SPEAKERns         -0.53294681 -1.434208534  0.31251344
POR_ANIMACYtemporal:SPEAKERns          0.08968869 -0.708937514  0.87541391

Let’s compute the overall significance test for this model:

anova(m_00, m_final, # compare m_00 to m_final
   test="Chisq") %>% data.frame # using a LRT

  Resid..Df Resid..Dev Df Deviance Pr..Chi.
1      3599   4004.273 NA       NA       NA
2      3585   2408.038 14 1596.235        0

pchisq(              # compute the area under the chi-squared curve
   q=4004.3-2408,    # for this chi-squared value
   df=14,            # at 14 df
   lower.tail=FALSE) # only using the right/upper tail/side

[1] 0

Let’s make ‘predictions’ and evaluate them:

d$PRED_PP_s <- predict( # make d$PRED_PP_s the result of predicting
   m_final,             # from m_final
   type="response")     # predicted probabilities of "s"
# if you don't say type="response", predict gives you log odds
d$PRED_CAT <- factor(         # make d$PRED_CAT the result of checking
   ifelse(d$PRED_PP_s>=0.5,   # if the predicted prob. of "s" is >=0.5
      levels(d$GENITIVE)[2],  # if yes, predict "s"
      levels(d$GENITIVE)[1]), # otherwise, predict "of"
   levels=levels(d$GENITIVE)) # make sure both levels are registered
(c_m <- table(           # confusion matrix: cross-tabulate
   "OBS"  =d$GENITIVE,   # observed genitives in the rows
   "PREDS"=d$PRED_CAT)) # predicted orders in the columns

    PREDS
OBS    of    s
  of 2454  266
  s   310  570

c( # evaluate the confusion matrix
   "Prec. for s"     =c_m[ "s", "s"] / sum(c_m[    , "s"]),
   "Acc./rec. for s" =c_m[ "s", "s"] / sum(c_m[ "s",    ]),
   "Prec. for of"    =c_m["of","of"] / sum(c_m[    ,"of"]),
   "Acc./rec. for of"=c_m["of","of"] / sum(c_m["of",    ]),
   "Acc. (overall)"  =mean(d$GENITIVE==d$PRED_CAT))

     Prec. for s  Acc./rec. for s     Prec. for of Acc./rec. for of 
       0.6818182        0.6477273        0.8878437        0.9022059 
  Acc. (overall) 
       0.8400000

c(R2s(m_final),
  "Cohen's kappa"=cohens.kappa(c_m)[[1]],
  "C"=C.score(d$GENITIVE, d$PRED_PP_s))

  McFadden R-squared Nagelkerke R-squared        Cohen's kappa 
           0.3986328            0.5335965            0.5593935 
                   C 
           0.8986000

This looks like a pretty decent model, not bad at all (esp. Cohen’s kappa and the C-score).

2.5.2 Interpretation

2.5.2.1 The interaction `LEN_PORmPUM_LOG:POR_ANIMACY`

Let’s visualize the nature of the effect of LEN_PORmPUM_LOG:POR_ANIMACY based on the predictions:

(lean_d <- data.frame(                    # make lean_d a data frame of
   lean <- effect(                        # the effect
      "LEN_PORmPUM_LOG:POR_ANIMACY",      # of LEN_PORmPUM_LOG:POR_ANIMACY
      xlevels=list(LEN_PORmPUM_LOG=-4:4), # w/ these values for LEN_PORmPUM_LOG
      m_final)))                          # in m_final

   LEN_PORmPUM_LOG POR_ANIMACY         fit          se        lower       upper
1               -4     animate 0.964995167 0.010732807 0.9366654268 0.980911180
2               -3     animate 0.933762692 0.015464168 0.8962219877 0.958354439
3               -2     animate 0.878181385 0.019688827 0.8340490730 0.911818185
4               -1     animate 0.786618460 0.020577239 0.7435273242 0.824181653
5                0     animate 0.653396580 0.017519047 0.6183110604 0.686890163
6                1     animate 0.490837160 0.020513568 0.4507767242 0.531015622
7                2     animate 0.330192632 0.029080480 0.2758733167 0.389455090
8                3     animate 0.201334835 0.031196146 0.1470135317 0.269388791
9                4     animate 0.114190794 0.026335613 0.0718292027 0.176777133
10              -4  collective 0.977108479 0.009125137 0.9504633183 0.989578685
11              -3  collective 0.929623169 0.019952030 0.8790212514 0.960022358
12              -2  collective 0.803450188 0.032664294 0.7315682184 0.859772887
13              -1  collective 0.558501674 0.030970559 0.4972271761 0.618044933
14               0  collective 0.281338372 0.022128739 0.2400611313 0.326662174
15               1  collective 0.108056512 0.017076640 0.0788544432 0.146354725
16               2  collective 0.036135870 0.009475954 0.0215225418 0.060062271
17               3  collective 0.011468910 0.004239852 0.0055436094 0.023577327
18               4  collective 0.003577539 0.001705505 0.0014037321 0.009087051
19              -4   inanimate 0.131443260 0.069510582 0.0438729443 0.332938978
20              -3   inanimate 0.086864232 0.037608215 0.0361996368 0.194153942
21              -2   inanimate 0.056422056 0.018427345 0.0294486514 0.105418163
22              -1   inanimate 0.036225313 0.008239215 0.0231206688 0.056329289
23               0   inanimate 0.023081315 0.004063324 0.0163254197 0.032540495
24               1   inanimate 0.014634072 0.003229110 0.0094845324 0.022515951
25               2   inanimate 0.009249061 0.003021352 0.0048681010 0.017503241
26               3   inanimate 0.005833881 0.002646221 0.0023938177 0.014147422
27               4   inanimate 0.003675066 0.002161627 0.0011583720 0.011596083
28              -4    locative 0.904318818 0.065050659 0.6840949593 0.976331696
29              -3    locative 0.774484239 0.095877338 0.5393933482 0.909678349
30              -2    locative 0.555138390 0.087955205 0.3830574676 0.714940059
31              -1    locative 0.311975694 0.043846419 0.2330327080 0.403589135
32               0    locative 0.141455667 0.026043363 0.0976551339 0.200535737
33               1    locative 0.056486712 0.019914540 0.0279778871 0.110735922
34               2    locative 0.021290829 0.011835875 0.0070950857 0.062112445
35               3    locative 0.007842601 0.006002145 0.0017399124 0.034608204
36               4    locative 0.002864010 0.002793417 0.0004221157 0.019161222
37              -4    temporal 0.683948493 0.127825282 0.4044341600 0.873357610
38              -3    temporal 0.601972295 0.109156405 0.3824363698 0.786943563
39              -2    temporal 0.513847009 0.081662166 0.3577119523 0.667325461
40              -1    temporal 0.424852275 0.053325703 0.3250602122 0.531170088
41               0    temporal 0.340476161 0.038786244 0.2689980630 0.420033935
42               1    temporal 0.265132480 0.045062208 0.1865229243 0.362123903
43               2    temporal 0.201370889 0.055382698 0.1137786495 0.331195140
44               3    temporal 0.149817159 0.060437310 0.0650079736 0.308734224
45               4    temporal 0.109649779 0.059635675 0.0358613321 0.289652708

# alternative w/ predictorEffect:
# predictorEffect(      # the effect
#    "LEN_PORmPUM_LOG", # of LEN_PORmPUM_LOG:POR_ANIMACY
#    m_final,           # in m_final
#    focal.levels=-4:4) # w/ these values for LEN_PORmPUM_LOG
plot(lean,          # plot the effect of LEN_PORmPUM_LOG:POR_ANIMACY
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid

This might be better with a one-panel kind of plot:

plot(lean,                      # plot the effect of LEN_PORmPUM_LOG:POR_ANIMACY
   type="response",             # but plot predicted probs
   ylim=c(0, 1), grid=TRUE,     # w/ these y-axis limits & w/ a grid
   multiline=TRUE,              # put all lines in one panel
   confint=list(style="auto"),  # keep confidence intervals
   lattice=list(key.args=list(  # make the
      columns=5, cex.title=0))) # legend have 5 columns & no title

Still not ideal, and do you recognize a couple of (related) general conceptual ‘problems’ here?

we cannot see whether lines/slopes are significantly different from each other because the lines are curved: there’s no easy way to compare the confidence intervals;
it’s therefore also not really straightforward to, for instance, rank-order the slopes from the plot;
it’s therefore also not obvious how we see whether the slopes for, say, inanimates and temporals are significantly different from 0.

To that end, we could look at emtrends again – to get the slopes …

(qwe <- emtrends(m_final, var="LEN_PORmPUM_LOG", specs= ~ POR_ANIMACY)) %>% data.frame

  POR_ANIMACY LEN_PORmPUM_LOG.trend         SE  df  asymp.LCL   asymp.UCL
1     animate            -0.6706585 0.06999367 Inf -0.8078436 -0.53347340
2  collective            -1.1729159 0.10772937 Inf -1.3840616 -0.96177022
3   inanimate            -0.4642806 0.14299041 Inf -0.7445366 -0.18402454
4    locative            -1.0123531 0.21139547 Inf -1.4266806 -0.59802557
5    temporal            -0.3582876 0.14395677 Inf -0.6404377 -0.07613753

… and pairs to get all pairwise contrasts:

pairs(qwe,            # pairwise comparisons of means
   adjust="holm") %>% # correct for multiple tests
   data.frame         # show as a data frame

                 contrast   estimate        SE  df    z.ratio      p.value
1    animate - collective  0.5022574 0.1284707 Inf  3.9095082 7.398738e-04
2     animate - inanimate -0.2063779 0.1592023 Inf -1.2963248 5.845908e-01
3      animate - locative  0.3416946 0.2226817 Inf  1.5344527 4.996733e-01
4      animate - temporal -0.3123709 0.1600708 Inf -1.9514543 2.550151e-01
5  collective - inanimate -0.7086353 0.1790304 Inf -3.9581849 6.796935e-04
6   collective - locative -0.1605628 0.2372629 Inf -0.6767298 9.971549e-01
7   collective - temporal -0.8146283 0.1798031 Inf -4.5306679 5.879751e-05
8    inanimate - locative  0.5480725 0.2552142 Inf  2.1474997 1.905211e-01
9    inanimate - temporal -0.1059930 0.2029034 Inf -0.5223814 9.971549e-01
10    locative - temporal -0.6540655 0.2557569 Inf -2.5573717 7.382652e-02

These (differences of) slopes are all on the log odds scale and since those are much easier to interpret – the lines will be straight, not curved like the regression lines in probability space – this plot now shows these straight-line log odds slopes in a more comparable way:

plot(lean,                      # plot the effect of LEN_PORmPUM_LOG:POR_ANIMACY
                                # note: no type="response"
   ylim=c(-5, 5), grid=TRUE,    # w/ these y-axis limits & w/ a grid
   multiline=TRUE,              # put all lines in one panel
   confint=list(style="auto"),  # keep confidence intervals
   lattice=list(key.args=list(  # make the
      columns=5, cex.title=0))) # legend have 5 columns & no title

Figure 52: The log odds effect of LEN_PORmPUM_LOG:POR_ANIMACY in m_final

2.5.2.2 The interaction `POR_ANIMACY:SPEAKER`

Let’s visualize the nature of the effect of POR_ANIMACY:SPEAKER based on the predictions, as always, we do both perspectives:

(ansp_d <- data.frame(       # make ansp_d a data frame of
   ansp <- effect(           # the effect
      "POR_ANIMACY:SPEAKER", # of POR_ANIMACY:SPEAKER
      m_final)))             # in m_final

   POR_ANIMACY SPEAKER        fit          se      lower      upper
1      animate     nns 0.59115315 0.020183766 0.55108531 0.63004555
2   collective     nns 0.16819763 0.021177629 0.13066029 0.21386608
3    inanimate     nns 0.02082207 0.004116296 0.01411360 0.03062019
4     locative     nns 0.11215912 0.029827786 0.06562281 0.18515703
5     temporal     nns 0.30513068 0.047091093 0.22126791 0.40427725
6      animate      ns 0.58479067 0.033545821 0.51791474 0.64868241
7   collective      ns 0.29188394 0.041924818 0.21690178 0.38020223
8    inanimate      ns 0.01520582 0.006205397 0.00680714 0.03361612
9     locative      ns 0.06735270 0.024460582 0.03256874 0.13413561
10    temporal      ns 0.31874217 0.066141896 0.20480038 0.45944946

# note what this does:
# data.frame(predictorEffect("POR_ANIMACY", m_final))
plot(ansp,          # plot the effect of POR_ANIMACY:SPEAKER
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid
plot(ansp,          # plot the effect of POR_ANIMACY:SPEAKER
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   x.var="SPEAKER", # w/ this predictor on the x-axis
   grid=TRUE)       # & w/ a grid

Figure 53: The effect of POR_ANIMACY:SPEAKER in m_final

Figure 54: The effect of POR_ANIMACY:SPEAKER in m_final

A multi-line plot might be better:

plot(ansp,          # plot the effect of POR_ANIMACY:SPEAKER
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   multiline=TRUE,  # all the lines in one panel, but
   confint=list(style="auto"), # w/ confidence intervals
   grid=TRUE)       # & w/ a grid

Figure 55: The effect of POR_ANIMACY:SPEAKER in m_final

This certainly calls for emmeans applications for– but not the desperate one – and …

(qwe <- emmeans(m_final, specs= ~ POR_ANIMACY | SPEAKER)) %>% data.frame # nearly same as

   POR_ANIMACY SPEAKER     emmean         SE  df   asymp.LCL  asymp.UCL
1      animate     nns  0.3687345 0.08351059 Inf  0.20505674  0.5324122
2   collective     nns -1.5984552 0.15136912 Inf -1.89513323 -1.3017772
3    inanimate     nns -3.8506997 0.20189287 Inf -4.24640242 -3.4549969
4     locative     nns -2.0688740 0.29953757 Inf -2.65595684 -1.4817911
5     temporal     nns -0.8229836 0.22210061 Inf -1.25829285 -0.3876744
6      animate      ns  0.3424711 0.13815636 Inf  0.07168965  0.6132526
7   collective      ns -0.8862518 0.20284137 Inf -1.28381353 -0.4886900
8    inanimate      ns -4.1707542 0.41439468 Inf -4.98295287 -3.3585556
9     locative      ns -2.6280841 0.38939857 Inf -3.39129133 -1.8648770
10    temporal      ns -0.7595583 0.30459699 Inf -1.35655743 -0.1625592

(asd <- emmeans(m_final, specs= ~ SPEAKER | POR_ANIMACY)) %>% data.frame

   SPEAKER POR_ANIMACY     emmean         SE  df   asymp.LCL  asymp.UCL
1      nns     animate  0.3687345 0.08351059 Inf  0.20505674  0.5324122
2       ns     animate  0.3424711 0.13815636 Inf  0.07168965  0.6132526
3      nns  collective -1.5984552 0.15136912 Inf -1.89513323 -1.3017772
4       ns  collective -0.8862518 0.20284137 Inf -1.28381353 -0.4886900
5      nns   inanimate -3.8506997 0.20189287 Inf -4.24640242 -3.4549969
6       ns   inanimate -4.1707542 0.41439468 Inf -4.98295287 -3.3585556
7      nns    locative -2.0688740 0.29953757 Inf -2.65595684 -1.4817911
8       ns    locative -2.6280841 0.38939857 Inf -3.39129133 -1.8648770
9      nns    temporal -0.8229836 0.22210061 Inf -1.25829285 -0.3876744
10      ns    temporal -0.7595583 0.30459699 Inf -1.35655743 -0.1625592

… we compute pairwise comparisons, of which the second is probably the much more relevant one:

pairs(qwe,            # pairwise comparisons of means
   adjust="holm") %>% # correct for multiple tests
   data.frame         # show as a data frame

                 contrast SPEAKER   estimate        SE  df     z.ratio
1    animate - collective     nns  1.9671897 0.1728775 Inf  11.3790961
2     animate - inanimate     nns  4.2194342 0.2184828 Inf  19.3124289
3      animate - locative     nns  2.4376085 0.3109610 Inf   7.8389512
4      animate - temporal     nns  1.1917181 0.2372819 Inf   5.0223727
5  collective - inanimate     nns  2.2522445 0.2523358 Inf   8.9255854
6   collective - locative     nns  0.4704188 0.3356119 Inf   1.4016748
7   collective - temporal     nns -0.7754715 0.2687774 Inf  -2.8851813
8    inanimate - locative     nns -1.7818257 0.3612250 Inf  -4.9327311
9    inanimate - temporal     nns -3.0277160 0.3001490 Inf -10.0873773
10    locative - temporal     nns -1.2458903 0.3728960 Inf  -3.3411200
11   animate - collective      ns  1.2287229 0.2454217 Inf   5.0065784
12    animate - inanimate      ns  4.5132254 0.4368182 Inf  10.3320454
13     animate - locative      ns  2.9705553 0.4131809 Inf   7.1894793
14     animate - temporal      ns  1.1020294 0.3344645 Inf   3.2949070
15 collective - inanimate      ns  3.2845025 0.4613757 Inf   7.1189319
16  collective - locative      ns  1.7418324 0.4390625 Inf   3.9671628
17  collective - temporal      ns -0.1266934 0.3659562 Inf  -0.3461984
18   inanimate - locative      ns -1.5426701 0.5686424 Inf  -2.7129001
19   inanimate - temporal      ns -3.4111959 0.5142978 Inf  -6.6327245
20    locative - temporal      ns -1.8685258 0.4943790 Inf  -3.7795415
        p.value
1  4.783296e-29
2  4.222404e-82
3  2.725951e-14
4  2.551851e-06
5  3.103438e-18
6  1.610124e-01
7  7.823766e-03
8  3.243511e-06
9  5.026055e-23
10 2.503234e-03
11 3.324366e-06
12 5.047266e-24
13 5.853499e-12
14 2.953628e-03
15 8.701328e-12
16 3.636665e-04
17 7.291936e-01
18 1.333944e-02
19 2.306826e-10
20 6.284697e-04

pairs(asd,            # pairwise comparisons of means
   adjust="holm") %>% # correct for multiple tests
   data.frame         # show as a data frame

  contrast POR_ANIMACY    estimate        SE  df    z.ratio     p.value
1 nns - ns     animate  0.02626334 0.1614503 Inf  0.1626714 0.870777169
2 nns - ns  collective -0.71220345 0.2347303 Inf -3.0341350 0.002412265
3 nns - ns   inanimate  0.32005456 0.4564186 Inf  0.7012304 0.483159248
4 nns - ns    locative  0.55921015 0.4118245 Inf  1.3578847 0.174500270
5 nns - ns    temporal -0.06342535 0.3690377 Inf -0.1718668 0.863542223

Native and non-native speakers only differ with regard to collectives.

2.5.3 Excursus: predictions that are most wrong

To explore which predictions are the worst – wrong predictions made most confidently – how might one do that? We order the rows of the data frame d by

the genitives: first of, then s;
the predicted probabilities of s-genitives: from high to low.

d <- d[order(d$GENITIVE, -d$PRED_PP_s),]

Then, the head of the data frame will provide of-genitives first, but with very high predicted probabilities of s-genitives and the tail of the data frame will provide s-genitives, but with very low predicted probabilities of s-genitives:

head(d)

     CASE GENITIVE SPEAKER MODALITY POR_LENGTH PUM_LENGTH POR_ANIMACY
1108 1231       of     nns   spoken          5         23     animate
1097 1220       of     nns   spoken          9         40     animate
1043 1157       of     nns   spoken          4         25  collective
1042 1156       of     nns   spoken          4         24  collective
1044 1158       of     nns   spoken          4         24  collective
3502 3923       of      ns  written          7         26     animate
     POR_FINAL_SIB    POR_DEF POR_LENGTH_LOG PRED_PP_s   PRED_LO_s PRED_CAT
1108        absent indefinite       2.321928 0.8925792 -0.19157730        s
1097        absent   definite       3.169925 0.8893458 -0.95895434        s
1043        absent   definite       2.000000 0.8785117  0.09974479        s
1042        absent   definite       2.000000 0.8709443  0.09974479        s
1044        absent   definite       2.000000 0.8709443  0.09974479        s
3502        absent   definite       2.807355 0.8680890 -0.63085414        s
     LEN_PORmPUM_LOG
1108       -2.201634
1097       -2.152003
1043       -2.643856
1042       -2.584963
1044       -2.584963
3502       -1.893085

tail(d)

     CASE GENITIVE SPEAKER MODALITY POR_LENGTH PUM_LENGTH POR_ANIMACY
994  1105        s     nns   spoken         10          5   inanimate
576   649        s     nns   spoken         19          8   inanimate
2113 2362        s     nns  written         10          4   inanimate
533   601        s     nns   spoken         25          9   inanimate
992  1103        s     nns   spoken         18          4   inanimate
1060 1177        s     nns   spoken         15          3   inanimate
     POR_FINAL_SIB  POR_DEF POR_LENGTH_LOG   PRED_PP_s PRED_LO_s PRED_CAT
994         absent definite       3.321928 0.015880987 -1.096506       of
576        present definite       4.247928 0.014178730 -1.934470       of
2113        absent definite       3.321928 0.013706427 -1.096506       of
533        present definite       4.643856 0.012784419 -2.292758       of
992         absent definite       4.169925 0.009287107 -1.863883       of
1060        absent definite       3.906891 0.008659769 -1.625856       of
     LEN_PORmPUM_LOG
994         1.000000
576         1.247928
2113        1.321928
533         1.473931
992         2.169925
1060        2.321928

2.5.4 Write-up

To determine whether the choice of a genitive construction (of vs. s) varies as a function of

the length difference between logged possessor length and logged possessum length;
the degree/kind of animacy of the possessor (POR_ANIMACY: animate vs. collective vs. inanimate vs. locative vs. temporal);
whether the speakers are non-native speakers or native speakers of English (SPEAKER: nns vs. ns);
any pairwise interaction of these predictors;
the three-way interaction of these predictors,

A generalized linear model selection process was undertaken (using backwards stepwise model selection of all predictors and controls based on significance testing). The final model resulting from the elimination of ns predictors contained the effects of LEN_PORmPUM_LOG:POR_ANIMACY and SPEAKER:POR_ANIMACY and was highly significant (LR=1596.2, df=14, p<0.0001) with a good amount of explanatory/predictive power (McFadden’s R²=0.397, Nagelkerke’s R²=0.534, C=0.899, Cohen’s κ=0.559). The summary table of the model is shown here.

	Estimate	95%-CI	se	z	p_2-tailed
(Intercept (LEN_PORmPUM_LOG=0, POR_ANIMACY=animate, SPEAKER=nns)	0.64	[0.47, 0.82]	0.09	7.21	<0.001
LEN_PORmPUM_LOG_0→1	-0.67	[-0.81, -0.54]	0.07	-9.58	<0.001
POR_ANIMACY_{animate→collective}	-1.76	[-2.08, -1.46]	0.16	-11.20	<0.001
POR_ANIMACY_{animate→inanimate}	-4.30	[-4.74, -3.90]	0.21	-20.04	<0.001
POR_ANIMACY_{animate→locative}	-2.30	[-2.84, -1.81]	0.26	-8.79	<0.001
POR_ANIMACY_{animate→temporal}	-1.32	[-1.77, -0.88]	0.23	-5.82	<0.001
SPEAKER_nns→ns	-0.03	[-0.34, 0.29]	0.16	-0.16	0.8708
POR_ANIMACY_{animate→collective}:LEN_PORmPUM_LOG_0→1	-0.50	[-0.76, -0.26]	0.13	-3.91	0.001
POR_ANIMACY_{animate→inanimate}:LEN_PORmPUM_LOG_0→1	0.21	[-0.10, 0.52]	0.16	1.30	0.1949
POR_ANIMACY_{animate→locative}:LEN_PORmPUM_LOG_0→1	-0.34	[-0.80, 0.08]	0.22	-1.53	0.1249
POR_ANIMACY_{animate→temporal}:LEN_PORmPUM_LOG_0→1	0.31	[-0.01, 0.62]	0.16	1.95	0.051
POR_ANIMACY_{animate→collective}:SPEAKER_nns→ns	0.74	[0.18, 1.30]	0.28	2.59	0.0095
POR_ANIMACY_{animate→inanimate}:SPEAKER_nns→ns	-0.29	[-1.33, 0.60]	0.48	-0.61	0.544
POR_ANIMACY_{animate→locative}:SPEAKER_nns→ns	-0.53	[-1.43, 0.31]	0.44	-1.20	0.2283
POR_ANIMACY_{animate→temporal}:SPEAKER_nns→ns	0.09	[-0.71, 0.88]	0.40	0.22	0.8238

The results indicate that

the interaction LEN_PORmPUM_LOG:POR_ANIMACY has the effect that
- the length difference effect is strongest for collective possessors, followed by locatives, then animates, then inanimates, then temporals, but all of them are significantly different from 0;
- the only differences between slopes for different animacy levels that are significant (acc. to post-hoc tests adjusted for multiple testing with Holm’s procedure) are the following:
  - collective vs. animate;
  - collective vs. inanimate;
  - collective vs. temporal;
the interaction SPEAKER:POR_ANIMACY only has the effect that native speakers are significantly more likely to use s-genitives with collectives than learners are – all other speaker differences within each animacy level are ns. [Show plots]

Finally, the model did particularly badly with length difference values close to 0 or positive and/or inanimate possessors, but also, to a lesser extent, with non-native speakers.

Quiz question: I wrote: “the length difference effect is strongest for collective possessors, followed by locatives, then animates, then inanimates, then temporals, but all of them are significantly different from 0” – how do I know that?!

2.6 Subject realization in Japanese

Imagine you have a binary variable such as the whether native and non-native speakers of Japanese realize subjects in sentences in their conversations; the data we will work with are in _input/subjreal.csv, you can find information about the variables/columns in _input/subjreal.r, and our response variable is called SUBJREAL:

rm(list=ls(all=TRUE))
summary(d <- read.delim(       # the structure of d, the result of loading
   file="_input/subjreal.csv", # this file
   stringsAsFactors=TRUE)) # change categorical variables into factors

      CASE        SPKR          SPEAKERID     SUBJREAL    CONTRAST   
 Min.   :    1   njs:7371   26-JE_NNS:  821   no  :5016   no  :6393  
 1st Qu.: 3814   nns:7394   19-JC_NNS:  769   yes :1649   yes : 271  
 Median : 7634              18-JK_NJS:  760   NA's:8100   NA's:8101  
 Mean   : 7630              7-JE_NJS :  731                          
 3rd Qu.:11443              24-JE_NNS:  710                          
 Max.   :15265              1-JC_NJS :  705                          
                            (Other)  :10269                          
   GIVENNESS     
 Min.   : 0.000  
 1st Qu.: 8.000  
 Median :10.000  
 Mean   : 7.875  
 3rd Qu.:10.000  
 Max.   :10.000  
 NA's   :8990

Now, we will deal with this in a multifactorial way. Specifically, we are asking, does the choice of subject realization (no vs. yes) vary as a function of

the degree of givenness of the referent of the subject on a scale from 0 (‘completely new’) to 10 (‘talked about just now’);
whether the subject would be contrastive: no vs. yes;
whether the speakers are non-native speakers or native speakers of Japanese (SPKR: njs vs. nns);
any pairwise interaction of these predictors;
the three-way interaction of these predictors?

We first reduce the data set to the complete cases:

summary(d <- droplevels(d[complete.cases(d),]))

      CASE        SPKR          SPEAKERID    SUBJREAL   CONTRAST  
 Min.   :    1   njs:3176   18-JK_NJS: 386   no :4193   no :5519  
 1st Qu.: 3562   nns:2599   16-JE_NJS: 359   yes:1582   yes: 256  
 Median : 7054              8-JE_NJS : 318                        
 Mean   : 7220              25-JE_NJS: 300                        
 3rd Qu.:10806              1-JC_NJS : 299                        
 Max.   :15259              19-JC_NNS: 290                        
                            (Other)  :3823                        
   GIVENNESS     
 Min.   : 0.000  
 1st Qu.: 8.000  
 Median :10.000  
 Mean   : 7.875  
 3rd Qu.:10.000  
 Max.   :10.000

Quick reminder:

(baseline <- max(          # make baseline the highest
   prop.table(             # proportion in the
      table(d$SUBJREAL)))) # frequency table of the response

[1] 0.7260606

deviance(m_00 <- glm( # make/summarize the gen. linear model m_00:
   SUBJREAL ~ 1,      # SUBJREAL ~ an overall intercept (1) & no predictor
   family=binomial,   # the response is binary
   data=d))           # those variables are in d

[1] 6781.442

Some exploration of the relevant variables:

# the predictor(s)/response on its/their own
hist(d$GIVENNESS, main="")
table(d$POR_ANIMACY)

< table of extent 0 >

table(d$SPEAKER)


 1-JC_NJS  1-JC_NNS 10-JE_NJS 10-JE_NNS 11-JE_NJS 11-JE_NNS 16-JE_NJS 16-JE_NNS 
      299       266       247       236       195       250       359       189 
18-JK_NJS 18-JK_NNS 19-JC_NJS 19-JC_NNS  2-JK_NJS  2-JK_NNS 24-JE_NJS 24-JE_NNS 
      386        77       160       290       222       278       234       173 
25-JE_NJS 25-JE_NNS 26-JE_NJS 26-JE_NNS  7-JE_NJS  7-JE_NNS  8-JE_NJS  8-JE_NNS 
      300       220       179       248       277       188       318       184

# the predictor(s) w/ the response
spineplot(d$SUBJREAL ~ d$GIVENNESS)
table(d$CONTRAST, d$SUBJREAL)

     
        no  yes
  no  4151 1368
  yes   42  214

table(d$SPKR, d$SUBJREAL)

     
        no  yes
  njs 2367  809
  nns 1826  773

ftable(d$CONTRAST, d$SPKR, d$SUBJREAL)

           no  yes
                  
no  njs  2344  716
    nns  1807  652
yes njs    23   93
    nns    19  121

Figure 56: Exploration for the subject realization data

Figure 57: Exploration for the subject realization data

2.6.1 Modeling & numerical evaluation

Let’s fit our initial regression model:

summary(m_01 <- glm(      # make/summarize the gen. linear model m_01:
   SUBJREAL ~ 1 +         # SUBJREAL ~ an overall intercept (1)
   GIVENNESS*CONTRAST*SPKR, # & these predictors & their interaction
   family=binomial,       # resp = binary
   data=d))               # those vars are in d


Call:
glm(formula = SUBJREAL ~ 1 + GIVENNESS * CONTRAST * SPKR, family = binomial, 
    data = d)

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                    1.184249   0.102591  11.543  < 2e-16 ***
GIVENNESS                     -0.323963   0.012433 -26.057  < 2e-16 ***
CONTRASTyes                    0.114811   0.362928   0.316    0.752    
SPKRnns                        0.109794   0.152180   0.721    0.471    
GIVENNESS:CONTRASTyes          0.342195   0.050759   6.742 1.57e-11 ***
GIVENNESS:SPKRnns              0.007453   0.018296   0.407    0.684    
CONTRASTyes:SPKRnns            0.587306   0.541679   1.084    0.278    
GIVENNESS:CONTRASTyes:SPKRnns -0.053055   0.075272  -0.705    0.481    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6781.4  on 5774  degrees of freedom
Residual deviance: 4851.2  on 5767  degrees of freedom
AIC: 4867.2

Number of Fisher Scoring iterations: 4

This looks like we need to simplify things:

drop1(m_01, # drop each droppable predictor at a time from m_01 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

                        Df Deviance      AIC       LRT  Pr..Chi.
<none>                  NA 4851.219 4867.219        NA        NA
GIVENNESS:CONTRAST:SPKR  1 4851.717 4865.717 0.4980584 0.4803545

Thus, we delete the predictor with the highest non-significant p-value and fit the 2nd model with that 3-way interaction GIVENNESS:CONTRAST:SPKR deleted:

summary(m_02 <- update( # make m_02 an update of
   m_01, .~.            # m_01, namely all of it (.~.), but then
   - GIVENNESS:CONTRAST:SPKR)) # remove this interaction


Call:
glm(formula = SUBJREAL ~ GIVENNESS + CONTRAST + SPKR + GIVENNESS:CONTRAST + 
    GIVENNESS:SPKR + CONTRAST:SPKR, family = binomial, data = d)

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)            1.173926   0.101287  11.590   <2e-16 ***
GIVENNESS             -0.322515   0.012234 -26.363   <2e-16 ***
CONTRASTyes            0.248803   0.319542   0.779    0.436    
SPKRnns                0.132500   0.148841   0.890    0.373    
GIVENNESS:CONTRASTyes  0.317882   0.037457   8.487   <2e-16 ***
GIVENNESS:SPKRnns      0.004307   0.017754   0.243    0.808    
CONTRASTyes:SPKRnns    0.297779   0.349738   0.851    0.395    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6781.4  on 5774  degrees of freedom
Residual deviance: 4851.7  on 5768  degrees of freedom
AIC: 4865.7

Number of Fisher Scoring iterations: 4

anova(m_01, m_02,               # compare m_01 to m_02
   test="Chisq") %>% data.frame # using a LRT

  Resid..Df Resid..Dev Df   Deviance  Pr..Chi.
1      5767   4851.219 NA         NA        NA
2      5768   4851.717 -1 -0.4980584 0.4803545

drop1(m_02, # drop each droppable predictor at a time from m_02 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

                   Df Deviance      AIC         LRT     Pr..Chi.
<none>             NA 4851.717 4865.717          NA           NA
GIVENNESS:CONTRAST  1 4910.457 4922.457 58.73974047 1.799695e-14
GIVENNESS:SPKR      1 4851.776 4863.776  0.05881266 8.083825e-01
CONTRAST:SPKR       1 4852.445 4864.445  0.72825511 3.934494e-01

Let’s fit the 3rd model with that interaction GIVENNESS:SPKR deleted:

summary(m_03 <- update( # make m_03 an update of
   m_02, .~.            # m_02, namely all of it (.~.), but then
   - GIVENNESS:SPKR))   # remove this interaction


Call:
glm(formula = SUBJREAL ~ GIVENNESS + CONTRAST + SPKR + GIVENNESS:CONTRAST + 
    CONTRAST:SPKR, family = binomial, data = d)

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)            1.159897   0.082884  13.994   <2e-16 ***
GIVENNESS             -0.320545   0.009124 -35.133   <2e-16 ***
CONTRASTyes            0.251807   0.318626   0.790   0.4294    
SPKRnns                0.163595   0.075717   2.161   0.0307 *  
GIVENNESS:CONTRASTyes  0.317899   0.037449   8.489   <2e-16 ***
CONTRASTyes:SPKRnns    0.289544   0.347981   0.832   0.4054    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6781.4  on 5774  degrees of freedom
Residual deviance: 4851.8  on 5769  degrees of freedom
AIC: 4863.8

Number of Fisher Scoring iterations: 4

anova(m_02, m_03,               # compare m_02 to m_03
   test="Chisq") %>% data.frame # using a LRT

  Resid..Df Resid..Dev Df    Deviance  Pr..Chi.
1      5768   4851.717 NA          NA        NA
2      5769   4851.776 -1 -0.05881266 0.8083825

drop1(m_03, # drop each droppable predictor at a time from m_03 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

                   Df Deviance      AIC        LRT     Pr..Chi.
<none>             NA 4851.776 4863.776         NA           NA
GIVENNESS:CONTRAST  1 4910.533 4920.533 58.7577749 1.783275e-14
CONTRAST:SPKR       1 4852.471 4862.471  0.6953493 4.043508e-01

Let’s fit the 4th model with that interaction CONTRAST:SPKR deleted:

summary(m_04 <- update( # make m_04 an update of
   m_03, .~.            # m_03, namely all of it (.~.), but then
   - CONTRAST:SPKR))    # remove this interaction


Call:
glm(formula = SUBJREAL ~ GIVENNESS + CONTRAST + SPKR + GIVENNESS:CONTRAST, 
    family = binomial, data = d)

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)            1.153805   0.082515  13.983   <2e-16 ***
GIVENNESS             -0.320586   0.009126 -35.131   <2e-16 ***
CONTRASTyes            0.401186   0.267828   1.498   0.1342    
SPKRnns                0.177381   0.073887   2.401   0.0164 *  
GIVENNESS:CONTRASTyes  0.316619   0.037359   8.475   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6781.4  on 5774  degrees of freedom
Residual deviance: 4852.5  on 5770  degrees of freedom
AIC: 4862.5

Number of Fisher Scoring iterations: 4

anova(m_03, m_04,               # compare m_03 to m_04
   test="Chisq") %>% data.frame # using a LRT

  Resid..Df Resid..Dev Df   Deviance  Pr..Chi.
1      5769   4851.776 NA         NA        NA
2      5770   4852.471 -1 -0.6953493 0.4043508

drop1(m_04, # drop each droppable predictor at a time from m_04 &
   test="Chisq") %>% data.frame  # test its significance w/ a LRT

                   Df Deviance      AIC       LRT     Pr..Chi.
<none>             NA 4852.471 4862.471        NA           NA
SPKR                1 4858.229 4866.229  5.757902 1.641466e-02
GIVENNESS:CONTRAST  1 4910.914 4918.914 58.442790 2.092892e-14

We’re done:

Confint(m_final <- m_04); rm(m_04); invisible(gc())

                        Estimate       2.5 %     97.5 %
(Intercept)            1.1538051  0.99373618  1.3173148
GIVENNESS             -0.3205855 -0.33865477 -0.3028731
CONTRASTyes            0.4011863 -0.10095012  0.9545382
SPKRnns                0.1773806  0.03251689  0.3222221
GIVENNESS:CONTRASTyes  0.3166187  0.24257305  0.3897207

Let’s compute the overall significance test for this model:

anova(m_00, m_final, # compare m_00 to m_final
   test="Chisq") %>% data.frame # using a LRT

  Resid..Df Resid..Dev Df Deviance Pr..Chi.
1      5774   6781.442 NA       NA       NA
2      5770   4852.471  4 1928.971        0

pchisq(              # compute the area under the chi-squared curve
   q=6781.4-4852.5,  # for this chi-squared value
   df=4,             # at 4 df
   lower.tail=FALSE) # only using the right/upper tail/side

[1] 0

Let’s make ‘predictions’ and evaluate them:

d$PRED_PP_y <- predict( # make d$PRED_PP_y the result of predicting
   m_final,             # from m_final
   type="response")     # predicted probabilities of "yes"
# if you don't say type="response", predict gives you log odds
d$PRED_CAT <- factor(         # make d$PRED_CAT the result of checking
   ifelse(d$PRED_PP_y>=0.5,   # if the predicted prob. of "yes" is >=0.5
      levels(d$SUBJREAL)[2],  # if yes, predict "yes"
      levels(d$SUBJREAL)[1]), # otherwise, predict "no"
   levels=levels(d$SUBJREAL)) # make sure both levels are registered
(c_m <- table(          # confusion matrix: cross-tabulate
   "OBS"  =d$SUBJREAL,  # observed choices in the rows
   "PREDS"=d$PRED_CAT)) # predicted orders in the columns

     PREDS
OBS     no  yes
  no  3920  273
  yes  629  953

c( # evaluate the confusion matrix
   "Prec. for yes"    =c_m["yes", "yes"] / sum(c_m[     , "yes"]),
   "Acc./rec. for yes"=c_m["yes", "yes"] / sum(c_m["yes",      ]),
   "Prec. for no"     =c_m[ "no","no"]   / sum(c_m[     ,"no"  ]),
   "Acc./rec. for no" =c_m[ "no","no"]   / sum(c_m[ "no",      ]),
   "Acc. (overall)"  =mean(d$SUBJREAL==d$PRED_CAT))

    Prec. for yes Acc./rec. for yes      Prec. for no  Acc./rec. for no 
        0.7773246         0.6024020         0.8617279         0.9348915 
   Acc. (overall) 
        0.8438095

c(R2s(m_final),
  "Cohen's kappa"=cohens.kappa(c_m)[[1]],
  "C"=C.score(d$SUBJREAL, d$PRED_PP_y))

  McFadden R-squared Nagelkerke R-squared        Cohen's kappa 
           0.2844485            0.4109678            0.5777748 
                   C 
           0.8028431

This looks like a pretty decent model, not bad at all (esp. Cohen’s kappa and the C-score).

2.6.2 Interpretation

2.6.2.1 The main effect `SPKR`

Let’s visualize the nature of the effect of SPKR based on the predictions from effects:

(sp_d <- data.frame( # make sp_d a data frame of
   sp <- effect(     # the effect
      "SPKR",        # of SPKR
      m_final)))     # in m_final

  SPKR       fit          se     lower     upper
1  njs 0.2239942 0.008865779 0.2070979 0.2418486
2  nns 0.2563248 0.010362191 0.2365475 0.2771554

# alternative w/ predictorEffect:
# predictorEffect( # the effect
#    "SPKR",       # of SPKR
#    m_final)      # in m_final
plot(sp,            # plot the effect of SPKR
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid

Figure 58: The effect of SPKR in m_final

2.6.2.2 The interaction `GIVENNESS:CONTRAST`

Let’s visualize the nature of the effect of GIVENNESS:CONTRAST based on the predictions:

(gico_d <- data.frame(              # make gico_d a data frame of
   gico <- effect(                  # the effect
      "GIVENNESS:CONTRAST",         # of GIVENNESS:CONTRAST
      xlevels=list(GIVENNESS=0:10), # w/ these values for GIVENNESS
      m_final)))                    # in m_final

   GIVENNESS CONTRAST       fit          se     lower     upper
1          0       no 0.7744540 0.013242416 0.7474477 0.7993477
2          1       no 0.7136236 0.013907147 0.6856094 0.7400878
3          2       no 0.6439301 0.013910734 0.6162240 0.6707048
4          3       no 0.5675538 0.013211328 0.5414985 0.5932406
5          4       no 0.4878254 0.011931698 0.4644833 0.5112207
6          5       no 0.4087117 0.010338719 0.3886138 0.4291195
7          6       no 0.3340599 0.008748289 0.3171384 0.3514197
8          7       no 0.2668889 0.007396797 0.2526440 0.2816343
9          8       no 0.2089857 0.006356385 0.1968004 0.2217172
10         9       no 0.1608880 0.005557064 0.1502920 0.1720798
11        10       no 0.1221505 0.004886624 0.1128915 0.1320558
12         0      yes 0.8368289 0.035072783 0.7560955 0.8945656
13         1      yes 0.8362865 0.031599462 0.7646761 0.8892613
14         2      yes 0.8357427 0.028528717 0.7719877 0.8843418
15         3      yes 0.8351974 0.026018512 0.7777201 0.8801035
16         4      yes 0.8346507 0.024259746 0.7814989 0.8769094
17         5      yes 0.8341025 0.023439157 0.7829586 0.8751170
18         6      yes 0.8335529 0.023671390 0.7818743 0.8749460
19         7      yes 0.8330018 0.024943101 0.7782714 0.8763688
20         8      yes 0.8324492 0.027123270 0.7724056 0.8791310
21         9      yes 0.8318952 0.030028119 0.7646364 0.8828778
22        10      yes 0.8313397 0.033481718 0.7553092 0.8872718

# alternative w/ predictorEffect:
# predictorEffect(      # the effect
#    "GIVENNESS",       # of GIVENNESS:CONTRAST
#    m_final,           # in m_final
#    focal.levels=0:10) # w/ these values for GIVENNESS
plot(gico,          # plot the effect of GIVENNESS:CONTRAST
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid

Figure 59: The effect of GIVENNESS:CONTRAST in m_final

This might be better with a one-panel kind of plot:

plot(gico,                      # plot the effect of GIVENNESS:CONTRAST
   type="response",             # but plot predicted probs
   ylim=c(0, 1), grid=TRUE,     # w/ these y-axis limits & w/ a grid
   multiline=TRUE,              # put all lines in one panel
   confint=list(style="auto"),  # keep confidence intervals
   lattice=list(key.args=list(  # make the
      columns=2, cex.title=0))) # legend have 5 columns & no title

Figure 60: The effect of GIVENNESS:CONTRAST in m_final

Quiz question: Is the slope of GIVENNESS when CONTRAST is yes significantly different from 0? In other words, when CONTRAST is yes, is there still a significant effect of GIVENNESS?

2.6.3 Write-up

This one’s for you >:)

sessionInfo()

R version 4.5.1 Patched (2025-06-20 r88332 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  compiler  methods  
[8] base     

other attached packages:
 [1] ggplot2_3.5.2   multcomp_1.4-28 TH.data_1.1-3   MASS_7.3-65    
 [5] survival_3.8-3  mvtnorm_1.3-3   emmeans_1.11.2  effects_4.2-4  
 [9] car_3.1-3       carData_3.0-5   STGmisc_1.01    Rcpp_1.1.0     
[13] magrittr_2.0.3 

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       xfun_0.52          htmlwidgets_1.6.4  insight_1.3.1     
 [5] lattice_0.22-7     crosstalk_1.2.1    vctrs_0.6.5        tools_4.5.1       
 [9] Rdpack_2.6.4       generics_0.1.4     sandwich_3.1-1     tibble_3.3.0      
[13] pkgconfig_2.0.3    Matrix_1.7-3       data.table_1.17.8  RColorBrewer_1.1-3
[17] lifecycle_1.0.4    farver_2.1.2       mitools_2.4        codetools_0.2-20  
[21] survey_4.4-2       htmltools_0.5.8.1  lazyeval_0.2.2     yaml_2.3.10       
[25] Formula_1.2-5      plotly_4.11.0      tidyr_1.3.1        pillar_1.11.0     
[29] nloptr_2.2.1       reformulas_0.4.1   boot_1.3-31        abind_1.4-8       
[33] nlme_3.1-168       tidyselect_1.2.1   digest_0.6.37      purrr_1.1.0       
[37] dplyr_1.1.4        splines_4.5.1      fastmap_1.2.0      grid_4.5.1        
[41] colorspace_2.1-1   cli_3.6.5          dichromat_2.0-0.1  withr_3.0.2       
[45] scales_1.4.0       estimability_1.5.1 httr_1.4.7         rmarkdown_2.29    
[49] nnet_7.3-20        lme4_1.1-37        zoo_1.8-14         coda_0.19-4.1     
[53] evaluate_1.0.4     knitr_1.50         rbibutils_2.3      viridisLite_0.4.2 
[57] rlang_1.1.6        xtable_1.8-4       glue_1.8.0         DBI_1.2.3         
[61] rstudioapi_0.17.1  minqa_1.2.8        jsonlite_2.0.0     R6_2.6.1

1 Linear modeling

1.1 Introduction

1.2 A binary predictor

1.2.1 Modeling & numerical interpretation

1.2.2 Visual interpretation

1.2.3 Write-up

1.2.4 Diagnostics 1

1.3 A categorical predictor

1.3.1 Modeling & numerical interpretation

1.3.2 Visual interpretation

1.3.3 Excursus 1: the difference

1.3.4 Excursus 2: is it significant?

1.3.4.1 Conflation via model comparison

1.3.4.2 Post-hoc comparisons

1.3.4.3 Multiple post-hoc tests

1.3.5 Write-up

1.4 A numeric predictor

1.4.1 Modeling & numerical interpretation

1.4.2 Visual interpretation

1.4.3 Write-up

1.4.4 Just to really drive this home …

1.5 Two categorical predictors

1.5.1 Modeling & numerical interpretation

1.5.2 Visual interpretation

1.5.3 Modeling with interactions

1.5.4 Visual interpretation

1.5.5 Write-up

1.6 A numeric and a categorical predictor

1.6.1 Modeling & numerical interpretation

1.6.2 Visual interpretation

1.6.3 Modeling with interactions

1.6.4 Visual interpretation

1.6.5 Write-up

1.7 Two numeric predictors

1.7.1 Modeling & numerical interpretation

1.7.2 Visual interpretation

1.7.3 Modeling with interactions

1.7.4 Visual interpretation

1.7.5 Write-up

2 Binary logistic regression modeling

2.1 Introduction

2.2 A binary predictor

2.2.1 Modeling & numerical interpretation

2.2.2 Visual interpretation

2.2.3 Diagnostics 2

2.2.4 Write-up

2.3 A categorical predictor

2.3.1 Modeling & numerical interpretation

2.3.2 Visual interpretation

2.3.3 Excursus: conflation via model comparison

2.3.4 Write-up

2.4 A numeric predictor

2.4.1 Modeling & numerical interpretation

2.4.2 Visual interpretation

2.4.3 Write-up

2.5 A backwards model selection process

2.5.1 Modeling & numerical evaluation

2.5.2 Interpretation

2.5.2.1 The interaction LEN_PORmPUM_LOG:POR_ANIMACY

2.5.2.2 The interaction POR_ANIMACY:SPEAKER

2.5.3 Excursus: predictions that are most wrong

2.5.4 Write-up

2.6 Subject realization in Japanese

2.6.1 Modeling & numerical evaluation

2.6.2 Interpretation

2.6.2.1 The main effect SPKR

2.6.2.2 The interaction GIVENNESS:CONTRAST

2.6.3 Write-up

2.5.2.1 The interaction `LEN_PORmPUM_LOG:POR_ANIMACY`

2.5.2.2 The interaction `POR_ANIMACY:SPEAKER`

2.6.2.1 The main effect `SPKR`

2.6.2.2 The interaction `GIVENNESS:CONTRAST`