1.1 Introduction: the ‘incompleteness’ of regression summaries
In this session, we will look at how to get more detail out of models and their summaries. The first part, Section 1.2, is concerned with post-hoc comparisons while the second part, Section 1.3, deals with a priori motivated comparisons (which often are not even straightforwardly approachable from an existing model summary). But as an introduction, let’s consider the issue of how much of what might be interesting a regression summary actually does (not) show, for which we briefly consider the following linear regression model on the duration data using just two predictors:
SURPRISAL_s, a square-root transformed surprisal value;
CLASS: a categorical predictor categorizing words in terms of 5 different levels:
rm(list=ls(all.names=TRUE)); pkgs2load <- c("car", "effects", "emmeans", "multcomp")
suppressMessages(sapply(pkgs2load, library, character.only=TRUE, logical.return=TRUE, quietly=TRUE)); rm(pkgs2load)
d <- read.delim("_input/wdurations.csv", stringsAsFactors=TRUE)
d$DURATION_l <- log2(d$DURATION); d$SURPRISAL_s <- sqrt(d$SURPRISAL)
summary(d[,c(12:13,10)])
DURATION_l SURPRISAL_s CLASS
Min. : 3.807 Min. :0.000 closed :4021
1st Qu.: 6.781 1st Qu.:2.000 number : 125
Median : 7.340 Median :2.449 open :2086
Mean : 7.348 Mean :2.371 propname: 110
3rd Qu.: 7.937 3rd Qu.:2.828 unclear : 41
Max. :10.456 Max. :4.583
m <- lm(DURATION_l ~ 1 + SURPRISAL_s*CLASS, data=d)
Consider all the questions that one might have with regard to just the interaction of SURPRISAL_s:CLASS. Once you think about it for a moment, I hope you realize that the questions one might have are potentially all of these:
1. The slope of SURPRISAL_s when CLASS is closed,
   a. what is it?
   b. is it significantly different from 0?
   c. is it significantly different from the slope of SURPRISAL_s when CLASS is number?
   d. is it significantly different from the slope of SURPRISAL_s when CLASS is open?
   e. is it significantly different from the slope of SURPRISAL_s when CLASS is propname?
   f. is it significantly different from the slope of SURPRISAL_s when CLASS is unclear?
2. The slope of SURPRISAL_s when CLASS is number,
   a. what is it?
   b. is it significantly different from 0?
   c. is it significantly different from the slope of SURPRISAL_s when CLASS is open?
   d. is it significantly different from the slope of SURPRISAL_s when CLASS is propname?
   e. is it significantly different from the slope of SURPRISAL_s when CLASS is unclear?
3. The slope of SURPRISAL_s when CLASS is open,
   a. what is it?
   b. is it significantly different from 0?
   c. is it significantly different from the slope of SURPRISAL_s when CLASS is propname?
   d. is it significantly different from the slope of SURPRISAL_s when CLASS is unclear?
4. The slope of SURPRISAL_s when CLASS is propname,
   a. what is it?
   b. is it significantly different from 0?
   c. is it significantly different from the slope of SURPRISAL_s when CLASS is unclear?
5. The slope of SURPRISAL_s when CLASS is unclear,
   a. what is it?
   b. is it significantly different from 0?
Again, these are just the most basic questions that would probably be of interest – I am leaving out more specialized ones based on conflations and possibly interesting tests for specific values.
But now consider again what most people provide, which, if we’re lucky, is this:
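summary(m)$coef %>% round(4)
plot(allEffects(m), grid=TRUE)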
You see there’s a big discrepancy. Which of the above questions does this output answer (directly or after one addition)?
when CLASS is closed, this answers all 6 questions (a to f);
when CLASS is number, this answers 1 out of 5 questions (a);
when CLASS is open, this answers 1 out of 4 questions (a);
when CLASS is propname, this answers 1 out of 3 questions (a);
when CLASS is unclear, this answers 1 out of 2 questions (a).
Not a great result: we get 10 answers for minimally 20 questions and for many of the as yet unanswered questions you may not even know yet how to get them – this session and the next will change this!
We begin by re-loading and re-preparing the same duration data; this time we’re centering the surprisal predictor:
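rm(list=ls(all.names=TRUE))
d <- read.delim("_input/wdurations.csv", stringsAsFactors=TRUE)
d$DURATION_l <- log2(d$DURATION)
d$SURPRISAL_sc <- sqrt(d$SURPRISAL) - mean(sqrt(d$SURPRISAL))

We fit a first model:

(m_01 <- lm(data=d, formula=DURATION_l ~ 1 + SURPRISAL_sc + VOWELS + CLASS +
   SURPRISAL_sc:VOWELS + SURPRISAL_sc:CLASS + VOWELS:CLASS)) %>% {
   list("Summary"=summary(.),
        "Drop 1"=drop1(., test="F") %>% as.data.frame) }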
In the interest of covering everything – all 20 questions from above and more – in excruciating detail, we’ll go over nine kinds of scenarios (yes, nine, you can thank me later …) which arise from the combination of two three-possibility scenarios:
Table 1: 9 scenarios relevant for post-hoc exploration of models

| | a: what are the means/slopes? | b: are they * diff from 0? | c: are they * diff from each other? |
|---|---|---|---|
| 1: means for (combos of) levels of cat preds | scenario 1a | scenario 1b | scenario 1c |
| 2: means for values of numeric preds | scenario 2a | scenario 2b | scenario 2c |
| 3: slopes for (combos of) levels of cat preds | scenario 3a | scenario 3b | scenario 3c |
Yes, this seems annoying but each of the three questions in the columns has potentially important implications, and the way they are tackled depends on what one looks at: means, slopes, etc. And, maybe somewhat confusingly once you recognize it, none of these is fully answered by the summary output and hardly any study I see discusses all three.
1.2 Post-hoc comparisons
1.2.1 Scen. 1a: means for cat
Scenario 1a means to ask the following question: when SURPRISAL_sc is controlled for, what is each of the 10 predicted means of DURATION_l when you cross the 2 levels of VOWELS and the 5 levels of CLASS? A simple approach that I often use to answer all scenario 1 questions and, maybe more generally, to make sense of the coefficients of models is useful here as well: we create a data frame nd (for 'new data') that contains all variables in the rhs of the model formula and crosses
* all levels of all categorical variables in the rhs (here VOWELS and CLASS);
* the values of 0 and 1 for all numeric variables in the rhs (and, when the numeric predictors are not centered, I usually also include the mean value for each numeric variable):
nd <- expand.grid(              # generate all combinations of
   SURPRISAL_sc=0:1,            # main values of SURPRISAL_sc
   CLASS=levels(d$CLASS),       # all levels of CLASS
   VOWELS=levels(d$VOWELS)) %>% # all levels of VOWELS, then
   "["(,3:1)                    # reorder columns
To that data frame, we then add the predictions the final model makes for all these combinations and, when easily available, also their confidence intervals; we can then just look at the whole data frame or create different tabular versions of it – here’s one I find useful (because the rows let us see the predictions for SURPRISAL_sc 0 and 1 well, the difference of which is the slope of SURPRISAL_sc):
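nd <- cbind(nd, PREDS=predict(m_01, newdata=nd, interval="confidence"))
colnames(nd)[4:6] <- c("fit", "lwr", "upr")
nd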
VOWELS CLASS SURPRISAL_sc fit lwr upr
1 one closed 0 7.140473 7.114057 7.166890
2 one closed 1 7.581558 7.528438 7.634679
3 one number 0 7.907337 7.739671 8.075002
4 one number 1 8.023885 7.821621 8.226149
5 one open 0 7.643108 7.574309 7.711908
6 one open 1 8.160513 8.085945 8.235081
7 one propname 0 8.443379 8.173478 8.713281
8 one propname 1 8.521196 8.382297 8.660096
9 one unclear 0 7.960958 7.662950 8.258965
10 one unclear 1 7.665738 7.278268 8.053208
11 two+ closed 0 6.983092 6.937840 7.028343
12 two+ closed 1 7.240932 7.157581 7.324282
13 two+ number 0 7.932180 7.655043 8.209317
14 two+ number 1 7.865484 7.567353 8.163614
15 two+ open 0 7.795353 7.755931 7.834775
16 two+ open 1 8.129512 8.088166 8.170859
17 two+ propname 0 8.605884 8.187216 9.024552
18 two+ propname 1 8.500456 8.154602 8.846310
19 two+ unclear 0 8.039396 7.433469 8.645323
20 two+ unclear 1 7.560931 6.961012 8.160851
tapply(FUN=c, X=nd$fit,             # apply the function c to the predictions
   INDEX=list(                      # which are grouped as follows:
      CLASS=nd$CLASS,               # rows
      SURPRISAL_sc=nd$SURPRISAL_sc, # columns
      VOWELS=nd$VOWELS))            # slices
, , VOWELS = one
SURPRISAL_sc
CLASS 0 1
closed 7.140473 7.581558
number 7.907337 8.023885
open 7.643108 8.160513
propname 8.443379 8.521196
unclear 7.960958 7.665738
, , VOWELS = two+
SURPRISAL_sc
CLASS 0 1
closed 6.983092 7.240932
number 7.932180 7.865484
open 7.795353 8.129512
propname 8.605884 8.500456
unclear 8.039396 7.560931
In this case, half of these are what we get from effect, which is one of the most efficient ways to answer 1a questions. This is because effect conveniently holds the predictors not included in the term at ‘typical values’, which for the centered (!) values of SURPRISAL_sc means the effects values are those in nd when SURPRISAL_sc is 0:
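(clvo_d <- data.frame(clvo <- effect("VOWELS:CLASS", m_01)))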
VOWELS CLASS fit se lower upper
1 one closed 7.140473 0.01347541 7.114057 7.166890
2 two+ closed 6.983092 0.02308346 6.937840 7.028343
3 one number 7.907337 0.08552890 7.739671 8.075002
4 two+ number 7.932180 0.14137211 7.655043 8.209317
5 one open 7.643108 0.03509579 7.574309 7.711908
6 two+ open 7.795353 0.02010977 7.755931 7.834775
7 one propname 8.443379 0.13768123 8.173478 8.713281
8 two+ propname 8.605884 0.21356943 8.187216 9.024552
9 one unclear 7.960958 0.15201852 7.662950 8.258965
10 two+ unclear 8.039396 0.30909339 7.433469 8.645323
1.2.2 Scen. 1b: means for cat vs. 0
Scenario 1b means to ask whether each of the 10 means computed above is significantly different from 0 (and what the exact p-value is). This can so far not be answered fully from the summary output of the model. We can see from nd that none of the predicted means’ confidence intervals overlaps with 0, but we don’t have exact p-values. Right now, we know the answer to this question only for 1 of the 10 means, the one represented by the intercept in summary(m_01)$coefficients: CLASS: closed with VOWELS: one, which clearly is significantly different from 0. With your current knowledge, the ‘best’ way to proceed for you might be to generate m_01 9 more times, each time with CLASS and VOWELS re-leveled in such ways that the intercept comes to represent all 10 possible combinations – clearly not a desirable option, which is why we will discuss a much better approach another time.
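For illustration, here is what just one of those 9 re-fits might look like (a sketch; the object names d2 and m_01_relev are hypothetical):

d2 <- d                                     # a copy of the data for one re-leveled re-fit
d2$CLASS  <- relevel(d2$CLASS,  ref="open") # make open the reference level
d2$VOWELS <- relevel(d2$VOWELS, ref="two+") # make two+ the reference level
m_01_relev <- update(m_01, data=d2)         # re-fit the same formula
summary(m_01_relev)$coefficients[1,] # intercept: mean for open & two+, tested against 0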
1.2.3 Scen. 1c: means for cat vs. each other
Scenario 1c means to ask whether (some? all?) differences between the 10 means are significant. For that, we can use emmeans::emmeans and pairs. For a 2-way interaction like this, there are three options – two of them (options a and b) are usually/often pretty good/useful and justifiable, the last one (option c) often not so much.
Option a here is to make comparisons of the predicted means for the levels of CLASS within the levels of VOWELS:
pairs(emmeans(               # compute pairwise comparisons of means
   object=m_01,              # for the model m_01
   specs=~ CLASS | VOWELS),  # of CLASS within each level of VOWELS
   # you could also write this like this:
   # specs="CLASS",          # of CLASS
   # by="VOWELS"),           # within each level of VOWELS
   adjust="none")            # do not adjust for multiple comparisons
VOWELS = one:
contrast estimate SE df t.ratio p.value
closed - number -0.7669 0.0867 6367 -8.848 <0.0001
closed - open -0.5026 0.0382 6367 -13.155 <0.0001
closed - propname -1.3029 0.1380 6367 -9.413 <0.0001
closed - unclear -0.8205 0.1530 6367 -5.375 <0.0001
number - open 0.2642 0.0921 6367 2.869 0.0041
number - propname -0.5360 0.1620 6367 -3.309 0.0009
number - unclear -0.0536 0.1740 6367 -0.307 0.7585
open - propname -0.8003 0.1420 6367 -5.645 <0.0001
open - unclear -0.3178 0.1560 6367 -2.039 0.0415
propname - unclear 0.4824 0.2050 6367 2.353 0.0187
VOWELS = two+:
contrast estimate SE df t.ratio p.value
closed - number -0.9491 0.1440 6367 -6.606 <0.0001
closed - open -0.8123 0.0312 6367 -25.993 <0.0001
closed - propname -1.6228 0.2160 6367 -7.523 <0.0001
closed - unclear -1.0563 0.3100 6367 -3.405 0.0007
number - open 0.1368 0.1430 6367 0.960 0.3373
number - propname -0.6737 0.2550 6367 -2.641 0.0083
number - unclear -0.1072 0.3400 6367 -0.316 0.7522
open - propname -0.8105 0.2140 6367 -3.786 0.0002
open - unclear -0.2440 0.3100 6367 -0.788 0.4306
propname - unclear 0.5665 0.3750 6367 1.512 0.1306
You can see that these results nicely map onto our data frame nd; here are the first two:
round(nd$fit[1] - nd$fit[3], 4)
[1] -0.7669
round(nd$fit[1] - nd$fit[5], 4)
[1] -0.5026
But note that we should probably adjust for multiple comparisons because the high number of post-hoc comparisons we’re making increases the probability of false positives. The Holm method is a useful and not too conservative way to do that:
pairs(emmeans(               # compute pairwise comparisons of means
   object=m_01,              # for the model m_01
   specs=~ CLASS | VOWELS),  # of CLASS within each level of VOWELS
   adjust="holm")            # adjust for multiple comparisons w/ Holm's method
VOWELS = one:
contrast estimate SE df t.ratio p.value
closed - number -0.7669 0.0867 6367 -8.848 <0.0001
closed - open -0.5026 0.0382 6367 -13.155 <0.0001
closed - propname -1.3029 0.1380 6367 -9.413 <0.0001
closed - unclear -0.8205 0.1530 6367 -5.375 <0.0001
number - open 0.2642 0.0921 6367 2.869 0.0165
number - propname -0.5360 0.1620 6367 -3.309 0.0047
number - unclear -0.0536 0.1740 6367 -0.307 0.7585
open - propname -0.8003 0.1420 6367 -5.645 <0.0001
open - unclear -0.3178 0.1560 6367 -2.039 0.0830
propname - unclear 0.4824 0.2050 6367 2.353 0.0560
VOWELS = two+:
contrast estimate SE df t.ratio p.value
closed - number -0.9491 0.1440 6367 -6.606 <0.0001
closed - open -0.8123 0.0312 6367 -25.993 <0.0001
closed - propname -1.6228 0.2160 6367 -7.523 <0.0001
closed - unclear -1.0563 0.3100 6367 -3.405 0.0040
number - open 0.1368 0.1430 6367 0.960 1.0000
number - propname -0.6737 0.2550 6367 -2.641 0.0414
number - unclear -0.1072 0.3400 6367 -0.316 1.0000
open - propname -0.8105 0.2140 6367 -3.786 0.0011
open - unclear -0.2440 0.3100 6367 -0.788 1.0000
propname - unclear 0.5665 0.3750 6367 1.512 0.5222
P value adjustment: holm method for 10 tests
For the most part, I will continue to not adjust for multiple tests, but that’s also only for didactic reasons, namely so you can make comparisons to the models’ summaries – in real applications, adjusting for multiple tests is often useful if not required.
Option b here is the reverse perspective from the last one: to make comparisons of the predicted means for the levels of VOWELS within the levels of CLASS:
pairs(emmeans(               # compute pairwise comparisons of means
   object=m_01,              # for the model m_01
   specs=~ VOWELS | CLASS),  # of VOWELS within each level of CLASS
   # you could also write this like this:
   # specs="VOWELS",         # of VOWELS
   # by="CLASS"),            # within each level of CLASS
   adjust="none")            # do not adjust for multiple comparisons
CLASS = closed:
contrast estimate SE df t.ratio p.value
one - (two+) 0.1574 0.0264 6367 5.951 <0.0001
CLASS = number:
contrast estimate SE df t.ratio p.value
one - (two+) -0.0248 0.1490 6367 -0.167 0.8674
CLASS = open:
contrast estimate SE df t.ratio p.value
one - (two+) -0.1522 0.0387 6367 -3.929 <0.0001
CLASS = propname:
contrast estimate SE df t.ratio p.value
one - (two+) -0.1625 0.1930 6367 -0.844 0.3988
CLASS = unclear:
contrast estimate SE df t.ratio p.value
one - (two+) -0.0784 0.3040 6367 -0.258 0.7963
Again, this maps perfectly onto what we’d see in nd; here are the first two:
round(nd$fit[1] - nd$fit[11], 4)
[1] 0.1574
round(nd$fit[3] - nd$fit[13], 4)
[1] -0.0248
Option c, the not recommended one, is to make all possible comparisons across all combinations of levels – fishing for results by considering even comparisons no one would ever postulate a priori; for that extreme fishing, adjusting for multiple comparisons is pretty much obligatory.
pairs(emmeans(               # compute pairwise comparisons of means
   object=m_01,              # for the model m_01
   specs=~ VOWELS:CLASS),    # of VOWELS:CLASS
   adjust="holm")            # adjust for multiple comparisons w/ Holm's method
contrast estimate SE df t.ratio p.value
one closed - (two+ closed) 0.1574 0.0264 6367 5.951 <0.0001
one closed - one number -0.7669 0.0867 6367 -8.848 <0.0001
one closed - (two+ number) -0.7917 0.1420 6367 -5.581 <0.0001
one closed - one open -0.5026 0.0382 6367 -13.155 <0.0001
one closed - (two+ open) -0.6549 0.0239 6367 -27.381 <0.0001
one closed - one propname -1.3029 0.1380 6367 -9.413 <0.0001
one closed - (two+ propname) -1.4654 0.2140 6367 -6.858 <0.0001
one closed - one unclear -0.8205 0.1530 6367 -5.375 <0.0001
one closed - (two+ unclear) -0.8989 0.3090 6367 -2.906 0.0807
(two+ closed) - one number -0.9242 0.0883 6367 -10.462 <0.0001
(two+ closed) - (two+ number) -0.9491 0.1440 6367 -6.606 <0.0001
(two+ closed) - one open -0.6600 0.0404 6367 -16.330 <0.0001
(two+ closed) - (two+ open) -0.8123 0.0312 6367 -25.993 <0.0001
(two+ closed) - one propname -1.4603 0.1390 6367 -10.476 <0.0001
(two+ closed) - (two+ propname) -1.6228 0.2160 6367 -7.523 <0.0001
(two+ closed) - one unclear -0.9779 0.1540 6367 -6.363 <0.0001
(two+ closed) - (two+ unclear) -1.0563 0.3100 6367 -3.405 0.0167
one number - (two+ number) -0.0248 0.1490 6367 -0.167 1.0000
one number - one open 0.2642 0.0921 6367 2.869 0.0866
one number - (two+ open) 0.1120 0.0880 6367 1.273 1.0000
one number - one propname -0.5360 0.1620 6367 -3.309 0.0226
one number - (two+ propname) -0.6985 0.2300 6367 -3.031 0.0563
one number - one unclear -0.0536 0.1740 6367 -0.307 1.0000
one number - (two+ unclear) -0.1321 0.3210 6367 -0.412 1.0000
(two+ number) - one open 0.2891 0.1460 6367 1.976 0.7231
(two+ number) - (two+ open) 0.1368 0.1430 6367 0.960 1.0000
(two+ number) - one propname -0.5112 0.1980 6367 -2.588 0.1840
(two+ number) - (two+ propname) -0.6737 0.2550 6367 -2.641 0.1655
(two+ number) - one unclear -0.0288 0.2080 6367 -0.139 1.0000
(two+ number) - (two+ unclear) -0.1072 0.3400 6367 -0.316 1.0000
one open - (two+ open) -0.1522 0.0387 6367 -3.929 0.0023
one open - one propname -0.8003 0.1420 6367 -5.645 <0.0001
one open - (two+ propname) -0.9628 0.2180 6367 -4.420 0.0003
one open - one unclear -0.3178 0.1560 6367 -2.039 0.6641
one open - (two+ unclear) -0.3963 0.3120 6367 -1.272 1.0000
(two+ open) - one propname -0.6480 0.1390 6367 -4.654 <0.0001
(two+ open) - (two+ propname) -0.8105 0.2140 6367 -3.786 0.0040
(two+ open) - one unclear -0.1656 0.1530 6367 -1.080 1.0000
(two+ open) - (two+ unclear) -0.2440 0.3100 6367 -0.788 1.0000
one propname - (two+ propname) -0.1625 0.1930 6367 -0.844 1.0000
one propname - one unclear 0.4824 0.2050 6367 2.353 0.3174
one propname - (two+ unclear) 0.4040 0.3390 6367 1.193 1.0000
(two+ propname) - one unclear 0.6449 0.2620 6367 2.458 0.2518
(two+ propname) - (two+ unclear) 0.5665 0.3750 6367 1.512 1.0000
one unclear - (two+ unclear) -0.0784 0.3040 6367 -0.258 1.0000
P value adjustment: holm method for 45 tests
Now, what if you had only been interested in a subset of the comparisons that pairs(emmeans(...)) makes? If you were interested in only, say, 5 comparisons, you don’t want pairs to adjust for 10. In such cases, you can extract the 5 non-adjusted p-values you’re interested in and correct them yourself only for the 5 tests you’re making. Imagine you were only interested in the comparisons 11 to 15 shown in the following output:
(qwe <- pairs(emmeans(       # compute pairwise comparisons of means
   object=m_01,              # for the model m_01
   specs=~ CLASS | VOWELS),  # of CLASS within each level of VOWELS
   adjust="none"))           # do not adjust for multiple comparisons
VOWELS = one:
contrast estimate SE df t.ratio p.value
closed - number -0.7669 0.0867 6367 -8.848 <0.0001
closed - open -0.5026 0.0382 6367 -13.155 <0.0001
closed - propname -1.3029 0.1380 6367 -9.413 <0.0001
closed - unclear -0.8205 0.1530 6367 -5.375 <0.0001
number - open 0.2642 0.0921 6367 2.869 0.0041
number - propname -0.5360 0.1620 6367 -3.309 0.0009
number - unclear -0.0536 0.1740 6367 -0.307 0.7585
open - propname -0.8003 0.1420 6367 -5.645 <0.0001
open - unclear -0.3178 0.1560 6367 -2.039 0.0415
propname - unclear 0.4824 0.2050 6367 2.353 0.0187
VOWELS = two+:
contrast estimate SE df t.ratio p.value
closed - number -0.9491 0.1440 6367 -6.606 <0.0001
closed - open -0.8123 0.0312 6367 -25.993 <0.0001
closed - propname -1.6228 0.2160 6367 -7.523 <0.0001
closed - unclear -1.0563 0.3100 6367 -3.405 0.0007
number - open 0.1368 0.1430 6367 0.960 0.3373
number - propname -0.6737 0.2550 6367 -2.641 0.0083
number - unclear -0.1072 0.3400 6367 -0.316 0.7522
open - propname -0.8105 0.2140 6367 -3.786 0.0002
open - unclear -0.2440 0.3100 6367 -0.788 0.4306
propname - unclear 0.5665 0.3750 6367 1.512 0.1306
Then this is how you would proceed because the function p.adjust determines how many tests to adjust for by looking at the length of the p argument, here 5:
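p.adjust(p=data.frame(qwe)[11:15,7], # the 5 non-adjusted p-values of interest
   method="holm")                    # adjust only those w/ Holm's method

1.2.4 Scen. 2a: means for num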
To deal with the 3 scenarios 2a, 2b, and 2c, we now look at a new model m_02, which is slightly simpler (we leave out VOWELS) and fit slightly differently; importantly, note that SURPRISAL is now square-root transformed, but not centered:
d <- read.delim("_input/wdurations.csv", stringsAsFactors=TRUE)
d$DURATION_l <- log2(d$DURATION)   # we log DURATION again
d$SURPRISAL_s <- sqrt(d$SURPRISAL) # SURPRISAL is now not centered
(m_02 <- lm(data=d, formula=DURATION_l ~ 1 + SURPRISAL_s*CLASS)) %>% {
   list("Summary"=summary(.),
        "Drop 1"=drop1(., test="F") %>% as.data.frame) }
$Summary
Call:
lm(formula = DURATION_l ~ 1 + SURPRISAL_s * CLASS, data = d)
Residuals:
Min 1Q Median 3Q Max
-2.91777 -0.42367 0.00454 0.44474 2.85643
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.16980 0.03707 166.424 < 2e-16 ***
SURPRISAL_s 0.39268 0.01697 23.142 < 2e-16 ***
CLASSnumber 1.58940 0.36083 4.405 1.08e-05 ***
CLASSopen 0.70446 0.07590 9.281 < 2e-16 ***
CLASSpropname 2.16824 0.38200 5.676 1.44e-08 ***
CLASSunclear 2.57069 0.72650 3.538 0.000405 ***
SURPRISAL_s:CLASSnumber -0.32626 0.12879 -2.533 0.011322 *
SURPRISAL_s:CLASSopen -0.01876 0.02847 -0.659 0.509898
SURPRISAL_s:CLASSpropname -0.33912 0.10922 -3.105 0.001911 **
SURPRISAL_s:CLASSunclear -0.71745 0.25946 -2.765 0.005705 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6849 on 6373 degrees of freedom
Multiple R-squared: 0.3762, Adjusted R-squared: 0.3754
F-statistic: 427.1 on 9 and 6373 DF, p-value: < 2.2e-16
$`Drop 1`
Df Sum of Sq RSS AIC F value Pr(>F)
<none> NA NA 2989.676 -4821.334 NA NA
SURPRISAL_s:CLASS 4 10.84202 3000.518 -4806.228 5.777897 0.0001224216
Scenario 2a means to ask the following question: what are the 5 predicted means of DURATION_l for each of the levels of CLASS and when SURPRISAL_s is controlled for in some way? The summary output shows only 1 of the 5, the one represented by the intercept in summary(m_02)$coefficients, i.e. when CLASS is closed and SURPRISAL_s is 0. If we want the means for all 5 levels of CLASS when, for instance, SURPRISAL_s is 0, with the present model we can compute them like this:
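coef(m_02)[1] + c(0, coef(m_02)[3:6])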
But the nd approach is sometimes more intuitive: here are the predicted means of DURATION_l for the levels of CLASS when the predictor SURPRISAL_s is 0 (the intercept), 1 (for the slope difference), and its mean (2.3708677):
nd <- expand.grid(
   SURPRISAL_s=c(0, 1, mean(d$SURPRISAL_s)), # main values of SURPRISAL_s
   CLASS=levels(d$CLASS)) %>%                # all levels of CLASS, then
   "["(,2:1)                                 # reorder columns
nd <- cbind(nd, predict(m_02, newdata=nd, interval="confidence"))
nd[nd$SURPRISAL_s==0,] # only part of nd, when SURP_s is 0
CLASS SURPRISAL_s fit lwr upr
1 closed 0 6.169798 6.097123 6.242474
4 number 0 7.759196 7.055597 8.462795
7 open 0 6.874254 6.744415 7.004092
10 propname 0 8.338038 7.592729 9.083347
13 unclear 0 8.740491 7.318164 10.162818
We can also get those from effect like this, but …
(cl_d <- data.frame(cl <- effect("CLASS", m_02)))
CLASS fit se lower upper
1 closed 7.100787 0.01180527 7.077645 7.123930
2 number 7.916657 0.07969515 7.760428 8.072886
3 open 7.760762 0.01820300 7.725078 7.796446
4 propname 8.465014 0.13552532 8.199338 8.730689
5 unclear 7.970505 0.14905516 7.678307 8.262703
… you can see we’re getting a warning because CLASS is in the interaction with SURPRISAL_s so we can’t just ignore it. Well, we can, kinda, but only because effect is smart enough to give us the predictions for the 5 levels of CLASS while SURPRISAL_s is held constant at its mean (which, now that we’re using the non-centered d$SURPRISAL_s, is not 0 but 2.3708677). But the better way to do this would be like this:
sucl_d <- data.frame(
   sucl <- effect("SURPRISAL_s:CLASS", m_02,
      xlevels=list(SURPRISAL_s=c(0:1, mean(d$SURPRISAL_s))))) %>%
   "["(,c(2,1,3,5,6))
# alternative:
# sucl_d <- data.frame(
#    sucl <- predictorEffect(
#       "SURPRISAL_s", m_02,
#       focal.levels=c(0:1, mean(d$SURPRISAL_s)))) %>% "["(,c(2,1,3,5,6))
sucl_d[sucl_d$SURPRISAL_s==0,] # only part of sucl_d, when SURP_s is 0
CLASS SURPRISAL_s fit lower upper
1 closed 0 6.169798 6.097123 6.242474
4 number 0 7.759196 7.055597 8.462795
7 open 0 6.874254 6.744415 7.004092
10 propname 0 8.338038 7.592729 9.083347
13 unclear 0 8.740491 7.318164 10.162818
1.2.5 Scen. 2b: means for num vs. 0
Scenario 2b means to ask whether each of the 5 means computed above is significantly different from 0 (and what the exact p-value is). This can so far not be answered fully from the summary output of the model. We can see from nd that none of the predicted means’ confidence intervals overlaps with 0, but we don’t have p-values. Right now, we know the answer to this question only for 1 of the 5 means, the one represented by the intercept in summary(m_02)$coefficients, i.e. when CLASS: closed and SURPRISAL_s is 0, which clearly is significantly different from 0. With your current knowledge, the ‘best’ way to proceed for you would be to generate m_02 4 more times, each time with CLASS re-leveled in such a way that the intercept comes to represent all 5 possible levels – clearly not a desirable option, which is why we will discuss a much better approach another time.
1.2.6 Scen. 2c: means for num vs. each other
Scenario 2c means to ask whether (some? all?) differences between the 5 means are significant. For that, we can use emmeans again:
(qwe <- pairs(emmeans(          # compute pairwise comparisons of means
   object=m_02,                 # for the model m_02
   specs=~ CLASS | SURPRISAL_s), # of CLASS when SURPRISAL_s is its mean
   adjust="none"))              # do not adjust for multiple comparisons
SURPRISAL_s = 2.37:
contrast estimate SE df t.ratio p.value
closed - number -0.8159 0.0806 6373 -10.127 <0.0001
closed - open -0.6600 0.0217 6373 -30.419 <0.0001
closed - propname -1.3642 0.1360 6373 -10.028 <0.0001
closed - unclear -0.8697 0.1500 6373 -5.817 <0.0001
number - open 0.1559 0.0817 6373 1.907 0.0566
number - propname -0.5484 0.1570 6373 -3.488 0.0005
number - unclear -0.0538 0.1690 6373 -0.319 0.7501
open - propname -0.7043 0.1370 6373 -5.150 <0.0001
open - unclear -0.2097 0.1500 6373 -1.397 0.1625
propname - unclear 0.4945 0.2010 6373 2.455 0.0141
Note, these results do not map onto the results from the summary table or the values from nd! From the summary table, we would expect the difference between CLASS closed and CLASS open to be this:
summary(m_02)$coefficients[4,"Estimate"]
[1] 0.7044554
Same from nd:
nd$fit[7] - nd$fit[1]
[1] 0.7044554
But that’s not what pairs(emmeans(...)) says:
data.frame(qwe)[2,-2]
contrast estimate SE df t.ratio p.value
2 closed - open -0.6599747 0.02169594 6373 -30.41927 6.723276e-190
That is because the summary of the model and the part of nd we looked at above return predictions when SURPRISAL_s is 0, whereas pairs(emmeans(...)) returns predictions when SURPRISAL_s is its mean, namely 2.3708677. Thus, these results would be identical if SURPRISAL had been centered, but they are not this time. Without centering, things fit only if we look at the ‘right’ part of nd, and that is of course why nd was created that way above:
nd$fit[3] - nd$fit[9]
[1] -0.6599747
data.frame(qwe)[2,]
contrast SURPRISAL_s estimate SE df t.ratio p.value
2 closed - open 2.370868 -0.6599747 0.02169594 6373 -30.41927 6.723276e-190
But can we get predictions/comparisons that match the summary output, i.e. for when SURPRISAL_s is 0, from emmeans as well? Yes, because emmeans can take an at argument:
pairs(emmeans(                  # compute pairwise comparisons of means
   object=m_02,                 # for the model m_02
   specs=~ CLASS | SURPRISAL_s, # of CLASS when SURPRISAL_s is ...
   at=list(SURPRISAL_s=0)),     # ... set to 0, not its mean!
   adjust="none")               # do not adjust for multiple comparisons
SURPRISAL_s = 0:
contrast estimate SE df t.ratio p.value
closed - number -1.589 0.3610 6373 -4.405 <0.0001
closed - open -0.704 0.0759 6373 -9.281 <0.0001
closed - propname -2.168 0.3820 6373 -5.676 <0.0001
closed - unclear -2.571 0.7260 6373 -3.538 0.0004
number - open 0.885 0.3650 6373 2.425 0.0154
number - propname -0.579 0.5230 6373 -1.107 0.2683
number - unclear -0.981 0.8090 6373 -1.212 0.2255
open - propname -1.464 0.3860 6373 -3.793 0.0002
open - unclear -1.866 0.7290 6373 -2.562 0.0104
propname - unclear -0.402 0.8190 6373 -0.491 0.6232
1.2.7 Scen. 3a: slopes for num
Let’s continue with m_02, because it has the categorical predictor CLASS, whose predicted means we just looked at, but it also has a numeric predictor SURPRISAL_s, and since there is a significant interaction between those two predictors, our model involves 5 different slopes of SURPRISAL_s, one for each level of CLASS, with some significant differences between them.
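For reference, here is the coefficients part of the summary again:

summary(m_02)$coefficients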
Scenario 3a means to ask the following question: what are the 5 predicted slopes of SURPRISAL_s for each of the levels of CLASS? We can compute those slopes manually from the coefficients of m_02:
the slope of SURPRISAL_s when CLASS is closed is listed in the summary table as the 2nd coefficient, the one called SURPRISAL_s, because that’s the slope of the predictor when CLASS is its first level: (coef(m_02)[2], i.e. 0.3926787);
the slope of SURPRISAL_s when CLASS is number is that 2nd coefficient plus the 7th coefficient, the one called SURPRISAL_s:CLASSnumber, i.e. the adjustment to the slope of SURPRISAL_s when CLASS is not closed but number: coef(m_02)[2] + coef(m_02)[7], i.e. 0.066415;
the slope of SURPRISAL_s when CLASS is open is that 2nd coefficient plus the 8th coefficient, the one called SURPRISAL_s:CLASSopen, i.e. the adjustment to the slope of SURPRISAL_s when CLASS is not closed but open: coef(m_02)[2] + coef(m_02)[8], i.e. 0.3739173;
the slope of SURPRISAL_s when CLASS is propname is that 2nd coefficient plus the 9th coefficient, the one called SURPRISAL_s:CLASSpropname, i.e. the adjustment to the slope of SURPRISAL_s when CLASS is not closed but propname: coef(m_02)[2] + coef(m_02)[9], i.e. 0.0535565;
the slope of SURPRISAL_s when CLASS is unclear is that 2nd coefficient plus the 10th coefficient, the one called SURPRISAL_s:CLASSunclear, i.e. the adjustment to the slope of SURPRISAL_s when CLASS is not closed but unclear: coef(m_02)[2] + coef(m_02)[10], i.e. -0.3247698.
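In one line, that manual arithmetic amounts to this (the 0 is for the reference level closed, whose slope is coef(m_02)[2] itself):

coef(m_02)[2] + c(0, coef(m_02)[7:10]) # all 5 slopes of SURPRISAL_s at once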
Thus, we could immediately get the slopes like this:
tapply(nd$fit,            # apply a grouping of the predictions
   list(nd$CLASS,         # by CLASS
        nd$SURPRISAL_s),  # and SURPRISAL_s
   c) %>%                 # just list the values,
   "["(,-3) %>%           # suppress the 3rd column
   apply(1, diff)         # compute the differences
closed number open propname unclear
0.3926787 0.0664150 0.3739173 0.0535565 -0.3247698
But the most practical solution might be the function emmeans::emtrends:
emtrends(             # compute the trends/slopes
   object=m_02,       # for the model m_02
   specs="CLASS",     # compute comparisons over this predictor
   var="SURPRISAL_s") # namely all slopes of this predictor
CLASS SURPRISAL_s.trend SE df lower.CL upper.CL
closed 0.3927 0.0170 6373 0.359 0.426
number 0.0664 0.1280 6373 -0.184 0.317
open 0.3739 0.0229 6373 0.329 0.419
propname 0.0536 0.1080 6373 -0.158 0.265
unclear -0.3248 0.2590 6373 -0.832 0.183
Confidence level used: 0.95
1.2.8 Scen. 3b: slopes for num vs. 0
Scenario 3b means to ask whether each of the 5 slopes just computed is significantly different from 0 (and what the exact p-value is). This can so far not be answered from the summary output of the model. Right now, we know the answer to this question only for 1 of the 5 slopes, the one represented by the 2nd coefficient (i.e., when CLASS: closed). With your current knowledge, the ‘best’ way to proceed for you would be to generate m_02 4 more times, each time with CLASS re-leveled in such a way that the 2nd coefficient comes to represent all 5 possible values – clearly not a desirable option, which is why we will discuss a much better approach another time.
1.2.9 Scen. 3c: slopes for num vs. each other
Scenario 3c means to ask whether (some? all?) differences between the 5 slopes are significant. Since we’re looking to compare slopes (a.k.a. trendlines), we use the function emmeans::emtrends but in conjunction with pairs, which will make comparisons of the predicted slopes of SURPRISAL_s between all combinations of the levels of CLASS:
pairs(emtrends(        # compute pairwise comparisons of trends/slopes
   object=m_02,        # for the model m_02
   specs="CLASS",      # compute comparisons over this predictor
   var="SURPRISAL_s"), # namely all slopes of this predictor
   adjust="none")      # do not adjust for multiple comparisons
contrast estimate SE df t.ratio p.value
closed - number 0.3263 0.1290 6373 2.533 0.0113
closed - open 0.0188 0.0285 6373 0.659 0.5099
closed - propname 0.3391 0.1090 6373 3.105 0.0019
closed - unclear 0.7174 0.2590 6373 2.765 0.0057
number - open -0.3075 0.1300 6373 -2.371 0.0178
number - propname 0.0129 0.1670 6373 0.077 0.9387
number - unclear 0.3912 0.2890 6373 1.355 0.1754
open - propname 0.3204 0.1100 6373 2.905 0.0037
open - unclear 0.6987 0.2600 6373 2.688 0.0072
propname - unclear 0.3783 0.2800 6373 1.349 0.1774
1.3 A priori (orthogonal) contrasts
1.3.1 Theoretical introduction
Let us turn to something really interesting now. All the above comparisons were concerned with fitting a model and then, once we have a model, figuring out what is significant and what is not – both by understanding the regular summary output and its coefficients and by making comparisons between predictions that the summary table does not readily show us (with emmeans/emtrends). But as I already mentioned in Section 1.1 above, there are even more questions one might ask, and everything we have done so far was done post-hoc, which is why we discussed correcting for multiple post-hoc tests. In some situations, however, one can set up the predictors in such a way that the summary table already answers multiple a priori hypotheses, especially a priori hypotheses that differ from the standard ones checked so far, and that’s what we’re discussing now.
d <- read.delim("_input/wdurations.csv", stringsAsFactors=TRUE)
d$DURATION_l <- log2(d$DURATION); d$SURPRISAL_s <- sqrt(d$SURPRISAL)
summary(d[,c(12,10)])
DURATION_l CLASS
Min. : 3.807 closed :4021
1st Qu.: 6.781 number : 125
Median : 7.340 open :2086
Mean : 7.348 propname: 110
3rd Qu.: 7.937 unclear : 41
Max. :10.456
Imagine for a moment we only want to fit a monofactorial model – just for simplicity’s sake, as we’re broaching this new topic – and you have the following hypotheses about the variable CLASS:
1. the mean of DURATION_l for closed and open combined will be different from the corresponding mean for all other levels combined (because, say, they are ‘normal words’ and the other levels are ‘more weird’);
2. the mean of DURATION_l for closed will be different from the corresponding mean for open (because, say, these are still functionally, frequency-wise, and distributionally very different classes);
3. the mean of DURATION_l for unclear will not be different from the corresponding mean for number and propname combined (because, say, unclear words could be anything);
4. the mean of DURATION_l for number will not be different from the corresponding mean for propname (even though these are obviously quite different classes).
It is possible to define a set of contrasts for the predictor CLASS that ‘embodies’ these 4 hypotheses so that the summary output will not show you what it usually does, but exactly the answers to these hypotheses. For that we first create a copy of CLASS, which we call CLASS_orth (for orthogonal, because these four hypotheses can be tested with 4 orthogonal contrasts):
levels(d$CLASS_orth <- d$CLASS)
[1] "closed" "number" "open" "propname" "unclear"
For a predictor with l levels (here, l=5), one can formulate l-1 orthogonal contrasts or contrast vectors – how lucky that we have exactly 4 … – and the formulation of each orthogonal contrast (vector) is governed by a set of rules we need to discuss:
1. levels that are ignored in a comparison are set to 0;
2. each level can only participate in one comparison in isolation;
3. the sum of all value tokens of a contrast vector is 0;
4. (groups of) levels that are contrasted with each other differ in their sign;
5. the sum of all absolute values of types is 1 (this one is really important; many other sources do not include this one, which makes regression summary tables harder to interpret than necessary, as I will demonstrate in a small excursus below);
6. the products of the weights of the levels sum to 0.
Without an example, this is probably close to incomprehensible, so let’s look at what this means for our four hypotheses/contrasts for CLASS_orth; since we will have to define this within R as a matrix, we will do so here as well. Check out Table 2 (which uses abbreviated level names for space reasons):
Table 2: Orthogonal contrast settings for our hypotheses

| CONTRAST / CLASS | closed | number | open | propname | unclear |
|---|---|---|---|---|---|
| 1: closed/open vs. other 3 | +0.6 | -0.4 | +0.6 | -0.4 | -0.4 |
| 2: closed vs. open | +0.5 | 0 | -0.5 | 0 | 0 |
| 3: number/propname vs. unclear | 0 | -0.3333 | 0 | -0.3333 | +0.6667 |
| 4: number vs. propname | 0 | +0.5 | 0 | -0.5 | 0 |
So, let’s walk through those rules separately for each of the hypotheses/contrasts:
with regard to rule 1,
contrast 1: no level is ignored in this contrast: all five levels of CLASS are involved, so none of them gets a 0, ✓;
contrast 2: this is only concerned with closed and open, which means all other levels get a 0, ✓;
contrast 3: this is only concerned with number, propname, and unclear, which means all other levels get a 0, ✓;
contrast 4: this is only concerned with number and propname, which means all other levels get a 0, ✓;
with regard to rule 2, each level participates in only one comparison in isolation:
closed is used in isolation only once, in contrast 2, ✓;
number is used in isolation only once, in contrast 4, ✓;
open is used in isolation only once, in contrast 2, ✓;
propname is used in isolation only once, in contrast 4, ✓;
unclear is used in isolation only once, in contrast 3, ✓;
with regard to rule 3, if we add up the 5 values in each row of the contrast table, we do get 0 for each row, ✓;
with regard to rule 4,
contrast 1: the two groups of values that are contrasted differ in their sign: closed/open have a positive contrast weight whereas the other three have a negative one, ✓;
contrast 2: the two values that are contrasted differ in their sign: closed has a positive contrast weight whereas open has a negative one, ✓;
contrast 3: the two groups of values that are contrasted differ in their sign: number/propname have positive contrast weights whereas unclear has a negative one, ✓;
contrast 4: the two values that are contrasted differ in their sign: number has a positive contrast weight whereas propname has a negative one, ✓;
with regard to rule 5,
contrast 1: if we take the two value types (+0.6 and -0.4), make them both positive, and add them up, we do get 1, ✓;
contrast 2: if we take the two value types (+0.5 and -0.5), make them both positive, and add them up, we do get 1, ✓;
contrast 3: if we take the two value types (+2/3 and -1/3), make them both positive, and add them up, we do get 1, ✓;
contrast 4: if we take the two value types (+0.5 and -0.5), make them both positive, and add them up, we do get 1, ✓;
with regard to rule 6, the sum of the products of each level’s weights is indeed 0: 0.6×0.5×0×0 + (-0.4)×0×(-0.3333)×(+0.5) + … = 0, ✓.
Now we tell R this by making each column of Table 2 a vector, which we combine into a matrix and assign as contrasts to CLASS_orth:
#                            clos  numb  open  prop  uncl
clo.op_vs_num.prop.unc <- c(+0.6, -0.4, +0.6, -0.4, -0.4)
clo_vs_op              <- c(+0.5,  0  , -0.5,  0  ,  0  )
num.prop_vs_uncl       <- c( 0  , -1/3,  0  , -1/3, +2/3)
num_vs_prop            <- c( 0  , +0.5,  0  , -0.5,  0  )
# & then we put those into a matrix like the above table & assign them as contrasts:
(contrasts(d$CLASS_orth) <-  # make the following the contrasts of CLASS_orth
   cbind(                    # the matrix resulting from binding together as columns
      clo.op_vs_num.prop.unc, # all
      clo_vs_op,              # four
      num.prop_vs_uncl,       # contrast
      num_vs_prop))           # vectors
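If you want to check rules 3 and 6 programmatically, here is a quick sketch: the column sums of the contrast matrix should all be 0 (rule 3), and all off-diagonal entries of its crossproduct, i.e. the sums of pairwise products of the contrast vectors (the standard orthogonality check), should be 0 as well:

cm <- contrasts(d$CLASS_orth) # the contrast matrix we just assigned
round(colSums(cm), 10)        # rule 3: each contrast vector sums to 0
round(crossprod(cm), 10)      # rule 6: all off-diagonal entries are 0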
Let’s now see what that does. We fit a first model with the default treatment contrasts of CLASS (m_03_treatm) and then a second one with our new orthogonal-contrasts predictor CLASS_orth (m_03_orthog):
summary(m_03_treatm <- lm(DURATION_l ~ 1 + CLASS, data=d))$coefficients %>% round(4)
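… and the analogous call for the model with the orthogonal contrasts (a sketch of the call; m_03_orthog is the model name used below):

summary(m_03_orthog <- lm(DURATION_l ~ 1 + CLASS_orth, data=d))$coefficients %>% round(4)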
To save space for presenting this, I am not showing the overall model statistics (the R2s, the F-test, etc.) but rest assured, they are all the same across the two models – all the new set of contrasts changes is the coefficients part. To recap, in m_03_treatm,
coefficient 1 (the intercept) is the predicted response when CLASS is closed;
coefficient 2 is what you add to the intercept to get the predicted response when CLASS changes from closed to number;
coefficient 3 is what you add to the intercept to get the predicted response when CLASS changes from closed to open;
coefficient 4 is what you add to the intercept to get the predicted response when CLASS changes from closed to propname;
coefficient 5 is what you add to the intercept to get the predicted response when CLASS changes from closed to unclear.
But in m_03_orthog, things are different. To understand how, let us first compute the predictions of both models, which are of course also identical (because, again, the only thing the contrasts are affecting, so to speak, is the coefficients part of the model summary).
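A minimal way to compute those predictions, assuming data frames named as they are referenced below (nd_treatm and nd_orthog, each with a column PREDS):

nd_treatm <- data.frame(CLASS=levels(d$CLASS))
nd_treatm$PREDS <- predict(m_03_treatm, newdata=nd_treatm)
nd_orthog <- data.frame(CLASS_orth=levels(d$CLASS_orth))
nd_orthog$PREDS <- predict(m_03_orthog, newdata=nd_orthog)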
coefficient 1 (called the intercept) is the unweighted mean of the 5 predictions for all levels of CLASS:
coef(m_03_orthog)[1]
(Intercept)
7.8455
mean(nd_treatm$PREDS) # mean(nd_orthog$PREDS)
[1] 7.8455
coefficient 2 is the difference of the unweighted mean of the 2 predictions for closed and open on the one hand and the unweighted mean of the 3 predictions for number and propname and unclear on the other, and the fact that it is significant says our first hypothesis is confirmed:
summary(m_03_orthog)$coefficients[2,]
Estimate Std. Error t value Pr(>|t|)
-6.424391e-01 5.029463e-02 -1.277351e+01 6.512723e-37
coefficient 3 is the difference of the prediction for closed and the prediction for open, and the fact that it is significant says our second hypothesis is confirmed:
summary(m_03_orthog)$coefficients[3,]
Estimate Std. Error t value Pr(>|t|)
-0.93901065 0.01960634 -47.89320883 0.00000000
coefficient 4 is the difference of the unweighted mean of the 2 predictions for number and propname on the one hand and the prediction for unclear on the other, and the fact that it is significant says our third hypothesis is not confirmed:
summary(m_03_orthog)$coefficients[4,]
Estimate Std. Error t value Pr(>|t|)
-0.393280161 0.123017647 -3.196941003 0.001395713
coefficient 5 is the difference of the prediction for number and the prediction for propname, and the fact that it is significant says our fourth hypothesis is not confirmed:
summary(m_03_orthog)$coefficients[5,]
Estimate Std. Error t value Pr(>|t|)
-5.807882e-01 9.499252e-02 -6.114042e+00 1.028827e-09
So yes, this comes with some work (of setting up the contrast matrix), but then it’s incredibly elegant: the summary output answers all your questions, or addresses all your hypotheses, and it does that even though the hypotheses involve multiple conflations etc. that would otherwise be very difficult to test. This way of defining orthogonal contrasts can streamline analyses a lot.
1.3.3 Excursus: what if you don’t follow rule 5?
Let’s quickly see why I personally find rule 5 so important, even if most other references I know don’t use it. Those other references, I think, try to make developing the contrast matrices simpler by using integers for the contrast vectors rather than the decimals/fractions that my rule 5 here implies. Therefore, they would define the contrast vectors not like I did above (repeated here for convenience) …
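A sketch of what that integer-based definition might look like; the exact integers for the four vectors are inferred from the coefficient ratios reported below:

d$CLASS_orth_notrule5 <- d$CLASS # a copy of CLASS for the integer-based contrasts
#                            clos numb open prop uncl
clo.op_vs_num.prop.unc <- c(  3,  -2,   3,  -2,  -2)
clo_vs_op              <- c(  1,   0,  -1,   0,   0)
num.prop_vs_uncl       <- c(  0,  -1,   0,  -1,   2)
num_vs_prop            <- c(  0,   1,   0,  -1,   0)
contrasts(d$CLASS_orth_notrule5) <- cbind(
   clo.op_vs_num.prop.unc, clo_vs_op, num.prop_vs_uncl, num_vs_prop)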
This sure looks easier (at first) but that ‘backfires’ – let me show you how. Here are the summary results of the model defined with my contrasts, with rule 5 in place:
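round(summary(m_03_orthog)$coefficients, 4)

… and, for comparison, of the model with the integer-based contrasts (a sketch; the model name m_03_notrule5 is an assumption):

round(summary( # a hypothetical name for the model without rule 5
   m_03_notrule5 <- lm(DURATION_l ~ 1 + CLASS_orth_notrule5, data=d)
   )$coefficients, 4)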
The reason for my preference becomes clear when you realize how all coefficients (except the intercept, the mean of the predicted means) differ between the two models. Here are the predictions of both models again:
nd_orthog
CLASS_orth PREDS
1 closed 6.990532
2 number 7.943175
3 open 7.929542
4 propname 8.523963
5 unclear 7.840289
We have seen above that, when you use my contrasts with rule 5, the coefficients in the summary table correspond to exactly the differences between the (means of grouped) means of contrasted levels. But that’s not the case anymore if you use the integer-based coding. Now
the coefficient CLASS_orth_notrule5clo.op_vs_num.prop.unc is -0.1284878, which is exactly 1/5 of the real difference, which we know is -0.6424391;
the coefficient CLASS_orth_notrule5clo_vs_op is -0.4695053, which is exactly 1/2 of the real difference, which we know is -0.9390107;
the coefficient CLASS_orth_notrule5num.prop_vs_uncl is -0.1310934, which is exactly 1/3 of the real difference, which we know is -0.3932802;
the coefficient CLASS_orth_notrule5num_vs_prop is -0.2903941, which is exactly 1/2 of the real difference, which we know is -0.5807882.
In other words, using the integers as we did reduces the coefficients by a factor of f, where f is the sum of the absolute values of the types of a contrast vector. Thus,
if you define clo.op_vs_num.prop.unc as c(3, -2, 3, -2, -2), then the sum of the absolute values of the types is 5 and we saw the coefficient for that contrast in the model not following rule 5 is indeed reduced to just 1/5th of the real difference, but
if you define clo.op_vs_num.prop.unc as c(0.6, -0.4, 0.6, -0.4, -0.4), then, as per rule 5, the sum of the absolute values of the types is 1 so the coefficient for that contrast in the model following rule 5 is indeed 1/1th of the real difference.
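If you do use the integers, you can of course recover the ‘real’ differences by scaling the coefficients back up; a sketch, using the hypothetical model name m_03_notrule5 from above:

f <- c(5, 2, 3, 2)          # the sums of the absolute value types per contrast
coef(m_03_notrule5)[-1] * f # recovers -0.6424, -0.9390, -0.3933, -0.5808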
Thus, yes, of course you can save mental effort and use the integers, but you then need to invest more mental effort, I think, to reconstruct the real differences from the summary table – better use rule 5 right away.
1.4 Exercises
1.4.1 Exercise 1
A corpus-based study of four varieties of English – a predictor called VARIETY with the 4 levels NigE, BrE, IndE, and AmE (in that order!) – set out to test the hypotheses that
NigE is significantly different from all others combined;
IndE is significantly different from the two inner-circle varieties AmE and BrE combined;
the two inner-circle varieties AmE and BrE are not significantly different.
Formulate the required contrast matrix for this predictor VARIETY.
A corpus-based study of 5 ‘varieties’ of English – a predictor called L1 with 5 levels EN (i.e. native speakers), FR, GE, NO, and SPA (in that order!) – set out to test the hypotheses that
the native speakers are significantly different from all others combined;
the two Germanic L1s NO and GE combined are significantly different from the two Romance L1s FR and SPA combined;
the two Germanic L1s NO and GE are not significantly different from each other;
the two Romance L1s FR and SPA are not significantly different from each other.
Formulate the required contrast matrix for this predictor L1.
---title: "Ling 204: session 02"author: - name: "[Stefan Th. Gries](https://www.stgries.info)" affiliation: - UC Santa Barbara - JLU Giessen orcid: 0000-0002-6497-3958date: "2026-01-14 12:34:56 PDT"date-format: "DD MMM YYYY HH-mm-ss"editor: sourceformat: html: page-layout: full code-fold: false code-link: true code-copy: true code-tools: true code-line-numbers: true code-overflow: scroll number-sections: true smooth-scroll: true toc: true toc-depth: 4 number-depth: 4 toc-location: left monofont: lucida console tbl-cap-location: top fig-cap-location: bottom fig-width: 4 fig-height: 4 fig-format: png fig-dpi: 300 fig-align: center embed-resources: true link-external-newwindow: trueexecute: cache: false echo: true eval: true warning: false---# Session 02: Coefficient comparisons 1 {#sec-s02}## Introduction: the 'incompleteness' of regression summaries {#sec-s02_incomplete}In this session, we will look at how to get more detail out of models and their summaries. The first part, @sec-s02_posthoc, is concerned with post-hoc comparisons while the second part, @sec-s02_apriori, deals with a priori motivated comparisons (which often are not even straightforwardly approachable from an existing model summary). But as an introduction, let's consider the issue of how much of what might be interesting a regression summary actually does (not) show, for which we briefly consider the following linear regression model on the [duration data](_input/wdurations.csv) using just two predictors:* `SURPRISAL_s`, a square-root transformed surprisal value;* `CLASS`: a categorical predictor categorizing words in terms of 5 different levels:```{r}rm(list=ls(all.names=TRUE)); pkgs2load <-c("car", "effects", "emmeans", "multcomp")suppressMessages(sapply(pkgs2load, library, character.only=TRUE, logical.return=TRUE, quietly=TRUE)); rm(pkgs2load)d <-read.delim("_input/wdurations.csv", stringsAsFactors=TRUE)d$DURATION_l <-log2(d$DURATION); d$SURPRISAL_s <-sqrt(d$SURPRISAL); summary(d[,c(12:13,10)])m <-lm(DURATION_l ~1+ SURPRISAL_s*CLASS, data=d)```Consider all the questions that one might have with regard to just the interaction of `SURPRISAL_s:CLASS`. Once you think about it for a moment, I hope you realize that the questions one might have are potentially all of these:1. The slope of `SURPRISAL_s` when `CLASS` is *closed*, a. what is it? b. is it significantly different from 0? c. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *number*? d. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *open*? e. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *propname*? f. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *unclear*?2. The slope of `SURPRISAL_s` when `CLASS` is *number*, a. what is it? b. is it significantly different from 0? d. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *open*? e. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *propname*? f. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *unclear*?3. The slope of `SURPRISAL_s` when `CLASS` is *open*, a. what is it? b. is it significantly different from 0? c. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *propname*? d. is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *unclear*?4. The slope of `SURPRISAL_s` when `CLASS` is *propname*, a. what is it? b. is it significantly different from 0? c. 
is it significantly different from the slope of `SURPRISAL_s` when `CLASS` is *unclear*?5. The slope of `SURPRISAL_s` when `CLASS` is *unclear*, a. what is it? b. is it significantly different from 0?Again, these are just the most basic questions that would probably be of interest -- I am leaving out more specialized ones based on conflations and possibly interesting tests for specific values.But now consider again what most people provide, which, if we're lucky, is this:```{r}summary(m)$coef %>%round(4)plot(allEffects(m), grid=TRUE)```You see there's a big discrepancy. Which of the above questions does this output answer (directly or after one addition)?* when `CLASS` is *closed*, this answers all 6 questions (a to f);* when `CLASS` is *number*, this answers 1 out of 5 questions (a);* when `CLASS` is *open*, this answers 1 out of 4 questions (a);* when `CLASS` is *propname*, this answers 1 out of 3 questions (a);* when `CLASS` is *unclear*, this answers 1 out of 2 questions (a).Not a great result: we get 10 answers for minimally 20 questions and for many of the as yet unanswered questions you may not even know yet how to get them -- this session and the next will change this!We begin by re-loading and re-preparing the same [duration data](_input/wdurations.csv); this time we're centering the surprisal predictor:```{r}rm(list=ls(all.names=TRUE))d <-read.delim("_input/wdurations.csv", stringsAsFactors=TRUE)d$DURATION_l <-log2(d$DURATION)d$SURPRISAL_sc <-sqrt(d$SURPRISAL) -mean(sqrt(d$SURPRISAL))```We fit a first model:```{r}(m_01 <-lm(data=d, formula=DURATION_l ~1+ SURPRISAL_sc + VOWELS + CLASS + SURPRISAL_sc:VOWELS + SURPRISAL_sc:CLASS + VOWELS:CLASS)) %>% { list("Summary"=summary(.),"Drop 1"=drop1(., test="F") %>% as.data.frame) }```In the interest of covering everything -- all 20 questions from above and more -- in excruciating detail, we'll go over nine kinds of scenarios (yes, nine, you can thank me later ...) which arise from the combination of two three-possibility scenarios:| | **a**: what are the means/slopes? | **b**: are they * diff from 0? | **c**: are they * diff from each other? ||---------------------------------------------------|-----------------------------------|--------------------------------|-----------------------------------------|| **1**: means for (combos of) levels of cat preds | scenario 1a | scenario 1b | scenario 1c || **2**: means for values of numeric preds | scenario 2a | scenario 2b | scenario 2c || **3**: slopes for (combos of) levels of cat preds | scenario 3a | scenario 3b | scenario 3c |: 9 scenarios relevant for post-hoc exploration of models {#tbl-threeTIMESthree}Yes, this seems annoying but each of the three questions in the columns has potentially important implications, and the way they are tackled depends on what one looks at: means, slopes, etc. And, maybe somewhat confusingly once you recognize it, none of these is fully answered by the summary output and hardly any study I see discusses all three.## Post-hoc comparisons {#sec-s02_posthoc}### Scen. 1a: means for catScenario 1a means to ask the following question: When `SURPRISAL_sc` is controlled for, what is each of the 10 predicted means of `DURATION_l` for when you cross the 2 levels of `VOWELS` and the 5 levels of `CLASS`? 
A simple way to answer all scenario 1 questions and, maybe more generally, to make sense of especially the coefficients of models that I often use is useful here as well: We create a data frame `nd` (for 'new data') that contains all variables in the rhs of the model formula and crosses* all levels of all categorical variables in the rhs (here `VOWELS` and `CLASS`);* the values of 0 and 1 for all numeric variables in the rhs and when the numeric predictors are not centered, I usually also include the mean value for each numeric variable:```{r}nd <-expand.grid( # generate all combinations ofSURPRISAL_sc=0:1, # main values of SURPRISAL_scCLASS=levels(d$CLASS), # all levels of CLASSVOWELS=levels(d$VOWELS)) %>%# all levels of VOWELS, then"["(,3:1) # reorder columns```To that data frame, we then add the predictions the final model makes for all these combinations and, when easily available, also their confidence intervals; we can then just look at the whole data frame or create different tabular versions of it -- here's one I find useful (because the rows let us see the predictions for `SURPRISAL_sc` 0 and 1 well, the difference of which is the slope of `SURPRISAL_sc`):```{r}nd <-cbind(nd, PREDS=predict( m_01, newdata=nd, interval="confidence"))colnames(nd)[4:6] <-c("fit", "lwr", "upr")ndtapply(FUN=c, X=nd$fit, # apply the function c to the predictionsINDEX=list( # which are grouped as follows:CLASS=nd$CLASS, # rowsSURPRISAL_sc=nd$SURPRISAL_sc, # columnsVOWELS=nd$VOWELS)) # slices```In this case, half of these are what we get from `effect`, which is one of the most efficient ways to answer 1a questions. This is because `effect` conveniently holds the predictors not included in the term at 'typical values', which for the centered (!) values of `SURPRISAL_sc` means the `effects` values are those in `nd` when `SURPRISAL_sc` is 0:```{r}(clvo_d <-data.frame( clvo <-effect("VOWELS:CLASS", m_01)))```### Scen. 1b: means for cat vs. 0Scenario 1b means to ask whether each of the 10 means computed above is significantly different from 0 (and what the exact *p*-value is). This can so far not be answered fully from the summary output of the model. We can see from `nd` that none of the predicted means' confidence intervals overlaps with 0, but we don't have exact *p*-values. Right now, we know the answer to this question only for 1 of the 10 means, the one represented by the intercept in `summary(m_01)$coefficients`: `CLASS`: *closed* with `VOWELS`: *one*, which clearly is significantly different from 0. With your current knowledge, the 'best' way to proceed for you might be to generate `m_01` 9 more times, each time with `CLASS` and `VOWELS` re-leveled in such ways that the intercept comes to represent all 10 possible combinations -- clearly not a desirable option, which is why we will discuss a much better approach another time.### Scen. 1c: means for cat vs. each otherScenario 1c means to ask whether (some? all?) differences between the 10 means are significant. For that, we can use `emmeans::emmeans` and `pairs`. 
For a 2-way interaction like this, there are three options -- two of them (options a and b) are usually/often pretty good/useful and justifiable, the last one (option c) often not so much.Option a here is to make comparisons of the predicted means for the levels of `CLASS` within the levels of `VOWELS`:```{r}pairs(emmeans( # compute pairwise comparisons of meansobject=m_01, # for the model m_01specs=~ CLASS | VOWELS), # of CLASS within each level of VOWELS# you could also write this like this:# specs="CLASS", # of CLASS# by="VOWELS"), # within each level of VOWELSadjust="none") # do not adjust for multiple comparisons```You can see that these results nicely map onto our data frame `nd`; here are the first two:```{r}round(nd$fit[1] - nd$fit[3], 4)round(nd$fit[1] - nd$fit[5], 4)```But note that we should probably adjust for multiple comparisons because the high number of post-hoc comparisons we're making increases the probability of false positives. The [Holm method](https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method) is a useful and not too conservative way to do that:```{r}pairs(emmeans( # compute pairwise comparisons of meansobject=m_01, # for the model m_01specs=~ CLASS | VOWELS), # of CLASS within each level of VOWELSadjust="holm") # adjust for multiple comparisons w/ Holm's method```For the most part, I will continue to *not* adjust for multiple tests, but that's also only for didactic reasons, namely so you can make comparisons to the models' summaries -- in real applications, adjusting for multiple tests is often useful if not required.Option b here is the reverse perspective from the last one: to make comparisons of the predicted means for the levels of `VOWELS` within the levels of `CLASS`:```{r}pairs(emmeans( # compute pairwise comparisons of meansobject=m_01, # for the model m_01specs=~ VOWELS | CLASS), # of VOWELS within each level of CLASS# you could also write this like this:# specs="VOWELS", # of VOWELS# by="CLASS"), # within each level of CLASSadjust="none") # do not adjust for multiple comparisons```Again, this maps perfectly onto what we'd see in `nd`; here are the first two:```{r}round(nd$fit[1] - nd$fit[11], 4)round(nd$fit[3] - nd$fit[13], 4)```Option c, the not recommended one, is to make all possible comparisons across all combinations of levels -- fishing for results by considering even comparisons no one would ever postulate a priori; for that extreme fishing, adjusting for multiple comparisons is pretty much obligatory.```{r}pairs(emmeans( # compute pairwise comparisons of meansobject=m_01, # for the model m_01specs=~ VOWELS:CLASS), # of VOWELS:CLASSadjust="holm") # adjust for multiple comparisons w/ Holm's method```Now, what if you had only been interested in a subset of the comparisons that `pairs(emmeans(...))` makes? If you were interested in only, say, 5 comparisons, you don't want `pairs` to adjust for 10. In such cases, you can extract the 5 non-adjusted *p*-values you're interested in and correct them yourself only for the 5 tests you're making. 
Now, what if you had only been interested in a subset of the comparisons that `pairs(emmeans(...))` makes? If you were interested in only, say, 5 comparisons, you don't want `pairs` to adjust for 10. In such cases, you can extract the 5 non-adjusted *p*-values you're interested in and correct them yourself only for the 5 tests you're making. Imagine you were only interested in the comparisons 11 to 15 shown in the following output:

```{r}
(qwe <- pairs(emmeans(        # compute pairwise comparisons of means
   object=m_01,               # for the model m_01
   specs=~ CLASS | VOWELS),   # of CLASS within each level of VOWELS
   adjust="none"))            # do not adjust for multiple comparisons
```

Then this is how you would proceed, because the function `p.adjust` determines how many tests to adjust for from the length of its `p` argument, here 5:

```{r}
p.adjust(p=data.frame(qwe)[11:15,7], method="holm")
```

### Scen. 2a: means for num

To deal with the 3 scenarios 2a, 2b, and 2c, we now look at a new model `m_02`, which is slightly simpler (we leave out `VOWELS`) and fit slightly differently; importantly, note that `SURPRISAL` is now square-root transformed, but not centered:

```{r}
d <- read.delim("_input/wdurations.csv", stringsAsFactors=TRUE)
d$DURATION_l <- log2(d$DURATION)   # we log DURATION again
d$SURPRISAL_s <- sqrt(d$SURPRISAL) # SURPRISAL is now not centered
(m_02 <- lm(data=d, formula=DURATION_l ~ 1 + SURPRISAL_s*CLASS)) %>% {
   list("Summary"=summary(.),
        "Drop 1"=drop1(., test="F") %>% as.data.frame) }
```

Scenario 2a means to ask the following question: what are the 5 predicted means of `DURATION_l` for each of the levels of `CLASS` when `SURPRISAL_s` is controlled for in some way? The summary output shows only 1 of the 5, the one represented by the intercept in `summary(m_02)$coefficients`, i.e. when `CLASS` is *closed* and `SURPRISAL_s` is 0. If we want the means for all 5 levels of `CLASS`, for instance when `SURPRISAL_s` is 0, with the present model we can compute them like this:

```{r}
coef(m_02)[1] + c(0, coef(m_02)[3:6])
```

But the `nd` approach is sometimes more intuitive. Here are the predicted means of `DURATION_l` for the levels of `CLASS` when the predictor `SURPRISAL_s` is 0 (the intercept), 1 (for the slope difference), and its mean (`r mean(d$SURPRISAL_s)`):

```{r}
nd <- expand.grid(
   SURPRISAL_s=c(0, 1, mean(d$SURPRISAL_s)), # main values of SURPRISAL_s
   CLASS=levels(d$CLASS)) %>%                # all levels of CLASS, then
   "["(,2:1)                                 # reorder columns
nd <- cbind(nd, predict(
   m_02, newdata=nd, interval="confidence"))
nd[nd$SURPRISAL_s==0,] # only part of nd, when SURP_s is 0
```

We can also get those from `effect` like this, but ...

```{r}
(cl_d <- data.frame(
   cl <- effect("CLASS", m_02)))
```

... you can see we're getting a warning because `CLASS` is in an interaction with `SURPRISAL_s`, so we can't just ignore it. Well, we can, kinda, but only because `effect` is smart enough to give us the predictions for the 5 levels of `CLASS` while `SURPRISAL_s` is held constant at its mean (which now, given that we are using the non-centered `d$SURPRISAL_s`, is not 0 but `r mean(d$SURPRISAL_s)`). But the better way to do this would be like this:

```{r}
sucl_d <- data.frame(
   sucl <- effect("SURPRISAL_s:CLASS", m_02,
      xlevels=list(SURPRISAL_s=c(0:1, mean(d$SURPRISAL_s))))) %>%
   "["(,c(2,1,3,5,6))
# alternative:
# sucl_d <- data.frame(
#    sucl <- predictorEffect(
#       "SURPRISAL_s", m_02,
#       focal.levels=c(0:1, mean(d$SURPRISAL_s)))) %>% "["(,c(2,1,3,5,6))
sucl_d[sucl_d$SURPRISAL_s==0,] # only the part when SURP_s is 0
```
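As a quick sanity check of what `effect` just did -- this is just a sketch of the arithmetic -- the prediction for the reference level *closed* at the mean of `SURPRISAL_s` is simply the intercept plus the `SURPRISAL_s` slope times that mean (all other coefficients are multiplied by 0 for *closed*):

```{r}
# manual check: the prediction for CLASS: closed at the mean of SURPRISAL_s
coef(m_02)[1] + coef(m_02)[2]*mean(d$SURPRISAL_s)
```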
### Scen. 2b: means for num vs. 0

Scenario 2b means to ask whether each of the 5 means computed above is significantly different from 0 (and what the exact *p*-value is). So far, this cannot be answered fully from the summary output of the model. We can see from `nd` that none of the predicted means' confidence intervals overlaps with 0, but we don't have exact *p*-values. Right now, we know the answer to this question only for 1 of the 5 means, the one represented by the intercept in `summary(m_02)$coefficients`, i.e. when `CLASS` is *closed* and `SURPRISAL_s` is 0, which clearly is significantly different from 0. With your current knowledge, the 'best' way to proceed for you would be to generate `m_02` 4 more times, each time with `CLASS` re-leveled in such a way that the intercept comes to represent all 5 possible levels -- clearly not a desirable option, which is why we will discuss a much better approach another time.

### Scen. 2c: means for num vs. each other

Scenario 2c means to ask whether (some? all?) differences between the 5 means are significant. For that, we can use `emmeans` again:

```{r}
(qwe <- pairs(emmeans(           # compute pairwise comparisons of means
   object=m_02,                  # for the model m_02
   specs=~ CLASS | SURPRISAL_s), # of CLASS when SURPRISAL_s is its mean
   adjust="none"))               # do not adjust for multiple comparisons
```

Note, these results do not map onto the results from the summary table or the values from `nd`! From the summary table, we would expect the difference between `CLASS` *closed* and `CLASS` *open* to be this:

```{r}
summary(m_02)$coefficients[4,"Estimate"]
```

Same from `nd`:

```{r}
nd$fit[7] - nd$fit[1]
```

But that's not what `pairs(emmeans(...))` says:

```{r}
data.frame(qwe)[2,-2]
```

That is because the summary of the model and the part of `nd` we looked at above return predictions when `SURPRISAL_s` is 0, whereas `pairs(emmeans(...))` returns predictions when `SURPRISAL_s` is its mean, namely `r mean(d$SURPRISAL_s)`. Thus, these results *would* be identical if `SURPRISAL` had been centered, but are not this time. Without centering, things fit only if we look at the 'right' part of `nd`, and that is of course why `nd` was created the way it was above:

```{r}
nd$fit[3] - nd$fit[9]
data.frame(qwe)[2,]
```

But can we get predictions/comparisons that match the summary output, i.e. for when `SURPRISAL_s` is 0, from `emmeans` as well? Yes, because `emmeans` can take an `at` argument:

```{r}
pairs(emmeans(                   # compute pairwise comparisons of means
   object=m_02,                  # for the model m_02
   specs=~ CLASS | SURPRISAL_s,  # of CLASS when SURPRISAL_s is ...
   at=list(SURPRISAL_s=0)),      # set to 0, not its mean!
   adjust="none")                # do not adjust for multiple comparisons
```
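Alternatively, just to confirm the point about centering with a sketch (the variable name `SURPRISAL_sc2` is made up here so as not to clash with anything above): if we re-fit `m_02` with a centered version of `SURPRISAL_s`, the `CLASSopen` coefficient of that re-fit represents the *open* vs. *closed* difference at the mean of the predictor and thus matches `pairs(emmeans(...))`:

```{r}
# a sketch: with a centered predictor, the summary table and
# pairs(emmeans(...)) at the (now 0) mean line up
d$SURPRISAL_sc2 <- d$SURPRISAL_s - mean(d$SURPRISAL_s) # hypothetical name
m_02_cent <- lm(DURATION_l ~ 1 + SURPRISAL_sc2*CLASS, data=d)
summary(m_02_cent)$coefficients[4,"Estimate"] # open vs. closed at the mean,
   # i.e. the negative of the closed-open comparison from qwe above
```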
### Scen. 3a: slopes for num

Let's continue with `m_02`: it has the categorical predictor `CLASS`, whose predicted means we just looked at, but it also has a numeric predictor `SURPRISAL_s`, and since there is a significant interaction between those two predictors, our model involves 5 different slopes of `SURPRISAL_s`, one for each level of `CLASS`, with some significant differences between them.

```{r}
summary(m_02)$coefficients
```

Scenario 3a means to ask the following question: what are the 5 predicted slopes of `SURPRISAL_s` for each of the levels of `CLASS`? We can compute those slopes manually from the coefficients of `m_02`:

* the slope of `SURPRISAL_s` when `CLASS` is *closed* is listed in the summary table as the 2nd coefficient, the one called `SURPRISAL_s`, because that's the slope of the predictor when `CLASS` is its first level: `coef(m_02)[2]`, i.e. `r coef(m_02)[2]`;
* the slope of `SURPRISAL_s` when `CLASS` is *number* is that 2nd coefficient plus the 7th coefficient, the one called `SURPRISAL_s:CLASSnumber`, i.e. the adjustment to the slope of `SURPRISAL_s` when `CLASS` is not *closed* but *number*: `coef(m_02)[2] + coef(m_02)[7]`, i.e. `r coef(m_02)[2] + coef(m_02)[7]`;
* the slope of `SURPRISAL_s` when `CLASS` is *open* is that 2nd coefficient plus the 8th coefficient, the one called `SURPRISAL_s:CLASSopen`, i.e. the adjustment to the slope of `SURPRISAL_s` when `CLASS` is not *closed* but *open*: `coef(m_02)[2] + coef(m_02)[8]`, i.e. `r coef(m_02)[2] + coef(m_02)[8]`;
* the slope of `SURPRISAL_s` when `CLASS` is *propname* is that 2nd coefficient plus the 9th coefficient, the one called `SURPRISAL_s:CLASSpropname`, i.e. the adjustment to the slope of `SURPRISAL_s` when `CLASS` is not *closed* but *propname*: `coef(m_02)[2] + coef(m_02)[9]`, i.e. `r coef(m_02)[2] + coef(m_02)[9]`;
* the slope of `SURPRISAL_s` when `CLASS` is *unclear* is that 2nd coefficient plus the 10th coefficient, the one called `SURPRISAL_s:CLASSunclear`, i.e. the adjustment to the slope of `SURPRISAL_s` when `CLASS` is not *closed* but *unclear*: `coef(m_02)[2] + coef(m_02)[10]`, i.e. `r coef(m_02)[2] + coef(m_02)[10]`.

Or, more concisely:

```{r}
coef(m_02)[2] + c(0, coef(m_02)[7:10])
```

But why not use `nd` instead? This is what it looked like:

```{r}
nd
```

Thus, we could immediately get the slopes like this:

```{r}
tapply(nd$fit,            # apply a grouping of the predictions
   list(nd$CLASS,         # by CLASS
        nd$SURPRISAL_s),  # and SURPRISAL_s
   c) %>%                 # just list the values,
   "["(,-3) %>%           # suppress the 3rd column
   apply(1, diff)         # compute the differences
```

Or, you could do the same like this:

```{r}
nd.split <- split(nd, nd$SURPRISAL_s)
nd.split$`1`$fit - nd.split$`0`$fit
```

But the most practical solution might be the function `emmeans::emtrends`:

```{r}
emtrends(                 # compute the trends/slopes
   object=m_02,           # for the model m_02
   specs="CLASS",         # one slope for each level of this predictor
   var="SURPRISAL_s")     # namely all slopes of this predictor
```

### Scen. 3b: slopes for num vs. 0

Scenario 3b means to ask whether each of the 5 slopes just computed is significantly different from 0 (and what the exact *p*-value is). This cannot so far be answered from the summary output of the model. Right now, we know the answer to this question only for 1 of the 5 slopes, the one represented by the 2nd coefficient (i.e., when `CLASS` is *closed*). With your current knowledge, the 'best' way to proceed for you would be to generate `m_02` 4 more times, each time with `CLASS` re-leveled in such a way that the 2nd coefficient comes to represent all 5 possible slopes -- clearly not a desirable option, which is why we will discuss a much better approach another time.

### Scen. 3c: slopes for num vs. each other

Scenario 3c means to ask whether (some? all?) differences between the 5 slopes are significant. Since we're looking to compare slopes (a.k.a. trendlines), we use the function `emmeans::emtrends`, but in conjunction with `pairs`, which will make comparisons of the predicted slopes of `SURPRISAL_s` between all combinations of the levels of `CLASS`:

```{r}
pairs(emtrends(           # compute pairwise comparisons of trends/slopes
   object=m_02,           # for the model m_02
   specs="CLASS",         # compute comparisons over this predictor
   var="SURPRISAL_s"),    # namely all slopes of this predictor
   adjust="none")         # do not adjust for multiple comparisons
```
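As a quick cross-check of these comparisons -- again just a sketch of the arithmetic -- note that, with treatment contrasts, every comparison involving *closed* is simply the sign-flipped interaction coefficient from the summary table:

```{r}
# sanity check: slope(closed)-slope(number) and slope(closed)-slope(open)
# are the negatives of the corresponding interaction coefficients
-coef(m_02)[7] # closed - number
-coef(m_02)[8] # closed - open
```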
## A priori (orthogonal) contrasts {#sec-s02_apriori}

### Theoretical introduction

Let us turn to something really interesting now. All the above comparisons were concerned with fitting a model and then, once we had it, figuring out what is significant and what is not -- both by understanding the regular `summary` output and its coefficients and by making comparisons between predictions that the `summary` table does not readily show us (with `emmeans`/`emtrends`). But as I already mentioned in @sec-s02_incomplete above, there are actually even more questions one might ask, and everything we have done so far was done post-hoc, which is why we discussed correcting for multiple post-hoc tests. In some situations, however, one can set up the predictors in such a way that the `summary` table already answers multiple a priori hypotheses, and especially a priori hypotheses that differ from the standard ones checked so far -- that's what we're discussing now.

We're reloading the duration data:

```{r}
rm(list=ls(all.names=TRUE)); pkgs2load <- c("car", "effects", "emmeans", "multcomp")
suppressMessages(sapply(pkgs2load, library,
   character.only=TRUE, logical.return=TRUE, quietly=TRUE)); rm(pkgs2load)
d <- read.delim("_input/wdurations.csv", stringsAsFactors=TRUE)
d$DURATION_l <- log2(d$DURATION); d$SURPRISAL_s <- sqrt(d$SURPRISAL)
summary(d[,c(12,10)])
```

Imagine for a moment we only want to fit a monofactorial model -- just for simplicity's sake, as we're broaching this new topic -- and you have the following hypotheses about the variable `CLASS`:

1. the mean of `DURATION_l` for *closed* and *open* combined will be different from the corresponding mean for all other levels combined (because, say, they are 'normal words' and the other levels are 'more weird');
2. the mean of `DURATION_l` for *closed* will be different from the corresponding mean for *open* (because, say, these are still functionally, frequency-wise, and distributionally very different classes);
3. the mean of `DURATION_l` for *unclear* will not be different from the corresponding mean for *number* and *propname* combined (because, say, unclear words could be anything);
4. the mean of `DURATION_l` for *number* will not be different from the corresponding mean for *propname* (even though these are obviously quite different classes).

It is possible to define a set of contrasts for the predictor `CLASS` that 'embodies' these 4 hypotheses so that the summary output will not show you what it usually does, but exactly the answers to these hypotheses. For that, we first create a copy of `CLASS`, which we call `CLASS_orth` (for *orthogonal*, because these four hypotheses can be tested with 4 orthogonal contrasts):

```{r}
levels(d$CLASS_orth <- d$CLASS)
```

For a predictor with *l* levels (here, *l*=5), one can formulate *l*-1 orthogonal contrasts or contrast vectors -- how lucky we have exactly 4 ... -- and the formulation of each orthogonal contrast (vector) is governed by a set of rules we need to discuss:

1. levels that are ignored in a comparison are set to 0;
2. each level can participate in only one comparison in isolation;
3. the sum of all value tokens of a contrast vector is 0;
4. (groups of) levels that are contrasted with each other differ in their sign;
5. the sum of the absolute values of the value types of a contrast vector is 1 (this one is really important; many other sources do not include it, which makes their regression summary tables harder to interpret than necessary, as I will demonstrate in a small excursus below);
6. for each pair of contrast vectors, the products of the corresponding weights sum to 0 (this is what makes the contrasts *orthogonal*).
Without an example, this is probably close to incomprehensible, so let's look at what this means for our four hypotheses/contrasts for `CLASS_orth`; since we will have to define this within R as a matrix, we will do so here as well. Check out @tbl-orthcontr (which uses abbreviated level names for space reasons):

| CONTRAST / `CLASS`                   | *closed* | *number* | *open* | *propname* | *unclear* |
|:-------------------------------------|:---------|:---------|:-------|:-----------|:----------|
| 1: *closed*/*open* vs. other 3       | +0.6     | -0.4     | +0.6   | -0.4       | -0.4      |
| 2: *closed* vs. *open*               | +0.5     | 0        | -0.5   | 0          | 0         |
| 3: *number*/*propname* vs. *unclear* | 0        | -0.3333  | 0      | -0.3333    | +0.6667   |
| 4: *number* vs. *propname*           | 0        | 0.5      | 0      | -0.5       | 0         |

: Orthogonal contrast settings for our hypotheses {#tbl-orthcontr}

So, let's walk through those rules separately for each of the hypotheses/contrasts:

1. with regard to rule 1,
   a. contrast 1: no level is ignored in this contrast: all five levels of `CLASS` are involved, so none of them gets a 0, ✓;
   b. contrast 2: this is only concerned with *closed* and *open*, which means all other levels get a 0, ✓;
   c. contrast 3: this is only concerned with *number*, *propname*, and *unclear*, which means all other levels get a 0, ✓;
   d. contrast 4: this is only concerned with *number* and *propname*, which means all other levels get a 0, ✓;
2. with regard to rule 2, each level participates in only one comparison in isolation:
   a. *closed* is used in isolation only once, in contrast 2, ✓;
   b. *number* is used in isolation only once, in contrast 4, ✓;
   c. *open* is used in isolation only once, in contrast 2, ✓;
   d. *propname* is used in isolation only once, in contrast 4, ✓;
   e. *unclear* is used in isolation only once, in contrast 3, ✓;
3. with regard to rule 3, if we add up the 5 values in each row of the contrast table, we do get 0 for each row, ✓;
4. with regard to rule 4,
   a. contrast 1: the two groups of values that are contrasted differ in their sign: *closed*/*open* have a positive contrast weight whereas the other three have a negative one, ✓;
   b. contrast 2: the two values that are contrasted differ in their sign: *closed* has a positive contrast weight whereas *open* has a negative one, ✓;
   c. contrast 3: the two groups of values that are contrasted differ in their sign: *number*/*propname* have negative contrast weights whereas *unclear* has a positive one, ✓;
   d. contrast 4: the two values that are contrasted differ in their sign: *number* has a positive contrast weight whereas *propname* has a negative one, ✓;
5. with regard to rule 5,
   a. contrast 1: if we take the two value types (+0.6 and -0.4), make them both positive, and add them up, we do get 1, ✓;
   b. contrast 2: if we take the two value types (+0.5 and -0.5), make them both positive, and add them up, we do get 1, ✓;
   c. contrast 3: if we take the two value types (+^2^/~3~ and -^1^/~3~), make them both positive, and add them up, we do get 1, ✓;
   d. contrast 4: if we take the two value types (+0.5 and -0.5), make them both positive, and add them up, we do get 1, ✓;
6. with regard to rule 6, the products of corresponding weights sum to 0 for every pair of contrast vectors; e.g., for contrasts 1 and 2: 0.6×0.5 + (-0.4)×0 + 0.6×(-0.5) + (-0.4)×0 + (-0.4)×0 = 0, ✓, and the same holds for the five other pairs (see the check below).

Now we tell R this by turning each row of @tbl-orthcontr (i.e., each contrast) into a vector, which we combine column-wise into a matrix and assign as contrasts to `CLASS_orth`:

```{r}
#                            clos  numb  open  prop  uncl
clo.op_vs_num.prop.unc <- c(+0.6, -0.4, +0.6, -0.4, -0.4)
clo_vs_op              <- c(+0.5,  0  , -0.5,  0  ,  0  )
num.prop_vs_uncl       <- c( 0  , -1/3,  0  , -1/3, +2/3)
num_vs_prop            <- c( 0  , +0.5,  0  , -0.5,  0  )
# & then we put those into a matrix like the above table & assign them as contrasts:
(contrasts(d$CLASS_orth) <-   # make the following the contrasts of CLASS_orth
   cbind(                     # the matrix resulting from binding together as columns
      clo.op_vs_num.prop.unc, # all
      clo_vs_op,              # four
      num.prop_vs_uncl,       # contrast
      num_vs_prop))           # vectors
```
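A quick way to verify rule 6 programmatically -- just a sketch using base R's `crossprod` -- is to compute all pairwise products of the contrast columns at once; every off-diagonal cell must be 0:

```{r}
# pairwise dot products of the contrast columns; all off-diagonal
# cells must be 0 for the contrasts to be orthogonal
round(crossprod(contrasts(d$CLASS_orth)), 10)
```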
### Implications for model coefficients

Let's now see what that does. We fit a first model with the default treatment contrasts of `CLASS` (`m_03_treatm`) and then a second one with our new orthogonal-contrasts predictor `CLASS_orth` (`m_03_orthog`):

```{r}
summary(m_03_treatm <- lm(DURATION_l ~ 1 + CLASS     , data=d))$coefficients %>% round(4)
summary(m_03_orthog <- lm(DURATION_l ~ 1 + CLASS_orth, data=d))$coefficients %>% round(4)
```

To save space, I am not showing the overall model statistics (the *R*^2^s, the *F*-test, etc.) but rest assured, they are all the same across the two models -- all the new set of contrasts changes is the coefficients part. To recap, in `m_03_treatm`,

* coefficient 1 (the intercept) is the predicted response when `CLASS` is *closed*;
* coefficient 2 is what you add to the intercept to get the predicted response when `CLASS` changes from *closed* to *number*;
* coefficient 3 is what you add to the intercept to get the predicted response when `CLASS` changes from *closed* to *open*;
* coefficient 4 is what you add to the intercept to get the predicted response when `CLASS` changes from *closed* to *propname*;
* coefficient 5 is what you add to the intercept to get the predicted response when `CLASS` changes from *closed* to *unclear*.
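To make that recap concrete, here is a one-line sanity check (a sketch): adding, say, the `CLASSopen` coefficient to the intercept should return the predicted mean for *open*:

```{r}
# quick check: intercept + CLASSopen coefficient = predicted mean for open
coef(m_03_treatm)[1] + coef(m_03_treatm)["CLASSopen"]
```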
But in `m_03_orthog`, things are different. To understand how, let us first compute the predictions of both models, which are of course identical (because, again, the only thing the contrasts affect, so to speak, is the coefficients part of the model summary).

```{r}
nd_treatm <- data.frame(CLASS=levels(d$CLASS))
(nd_treatm <- cbind(nd_treatm, PREDS=predict(m_03_treatm, newdata=nd_treatm)))
nd_orthog <- data.frame(CLASS_orth=levels(d$CLASS_orth))
(nd_orthog <- cbind(nd_orthog, PREDS=predict(m_03_orthog, newdata=nd_orthog)))
```

Now, what do the coefficients of `m_03_orthog` mean?

```{r}
coef(m_03_orthog)
```

* coefficient 1 (called the intercept) is the unweighted mean of the 5 predictions for all levels of `CLASS`:

```{r}
coef(m_03_orthog)[1]
mean(nd_treatm$PREDS) # mean(nd_orthog$PREDS)
```

* coefficient 2 is the difference between the unweighted mean of the 2 predictions for *closed* and *open* on the one hand and the unweighted mean of the 3 predictions for *number*, *propname*, and *unclear* on the other, and the fact that it is significant says our first hypothesis is confirmed:

```{r}
summary(m_03_orthog)$coefficients[2,]
mean(nd_orthog$PREDS[c(1,3)]) - mean(nd_orthog$PREDS[c(2,4,5)])
```

* coefficient 3 is the difference between the prediction for *closed* and the prediction for *open*, and the fact that it is significant says our second hypothesis is confirmed:

```{r}
summary(m_03_orthog)$coefficients[3,]
mean(nd_orthog$PREDS[1]) - mean(nd_orthog$PREDS[3])
```

* coefficient 4 is the difference between the prediction for *unclear* on the one hand and the unweighted mean of the 2 predictions for *number* and *propname* on the other, and the fact that it is significant says our third hypothesis is not confirmed:

```{r}
summary(m_03_orthog)$coefficients[4,]
mean(nd_orthog$PREDS[5]) - mean(nd_orthog$PREDS[c(2,4)])
```

* coefficient 5 is the difference between the prediction for *number* and the prediction for *propname*, and the fact that it is significant says our fourth hypothesis is not confirmed:

```{r}
summary(m_03_orthog)$coefficients[5,]
mean(nd_orthog$PREDS[2]) - mean(nd_orthog$PREDS[4])
```

So yes, this comes with some work (of setting up the contrast matrix), but then it's incredibly elegant: the `summary` output answers all your questions, or addresses all your hypotheses, even though the hypotheses involve multiple conflations etc. that would otherwise be difficult to test. This way of defining orthogonal contrasts can streamline analyses a lot.
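Incidentally, if you want the same four comparisons without re-coding the predictor, `emmeans::contrast` also accepts custom contrast weights as a named list -- here is a sketch that applies our four contrasts to the predicted means of the treatment-coded model (the labels are mine):

```{r}
# the same four a priori comparisons via emmeans' custom contrasts;
# the weights encode differences between (means of) group means
contrast(emmeans(m_03_treatm, specs=~ CLASS), method=list(
   "clo/op vs num/prop/unc"=c(1/2, -1/3, 1/2, -1/3, -1/3),
   "clo vs op"             =c(1  ,  0  , -1 ,  0  ,  0  ),
   "unc vs num/prop"       =c(0  , -1/2,  0 , -1/2,  1  ),
   "num vs prop"           =c(0  ,  1  ,  0 , -1  ,  0  )))
```

The estimates (and their tests) should match coefficients 2 to 5 of `m_03_orthog`.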
### Excursus: what if you don't follow rule 5?

Let's quickly see why I personally find rule 5 so important, even if most other references I know don't use it. Those other references, I think, try to make developing the contrast matrices simpler by using integers for the contrast vectors rather than the decimals/fractions that my rule 5 here implies. Therefore, they would define the contrast vectors not like I did above (repeated here for convenience) ...

```{r}
#| eval: false
#                            clos  numb  open  prop  uncl
clo.op_vs_num.prop.unc <- c(+0.6, -0.4, +0.6, -0.4, -0.4)
clo_vs_op              <- c(+0.5,  0  , -0.5,  0  ,  0  )
num.prop_vs_uncl       <- c( 0  , -1/3,  0  , -1/3, +2/3)
num_vs_prop            <- c( 0  , +0.5,  0  , -0.5,  0  )
```

but like this:

```{r}
d$CLASS_orth_notrule5 <- d$CLASS
#                            clos  numb  open  prop  uncl
clo.op_vs_num.prop.unc <- c( +3 ,  -2 ,  +3 ,  -2 ,  -2)
clo_vs_op              <- c( +1 ,   0 ,  -1 ,   0 ,   0)
num.prop_vs_uncl       <- c(  0 ,  -1 ,   0 ,  -1 ,  +2)
num_vs_prop            <- c(  0 ,  +1 ,   0 ,  -1 ,   0)
(contrasts(d$CLASS_orth_notrule5) <- # make the following the contrasts of CLASS_orth_notrule5
   cbind(                     # the matrix resulting from binding together as columns
      clo.op_vs_num.prop.unc, # all
      clo_vs_op,              # four
      num.prop_vs_uncl,       # contrast
      num_vs_prop))           # vectors
```

This sure *looks* easier (at first), but that 'backfires' -- let me show you how. Here are the summary results of the model defined with my contrasts, with rule 5 in place:

```{r}
summary(m_03_orthog)$coef %>% round(4)
```

And here are the summary results if you go with the 'easier' integer-based matrix:

```{r}
summary(m_03_orthog_notrule5 <- lm(DURATION_l ~ 1 + CLASS_orth_notrule5, data=d))$coef %>% round(4)
```

The reason for my preference becomes clear when you realize how all coefficients (except the intercept, which is the mean of the predicted means) differ between the two models. Here are the predictions of both models again:

```{r}
nd_orthog
```

We have seen above that, when you use my contrasts with rule 5 in place, the coefficients in the summary table correspond to exactly the differences between the (means of grouped) means of contrasted levels. But that's not the case anymore if you use the integer-based coding. Now,

* the coefficient `CLASS_orth_notrule5clo.op_vs_num.prop.unc` is `r coef(m_03_orthog_notrule5)[2]`, which is exactly ^1^/~5~ of the real difference, which we know is `r coef(m_03_orthog)[2]`;
* the coefficient `CLASS_orth_notrule5clo_vs_op` is `r coef(m_03_orthog_notrule5)[3]`, which is exactly ^1^/~2~ of the real difference, which we know is `r coef(m_03_orthog)[3]`;
* the coefficient `CLASS_orth_notrule5num.prop_vs_uncl` is `r coef(m_03_orthog_notrule5)[4]`, which is exactly ^1^/~3~ of the real difference, which we know is `r coef(m_03_orthog)[4]`;
* the coefficient `CLASS_orth_notrule5num_vs_prop` is `r coef(m_03_orthog_notrule5)[5]`, which is exactly ^1^/~2~ of the real difference, which we know is `r coef(m_03_orthog)[5]`.

In other words, using the integers as we did shrinks the coefficients by a factor of *f*, where *f* is the sum of the absolute values of the value types of a contrast vector.
Thus,

* if you define `clo.op_vs_num.prop.unc` as `c(3, -2, 3, -2, -2)`, then the sum of the absolute values of the types is 5, and we saw that the coefficient for that contrast in the model not following rule 5 is indeed reduced to just ^1^/~5~th of the real difference, but
* if you define `clo.op_vs_num.prop.unc` as `c(0.6, -0.4, 0.6, -0.4, -0.4)`, then, as per rule 5, the sum of the absolute values of the types is 1, so the coefficient for that contrast in the model following rule 5 is indeed ^1^/~1~th of, i.e. exactly, the real difference.

Thus, yes, of course you can save mental effort and use the integers, but you then need to invest more mental effort, I think, to reconstruct the real differences from the summary table -- better to use rule 5 right away.

## Exercises

### Exercise 1

A corpus-based study of four varieties of English -- a predictor called `VARIETY` with the 4 levels *NigE*, *BrE*, *IndE*, and *AmE* (in that order!) -- set out to test the hypotheses that

* NigE is significantly different from all others combined;
* IndE is significantly different from the two inner-circle varieties AmE and BrE combined;
* the two inner-circle varieties AmE and BrE are not significantly different.

Formulate the required contrast matrix for this predictor `VARIETY`.

```{r}
#| eval: false
#                NigE   BrE  IndE   AmE
nig_vs_rest <- c( 3/4, -1/4, -1/4, -1/4) # contrast 1
inn_vs_out  <- c( 0  ,  1/3, -2/3,  1/3) # contrast 2
bre_vs_ame  <- c( 0  ,  1/2,  0  , -1/2) # contrast 3
(contrasts(d$VARIETY) <- cbind(
   nig_vs_rest, inn_vs_out, bre_vs_ame))
```

### Exercise 2

A corpus-based study of 5 'varieties' of English -- a predictor called `L1` with the 5 levels *EN* (i.e. native speakers), *FR*, *GE*, *NO*, and *SPA* (in that order!) -- set out to test the hypotheses that

* the native speakers are significantly different from all others combined;
* the two Germanic L1s *NO* and *GE* combined are significantly different from the two Romance L1s *FR* and *SPA* combined;
* the two Germanic L1s *NO* and *GE* are not significantly different from each other;
* the two Romance L1s *FR* and *SPA* are not significantly different from each other.

Formulate the required contrast matrix for this predictor `L1`.

```{r}
#| eval: false
contrasts(d$L1) <- cbind(
   #                      EN    FR    GE    NO    SPA
   native_vs_nonnative =c( 0.8, -0.2, -0.2, -0.2, -0.2), # contrast 1
   romance_vs_germanic =c( 0  ,  0.5, -0.5, -0.5,  0.5), # contrast 2
   spanish_vs_french   =c( 0  , -0.5,  0  ,  0  ,  0.5), # contrast 3
   norwegian_vs_german =c( 0  ,  0  , -0.5,  0.5,  0  )) # contrast 4
```

# Session info

```{r}
sessionInfo()
```