Ling 104, session 09: practice (key)

Author
Affiliation

UC Santa Barbara & JLU Giessen

Published

05 Jan 2025 12-34-56

rm(list=ls(all=TRUE)); library(magrittr)

1 Exercise 1

Each member of a group of 10 subjects is asked to read two sentences, each containing a particular vowel segment, and the pitch level of the vowel is measured in each reading. The results (in arbitrary units) are as follows; you want to test whether the average pitch levels in sentence 1 and sentence 2 are significantly different.

1.1 Hypotheses

The

  • dependent/response variable is the pitch values;
  • independent/predictor variable is SENTENCE: 1 vs. 2.

What are the hypotheses?

  • text hypotheses:
    • H1: The average pitch frequencies in sentence 1 and sentence 2 differ;
    • H0: The average pitch frequencies in sentence 1 and sentence 2 don’t differ;
  • statistical hypotheses:
    • H1: mean_sentence1 - mean_sentence2 ≠ 0, i.e. t ≠ 0;
    • H0: mean_sentence1 - mean_sentence2 = 0, i.e. t = 0.

We create the data:

SENTENCE1 <- setNames(
   c(30,41,34,28,35,39,40,29,27,33), 1:10)
SENTENCE2 <- setNames(
   c(27,36,35,30,38,44,46,31,33,37), 1:10)

Theoretically, you could also make this a ‘proper’ case-by-variable data frame:

d <- data.frame(
       CASE=1:20,
    SUBJECT=rep(1:10, 2),
   SENTENCE=factor(rep(1:2, each=10)),
      PITCH=c(30,41,34,28,35,39,40,29,27,33,27,36,35,30,38,44,46,31,33,37))

1.2 Descriptive stats/visualization

We first describe the data. Since this is a dependent-samples scenario – “[e]ach member of a group of 10 subjects is asked to read two sentences” – we first compute the pairwise differences:

differences <- SENTENCE2-SENTENCE1
# or
# differences <- d$PITCH[d$SENTENCE==2] - d$PITCH[d$SENTENCE==1]

Then, we do a quick plot of the data:

par(mfrow=c(1, 2)) # define two plotting panels
plot(differences, type="h", ylim=c(-7, 7)) # plot the differences vertically
   abline(h=0, lty=2)                      # a dashed line at y=0
plot(1, 1, type="n", axes=FALSE,               # set up an empty plot without axes ...
     xlab="Sentence", xlim=c(1, 2),            # x-axis stuff
     ylab="Pitch measurement", ylim=c(20, 50)) # y-axis stuff
   axis(1, at=1:2); axis(2); grid() # add axes and a grid
   arrows(rep(1, 10), SENTENCE1,                    # draw arrows from x=1 and y=values for SENTENCE1
          rep(2, 10), SENTENCE2,                    # to x=2 and y-values for SENTENCE2
          col=ifelse(differences>0, "blue", "red")) # if the diff is positive, in blue, otherwise in red
   arrows(1, mean(SENTENCE1), # draw an arrow from x=1 and y= average of 1st vector
          2, mean(SENTENCE2), # to x=2 and y= average of 2nd vector
          lwd=3)              # with a bold line
par(mfrow=c(1, 1)) # define one plotting panel

Let’s compute some quick descriptive statistics as well:

mean(SENTENCE1); sd(SENTENCE1)
[1] 33.6
[1] 5.125102
mean(SENTENCE2); sd(SENTENCE2)
[1] 35.7
[1] 5.96378

From the data frame d, you could do this with a nice little anonymous function like this:

with(d,         # with variables from the data frame d
   tapply(      # apply to
      PITCH,    # PITCH
      SENTENCE, # a grouping by SENTENCE
      # and apply to the groups an anonymous ad-hoc function that
      \(af) c("mean"=mean(af), # computes the mean
              "sd"=sd(af))))   # & the standard deviation
$`1`
     mean        sd
33.600000  5.125102

$`2`
    mean       sd
35.70000  5.96378 

Are the differences normally distributed (for a t-test for dependent samples)?

nortest::lillie.test(differences)

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  differences
D = 0.18912, p-value = 0.3946

Good enough, so we can do our t-test for dependent samples.

1.3 Statistical testing

We can either use the vector-based function call for the ‘regular’ t-test for dependent samples, which is probably the best approach, or do a goodness-of-fit test for the mean of differences:

# a t-test for dependent samples comparing the vectors
t.test(SENTENCE1, SENTENCE2, paired=TRUE)

    Paired t-test

data:  SENTENCE1 and SENTENCE2
t = -1.8119, df = 9, p-value = 0.1034
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -4.7218912  0.5218912
sample estimates:
mean difference
           -2.1 
# a t-test for goodness of fit of whether the mean difference is 0
t.test(differences)

    One Sample t-test

data:  differences
t = 1.8119, df = 9, p-value = 0.1034
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.5218912  4.7218912
sample estimates:
mean of x
      2.1 
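
As a cross-check, the one-sample result above can be reproduced by hand from the pairwise differences: t is the mean difference divided by its standard error, with n-1 degrees of freedom. This is only a minimal sketch; the names n, se, t_manual, df_manual, and p_manual are ad-hoc choices that do not occur in the code above:

```r
# the data from above (without the names, which are not needed here)
SENTENCE1 <- c(30,41,34,28,35,39,40,29,27,33)
SENTENCE2 <- c(27,36,35,30,38,44,46,31,33,37)
differences <- SENTENCE2-SENTENCE1

n         <- length(differences)        # number of pairs
se        <- sd(differences) / sqrt(n)  # standard error of the mean difference
t_manual  <- mean(differences) / se     # t-value
df_manual <- n-1                        # degrees of freedom
p_manual  <- 2 * pt(abs(t_manual),      # two-tailed p-value
   df=df_manual, lower.tail=FALSE)
round(c(t=t_manual, df=df_manual, p=p_manual), 4)
```

The three values should agree with the t, df, and p-value that t.test(differences) reports.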

Note that, in recent versions of R, the formula method of t.test no longer accepts paired=TRUE; to run the t-test for dependent samples with a formula, one has to adopt the newer Pair notation (which then also obviates the need to say paired=TRUE):

t.test(Pair(SENTENCE1, SENTENCE2) ~ 1)

    Paired t-test

data:  Pair(SENTENCE1, SENTENCE2)
t = -1.8119, df = 9, p-value = 0.1034
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -4.7218912  0.5218912
sample estimates:
mean difference
           -2.1 
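
If the data only exist in the long case-by-variable format, one way to get to the Pair notation is to reshape them into a wide format first. The following is a sketch under a few assumptions: the data frame is rebuilt here without the CASE column (which would otherwise vary within subjects), the names d_wide and result are ad hoc, and Pair requires R 4.0.0 or later:

```r
# long format: one row per subject-sentence combination
d <- data.frame(
    SUBJECT=rep(1:10, 2),
   SENTENCE=factor(rep(1:2, each=10)),
      PITCH=c(30,41,34,28,35,39,40,29,27,33,27,36,35,30,38,44,46,31,33,37))

# wide format: one row per subject, one PITCH column per sentence
d_wide <- reshape(d, direction="wide",
   idvar="SUBJECT",    # what identifies a pair
   timevar="SENTENCE") # what distinguishes the two measurements
names(d_wide)          # SUBJECT, PITCH.1, PITCH.2

# the paired t-test with the formula method
result <- t.test(Pair(PITCH.1, PITCH.2) ~ 1, data=d_wide)
result
```

This should return the same paired t-test as the vector-based call above.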

1.4 Write-up

[Show descriptive statistics and plot.] To determine whether the average pitch levels in sentence 1 and sentence 2 are significantly different, a two-tailed t-test for dependent samples was done, which concluded that the average pitch levels of the sentences are not significantly different from each other (t=1.812, df=9, p=0.1034).

2 Exercise 2

An experiment is performed to test the effect of certain linguistic features on the politeness of two sentences in a particular social context. 15 informants are asked to rate the two sentences on a scale from 1 (very impolite) to 5 (very polite), with the following results. You want to test the hypothesis that sentence 2 is rated as more polite than sentence 1.

2.1 Hypotheses

The

  • dependent/response variable is the politeness rating;
  • independent/predictor variable is SENTENCE: 1 vs. 2.

What are the hypotheses?

  • text hypotheses:
    • H1: The average rating of sentence 2 is higher than that of sentence 1;
    • H0: The average rating of sentence 2 is not higher than that of sentence 1;
  • statistical hypotheses:
    • H1: mediansentence2 - mediansentence1 > 0;
    • H0: mediansentence2 - mediansentence1 ≤ 0.

We create the data:

SENTENCE1 <- setNames(
   c(1,2,1,2,3,2,1,2,3,1,2,1,2,2,1),
   1:15)
SENTENCE2 <- setNames(
   c(3,2,4,3,1,4,1,3,5,3,3,4,1,4,3),
   1:15)

Theoretically, you could also make this a ‘proper’ case-by-variable data frame:

d <- data.frame(
       CASE=1:30,
      RATER=rep(1:15, 2),
   SENTENCE=factor(rep(1:2, each=15)),
     POLITE=c(1,2,1,2,3,2,1,2,3,1,2,1,2,2,1,3,2,4,3,1,4,1,3,5,3,3,4,1,4,3))

2.2 Descriptive stats/visualization

Since this is a dependent-samples scenario – “15 informants are asked to rate the two sentences” – we first create a quick arrows plot of the data:

differences <- SENTENCE2-SENTENCE1
par(mfrow=c(1, 2)) # define two plotting panels
plot(differences, type="h", ylim=c(-4, 4)) # plot the differences vertically
   abline(h=0, lty=2)                      # a dashed line at y=0
plot(1, 1, type="n", axes=FALSE,             # set up an empty plot without axes ...
     xlab="Sentence"         , xlim=c(1, 2), # x-axis stuff
     ylab="Politeness rating", ylim=c(1, 5)) # y-axis stuff
   axis(1, at=1:2); axis(2); grid() # add axes and a grid
   arrows(rep(1, 15), jitter(SENTENCE1),            # draw arrows from x=1 and y=values for SENTENCE1
          rep(2, 15), jitter(SENTENCE2),            # to x=2 and y-values for SENTENCE2
          col=ifelse(differences>0, "blue", "red")) # if the diff is positive, in blue, otherwise in red
   arrows(1, mean(SENTENCE1), # draw an arrow from x=1 and y= average of 1st vector
          2, mean(SENTENCE2), # to x=2 and y= average of 2nd vector
          lwd=3)              # with a bold line
par(mfrow=c(1, 1)) # define one plotting panel

Let’s compute some quick descriptive statistics as well, where we treat POLITE as an ordinal variable (the safer assumption):

median(SENTENCE1); IQR(SENTENCE1)
[1] 2
[1] 1
median(SENTENCE2); IQR(SENTENCE2)
[1] 3
[1] 1.5

From the data frame d, you could do this with a nice little anonymous function like this:

with(d,         # with variables from the data frame d
   tapply(      # apply to
      POLITE,   # POLITE
      SENTENCE, # a grouping by SENTENCE
      # and apply to the groups an anonymous ad-hoc function that
      \(af) c("median"=median(af), # computes the median
              "IQR"=IQR(af))))     # & the interquartile range
$`1`
median    IQR
     2      1

$`2`
median    IQR
   3.0    1.5 

We do a one-tailed Wilcoxon test.

2.3 Statistical testing

wilcox.test(
   SENTENCE1, SENTENCE2, paired=TRUE, # compute a Wilcoxon test of both vectors
   correct=FALSE,                     # no continuity correction
   alternative="less")                # expecting that the median of 1 is < that of 2

    Wilcoxon signed rank test

data:  SENTENCE1 and SENTENCE2
V = 10.5, p-value = 0.006252
alternative hypothesis: true location shift is less than 0
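
To see where V comes from, it can be computed by hand: it is the sum of the ranks of the absolute non-zero differences (first vector minus second vector) that belong to positive differences. A minimal sketch, with the ad-hoc names diffs, ranks, and V_manual:

```r
# the data from above (without the names, which are not needed here)
SENTENCE1 <- c(1,2,1,2,3,2,1,2,3,1,2,1,2,2,1)
SENTENCE2 <- c(3,2,4,3,1,4,1,3,5,3,3,4,1,4,3)

diffs    <- SENTENCE1-SENTENCE2   # same order as in the wilcox.test call
diffs    <- diffs[diffs!=0]       # differences of 0 are discarded
ranks    <- rank(abs(diffs))      # rank the absolute differences (ties get mean ranks)
V_manual <- sum(ranks[diffs>0])   # sum the ranks of the positive differences
V_manual
```

Only two raters rated sentence 1 as more polite than sentence 2, which is why V is so small here.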

Note that, like the t-test for dependent samples, the Wilcoxon test cannot be run with the formula method/argument anymore either unless one adopts the new Pair approach:

wilcox.test(Pair(SENTENCE1, SENTENCE2) ~ 1,
   correct=FALSE,
   alternative="less")

    Wilcoxon signed rank test

data:  Pair(SENTENCE1, SENTENCE2)
V = 10.5, p-value = 0.006252
alternative hypothesis: true location shift is less than 0

2.4 Write-up

[Show descriptive statistics and plot.] To test the hypothesis that sentence 2 is rated as more polite than sentence 1, a one-tailed Wilcoxon test was done, which concluded that sentence 2 is rated as significantly more polite than sentence 1 (V=10.5, p1-tailed=0.0063).

3 Exercise 3

You want to test whether the number of IUs transcribers identify in a recording changes depending on whether the transcribers work on the recording once or twice. Ten transcribers annotated a recording for IUs on Monday and then again on Wednesday. Do the numbers of IUs differ on average?

3.1 Hypotheses

The

  • dependent/response variable is the number of IUs;
  • independent/predictor variable is TESTDAY: mon vs. wed.

What are the hypotheses?

  • text hypotheses:
    • H1: The average numbers of IUs from Monday and Wednesday differ;
    • H0: The average numbers of IUs from Monday and Wednesday don’t differ;
  • statistical hypotheses:
    • H1: mean_Monday - mean_Wednesday ≠ 0, i.e. t ≠ 0;
    • H0: mean_Monday - mean_Wednesday = 0, i.e. t = 0.

We create the data:

MONDAY <- setNames(
   c(16,18,15,18,10,12,16,14,16,11), 1:10)
WEDDAY <- setNames(
   c(15,16,13,15,11,11,14,11,15,15), 1:10)

3.2 Descriptive stats/visualization

Since this is a dependent-samples scenario – “[t]en transcribers annotated a recording for IUs on Monday and then again on Wednesday” – we first compute the pairwise differences:

differences <- WEDDAY-MONDAY

Theoretically, you could also make this a ‘proper’ case-by-variable data frame:

d <- data.frame(
        CASE=1:20,
   ANNOTATOR=rep(1:10, 2),
     TESTDAY=factor(rep(c("mon", "wed"), each=10)),
         IUS=c(16,18,15,18,10,12,16,14,16,11,15,16,13,15,11,11,14,11,15,15))

Then, we do a quick plot of the data:

par(mfrow=c(1, 2)) # define two plotting panels
plot(differences, type="h", ylim=c(-4, 4)) # plot the differences vertically
   abline(h=0, lty=2)                      # a dashed line at y=0
plot(1, 1, type="n", axes=FALSE,           # set up an empty plot without axes ...
     xlab="Transcription", xlim=c( 1,  2), # x-axis stuff
     ylab="IU numbers"   , ylim=c(10, 20)) # y-axis stuff
   axis(1, at=1:2); axis(2); grid() # add axes and a grid
   arrows(rep(1, 10), MONDAY,                       # draw arrows from x=1 and y=values for MONDAY
          rep(2, 10), WEDDAY,                       # to x=2 and y-values for WEDDAY
          col=ifelse(differences>0, "blue", "red")) # if the diff is positive, in blue, otherwise in red
   arrows(1, mean(MONDAY), # draw an arrow from x=1 and y= average of 1st vector
          2, mean(WEDDAY), # to x=2 and y= average of 2nd vector
          lwd=3)           # with a bold line
par(mfrow=c(1, 1)) # define one plotting panel

Let’s compute some quick descriptive statistics as well:

mean(MONDAY); sd(MONDAY)
[1] 14.6
[1] 2.796824
mean(WEDDAY); sd(WEDDAY)
[1] 13.6
[1] 1.95505

From the data frame d, you could do this with a nice anonymous function like this:

with(d,         # with variables from the data frame d
   tapply(      # apply to
      IUS,      # IUS
      TESTDAY,  # a grouping by TESTDAY
      # and apply to the groups an anonymous ad-hoc function that
      \(af) c("mean"=mean(af), # computes the mean
              "sd"=sd(af))))   # & the standard deviation
$mon
     mean        sd
14.600000  2.796824

$wed
    mean       sd
13.60000  1.95505 

Are the differences normally distributed (for a t-test for dependent samples)?

nortest::lillie.test(differences)

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  differences
D = 0.3, p-value = 0.01112

No, so we re-compute the descriptive statistics now on the ordinal level and we do a Wilcoxon-test (but now two-tailed); as before, the Wilcoxon test cannot be run with the formula method/argument anymore unless you use Pair.
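
The decision logic used in this and the previous exercises (check the normality of the pairwise differences, then pick the t-test or the Wilcoxon test) could be sketched as a small function. Everything here is hypothetical and not part of the original materials: the function name paired_location_test, its arguments, and in particular the default normality test, which is shapiro.test from base R only so that the sketch runs without additional packages. The Shapiro-Wilk test and the Lilliefors test used above need not lead to the same decision:

```r
# hypothetical helper: choose a test for dependent samples
paired_location_test <- function(x, y, alpha=0.05,
   normality_test=stats::shapiro.test, # assumption: swap in nortest::lillie.test if desired
   ...) {                              # e.g. alternative="less"
   differences <- y-x                                 # pairwise differences
   p_normality <- normality_test(differences)$p.value # are they normally distributed?
   if (p_normality >= alpha) {                        # if normality is not rejected
      t.test(x, y, paired=TRUE, ...)                  # t-test for dependent samples
   } else {                                           # otherwise
      wilcox.test(x, y, paired=TRUE, correct=FALSE, ...) # Wilcoxon test
   }
}

MONDAY <- c(16,18,15,18,10,12,16,14,16,11)
WEDDAY <- c(15,16,13,15,11,11,14,11,15,15)
result <- paired_location_test(MONDAY, WEDDAY)
result$method # which test was chosen?
# with the Lilliefors test as above (requires the package nortest):
# paired_location_test(MONDAY, WEDDAY, normality_test=nortest::lillie.test)
```

A function like this is convenient, but it does not replace looking at the data and the plots before deciding on a test.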

3.3 Statistical testing

median(MONDAY); IQR(MONDAY)
[1] 15.5
[1] 3.5
median(WEDDAY); IQR(WEDDAY)
[1] 14.5
[1] 3.5
wilcox.test(
   MONDAY, WEDDAY, paired=TRUE, # compute a Wilcoxon test of both vectors
   correct=FALSE)               # no continuity correction

    Wilcoxon signed rank test

data:  MONDAY and WEDDAY
V = 42.5, p-value = 0.1226
alternative hypothesis: true location shift is not equal to 0
# wilcox.test(Pair(MONDAY, WEDDAY) ~ 1,correct=FALSE) # same as previous

3.4 Write-up

[Show descriptive statistics and plot.] To test whether the number of IUs transcribers identify in a recording changes depending on whether the transcribers work on the recording once or twice, a t-test for dependent samples was considered, but was not permitted because, according to a Lilliefors test, the pairwise differences are not normally distributed (p=0.011). Instead, a two-tailed Wilcoxon test was done, which concluded that the numbers of IUs identified do not vary significantly between the two conditions (V=42.5, p=0.1226).

4 Exercise 4

We have language proficiency scores from two groups of subjects; you want to test whether the means differ significantly.

4.1 Hypotheses

The

  • dependent/response variable is language proficiency score;
  • independent/predictor variable is SUBJGROUP: a vs. b.

What are the hypotheses?

  • text hypotheses:
    • H1: The average scores of the two groups differ;
    • H0: The average scores of the two groups don’t differ;
  • statistical hypotheses:
    • H1: mean_A - mean_B ≠ 0, i.e. t ≠ 0;
    • H0: mean_A - mean_B = 0, i.e. t = 0.

We create the data:

grp_A <- c(41,58,62,51,48,34,64,50,53,60,44)
grp_B <- c(38,40,64,47,51,49,32,44,61)

Theoretically, you could also make this a ‘proper’ case-by-variable data frame:

d <- data.frame(
       CASE=1:20,
  SUBJGROUP=factor(rep(c("a", "b"), c(11, 9))),
   LANGPROF=c(41,58,62,51,48,34,64,50,53,60,44,38,40,64,47,51,49,32,44,61))

4.2 Descriptive stats/visualization

We do a quick plot of the data and compute summary statistics for both groups:

boxplot(grp_A, grp_B,   # generate a boxplot for each group
        notch=TRUE)     # with notches

c(summary(grp_A), "sd"=sd(grp_A))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        sd
34.000000 46.000000 51.000000 51.363636 59.000000 64.000000  9.330303 
c(summary(grp_B), "sd"=sd(grp_B))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       sd
32.00000 40.00000 47.00000 47.33333 51.00000 64.00000 10.41633 

From the data frame d, you could do this with a nice anonymous function like this:

boxplot(d$LANGPROF ~ d$SUBJGROUP, # generate a boxplot for each group
   notch=TRUE,                    # with notches &
   var.width=TRUE)                # reflecting sample sizes

with(d, tapply(LANGPROF, SUBJGROUP, \(af) c(summary(af), "sd"=sd(af))))
$a
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        sd
34.000000 46.000000 51.000000 51.363636 59.000000 64.000000  9.330303

$b
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       sd
32.00000 40.00000 47.00000 47.33333 51.00000 64.00000 10.41633 

Are the scores in each group normally distributed (for a t-test for independent samples)?

nortest::lillie.test(grp_A)

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  grp_A
D = 0.12518, p-value = 0.9005
nortest::lillie.test(grp_B)

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  grp_B
D = 0.14019, p-value = 0.8774
# or from the data frame d
# tapply(d$LANGPROF, d$SUBJGROUP, nortest::lillie.test)

Good enough, so we test for variance homogeneity:

var.test(grp_A, grp_B)

    F test to compare two variances

data:  grp_A and grp_B
F = 0.80235, num df = 10, denom df = 8, p-value = 0.7298
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1868038 3.0929564
sample estimates:
ratio of variances
          0.802346 
# or from the data frame d
# var.test(d$LANGPROF ~ d$SUBJGROUP)

Good enough, too, so we can do our t-test for independent samples.

4.3 Statistical testing

t.test(grp_A, grp_B)

    Welch Two Sample t-test

data:  grp_A and grp_B
t = 0.90189, df = 16.323, p-value = 0.3802
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.427764 13.488370
sample estimates:
mean of x mean of y
 51.36364  47.33333 
# or from the data frame d
# t.test(d$LANGPROF ~ d$SUBJGROUP)
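
Note that t.test runs the Welch version of the test by default, which does not assume equal variances (hence the fractional df above). Since the F-test did not indicate a violation of variance homogeneity, one could also request the classical Student version with var.equal=TRUE. This is only a sketch for comparison, not the analysis reported below; the names student and welch are ad hoc:

```r
grp_A <- c(41,58,62,51,48,34,64,50,53,60,44)
grp_B <- c(38,40,64,47,51,49,32,44,61)

student <- t.test(grp_A, grp_B, var.equal=TRUE) # pooled variance
welch   <- t.test(grp_A, grp_B)                 # the default: no pooling

# compare the degrees of freedom and p-values of both versions
rbind(student=c(df=unname(student$parameter), p=student$p.value),
        welch=c(df=unname(welch$parameter)  , p=welch$p.value))
```

The Student version has n_A+n_B-2 degrees of freedom, whereas the Welch version adjusts the df downwards depending on how different the variances and sample sizes are.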

4.4 Write-up

[Show descriptive statistics and plot.] To test whether the average group scores on the language proficiency test differ, a t-test for independent samples was done (in its Welch version, R's default), which concluded that there is no significant difference between the average scores (t=0.902, df=16.323, p=0.3802).

5 Exercise 5

20 adult learners of Swahili are divided into two groups of 10 at random. Group A is taught by a grammar-translation method, group B by an audio-lingual method. At the end of the course, the two groups obtain the following scores on a proficiency test. You want to test whether group B performs significantly better on average.

5.1 Hypotheses

The

  • dependent/response variable is the proficiency score;
  • independent/predictor variable is SUBJGROUP: a vs. b.

What are the hypotheses?

  • text hypotheses:
    • H1: The average score of group B is higher than that of group A;
    • H0: The average score of group B is not higher than that of group A;
  • statistical hypotheses:
    • H1: mean_B - mean_A > 0, i.e. t > 0;
    • H0: mean_B - mean_A ≤ 0, i.e. t ≤ 0.

We create the data:

grp_A <- c(45,58,60,51,53,59,54,40,56,56)
grp_B <- c(48,58,71,56,59,62,64,62,52,69)

Theoretically, you could also make this a ‘proper’ case-by-variable data frame:

d <- data.frame(
        CASE=1:20,
   SUBJGROUP=factor(rep(c("a", "b"), each=10)),
   PROFSCORE=c(45,58,60,51,53,59,54,40,56,56,48,58,71,56,59,62,64,62,52,69))

5.2 Descriptive stats/visualization

We do a quick plot of the data and compute summary statistics for both groups:

boxplot(grp_A, grp_B,   # generate a boxplot for each group
        notch=TRUE)     # with notches

c(summary(grp_A), "sd"=sd(grp_A))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        sd
40.000000 51.500000 55.000000 53.200000 57.500000 60.000000  6.373556 
c(summary(grp_B), "sd"=sd(grp_B))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        sd
48.000000 56.500000 60.500000 60.100000 63.500000 71.000000  7.109462 

From the data frame d, you could do this like this:

boxplot(d$PROFSCORE ~ d$SUBJGROUP, # generate a boxplot for each group
        notch=TRUE)                # with notches

with(d, tapply(PROFSCORE, SUBJGROUP, \(af) c(summary(af), "sd"=sd(af))))
$a
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        sd
40.000000 51.500000 55.000000 53.200000 57.500000 60.000000  6.373556

$b
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        sd
48.000000 56.500000 60.500000 60.100000 63.500000 71.000000  7.109462 

Are the scores in each group normally distributed (for a t-test for independent samples)?

nortest::lillie.test(grp_A)

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  grp_A
D = 0.18748, p-value = 0.4084
nortest::lillie.test(grp_B)

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  grp_B
D = 0.10536, p-value = 0.9892
# or from the data frame d
# tapply(d$PROFSCORE, d$SUBJGROUP, nortest::lillie.test)

Good enough, so we test for variance homogeneity.

var.test(grp_A, grp_B)

    F test to compare two variances

data:  grp_A and grp_B
F = 0.80369, num df = 9, denom df = 9, p-value = 0.7501
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.199626 3.235664
sample estimates:
ratio of variances
         0.8036931 
# or from the data frame d
# var.test(d$PROFSCORE ~ d$SUBJGROUP)

Good enough, too, so we can do our t-test for independent samples (now one-tailed).

5.3 Statistical testing

t.test(grp_A, grp_B,       # compute t-test for independent samples
       alternative="less") # grp_A is expected to be < grp_B

    Welch Two Sample t-test

data:  grp_A and grp_B
t = -2.2852, df = 17.789, p-value = 0.0174
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -1.660838
sample estimates:
mean of x mean of y
     53.2      60.1 
# or from the data frame d
# t.test(d$PROFSCORE ~ d$SUBJGROUP, alternative="less")
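
The Welch t-value and its fractional df can again be reproduced by hand: t is the difference of the means divided by the square root of the summed squared standard errors, and the df come from the Welch-Satterthwaite approximation. A minimal sketch with ad-hoc names (se2_A, se2_B, t_manual, df_manual, p_manual):

```r
grp_A <- c(45,58,60,51,53,59,54,40,56,56)
grp_B <- c(48,58,71,56,59,62,64,62,52,69)

se2_A <- var(grp_A) / length(grp_A) # squared standard error of mean A
se2_B <- var(grp_B) / length(grp_B) # squared standard error of mean B

t_manual  <- (mean(grp_A)-mean(grp_B)) / sqrt(se2_A+se2_B)   # t-value
df_manual <- (se2_A+se2_B)^2 /                                # Welch-Satterthwaite df
   (se2_A^2/(length(grp_A)-1) + se2_B^2/(length(grp_B)-1))
p_manual  <- pt(t_manual, df=df_manual) # one-tailed: lower tail, as in alternative="less"
round(c(t=t_manual, df=df_manual, p=p_manual), 4)
```

The values should agree with the t.test output above; the t-value is negative only because the test subtracts the mean of grp_B from that of grp_A.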

5.4 Write-up

[Show descriptive statistics and plot.] To test whether the average scores for the teaching methods differ as expected (audio-lingual > grammar translation), a one-tailed t-test for independent samples was done, which concluded that there is indeed a significant difference between the average scores in the predicted direction (t=-2.285, df=17.789, p1-tailed=0.0174).

6 Exercise 6

A study counted the frequencies of inversion of subject and verb after an introductory adverbial in declarative affirmative clauses in three texts for each of two time periods. You want to test for each time period whether the texts differ with regard to their frequencies of inversions. Here are the two tables with the frequencies for each time period:

6.1 Hypotheses

The

  • dependent/response variable is INVERSION;
  • independent/predictor variable is TEXT (for each time period separately).

What are the hypotheses?

  • text hypotheses for early:
    • H1: The frequencies of inversions no and yes differ across the three texts in the early time period;
    • H0: The frequencies of inversions no and yes don’t differ across the three texts in the early time period;
  • statistical hypotheses for early:
    • H1: Χ² > 0;
    • H0: Χ² = 0.
  • text hypotheses for late:
    • H1: The frequencies of inversions no and yes differ across the three texts in the late time period;
    • H0: The frequencies of inversions no and yes don’t differ across the three texts in the late time period;
  • statistical hypotheses for late:
    • H1: Χ² > 0;
    • H0: Χ² = 0.

Early    Inversion: no   Inversion: yes
Text 1              27               11
Text 2              34               16
Text 3              34               14

Late     Inversion: no   Inversion: yes
Text 1             109               27
Text 2              61               11
Text 3              49               29

We input them into R:

early <- matrix(           # define a matrix
   c(27,11,34,16,34,14),   # with these frequencies
   byrow=TRUE, ncol=2,     # defined row-wise with 2 columns
   dimnames=list(TEXT=1:3,                  # and these row names
                 INVERSION=c("no", "yes"))) # and these column names
late <- matrix(            # define a matrix
   c(109,27,61,11,49,29),  # with these frequencies
   byrow=TRUE, ncol=2,     # defined row-wise with 2 columns
   dimnames=list(TEXT=1:3,                  # and these row names
                 INVERSION=c("no", "yes"))) # and these column names

6.2 Descriptive stats/visualization

We describe the data:

addmargins(early)
     INVERSION
TEXT  no yes Sum
  1   27  11  38
  2   34  16  50
  3   34  14  48
  Sum 95  41 136
early %>% t %>% mosaicplot(shade=TRUE)

addmargins(late)
     INVERSION
TEXT   no yes Sum
  1   109  27 136
  2    61  11  72
  3    49  29  78
  Sum 219  67 286
late  %>% t %>% mosaicplot(shade=TRUE)

6.3 Statistical testing

Here’s the test for the early data:

(test_early <- chisq.test(early, correct=FALSE))

    Pearson's Chi-squared test

data:  early
X-squared = 0.1294, df = 2, p-value = 0.9373

Looks like there’s nothing at all. Were we allowed to do a chi-squared test?

test_early$expected
    INVERSION
TEXT       no      yes
   1 26.54412 11.45588
   2 34.92647 15.07353
   3 33.52941 14.47059

Yes, so we move on by computing the residuals and an effect size.

test_early$residuals # compute the residuals
    INVERSION
TEXT          no        yes
   1  0.08848479 -0.1346910
   2 -0.15676687  0.2386295
   3  0.08126960 -0.1237081
sqrt(test_early$statistic /               # compute an effect size ...
        (sum(early)*(min(dim(early))-1))) # ... for this table
 X-squared
0.03084586 
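
To see what chisq.test does internally, the expected frequencies, the (Pearson) residuals, the chi-squared value, and the effect size can also be computed by hand. A minimal sketch; the names expected_manual, residuals_manual, chisq_manual, df_manual, p_manual, and cramers_v are ad hoc:

```r
early <- matrix(c(27,11,34,16,34,14), byrow=TRUE, ncol=2,
   dimnames=list(TEXT=1:3, INVERSION=c("no", "yes")))

# expected frequency of a cell = row total * column total / grand total
expected_manual  <- outer(rowSums(early), colSums(early)) / sum(early)
# Pearson residual of a cell = (observed-expected) / sqrt(expected)
residuals_manual <- (early-expected_manual) / sqrt(expected_manual)
# chi-squared = sum of the squared residuals
chisq_manual <- sum(residuals_manual^2)
df_manual    <- (nrow(early)-1) * (ncol(early)-1)
p_manual     <- pchisq(chisq_manual, df=df_manual, lower.tail=FALSE)
# Cramer's V = sqrt(chi-squared / (n * (smaller table dimension - 1)))
cramers_v    <- sqrt(chisq_manual / (sum(early) * (min(dim(early))-1)))
round(c(chisq=chisq_manual, df=df_manual, p=p_manual, V=cramers_v), 4)
```

The same steps apply to the late table.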

And here’s the test for the later data:

(test_late <- chisq.test(late, correct=FALSE))

    Pearson's Chi-squared test

data:  late
X-squared = 11.858, df = 2, p-value = 0.002662

Looks like there’s now an effect but again: Were we allowed to do a chi-squared test?

test_late$expected
    INVERSION
TEXT        no      yes
   1 104.13986 31.86014
   2  55.13287 16.86713
   3  59.72727 18.27273

Yes, so we move on by computing the residuals and an effect size:

test_late$residuals # compute the residuals
    INVERSION
TEXT         no        yes
   1  0.4762558 -0.8610432
   2  0.7901702 -1.4285824
   3 -1.3880432  2.5095025
sqrt(test_late$statistic /              # compute an effect size ...
        (sum(late)*(min(dim(late))-1))) # ... for this table
X-squared
0.2036185 
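
To interpret the residuals, it can help to look at the row percentages as well, i.e. the percentage of inversions within each text. A small sketch (the name late_perc is ad hoc):

```r
late <- matrix(c(109,27,61,11,49,29), byrow=TRUE, ncol=2,
   dimnames=list(TEXT=1:3, INVERSION=c("no", "yes")))

late_perc <- 100 * prop.table(late, margin=1) # percentages that sum to 100 per row/text
round(late_perc, 1)
round(100 * sum(late[,"yes"]) / sum(late), 1) # overall percentage of inversions
```

In text 3, about 37% of the clauses have inversion, compared to about 23% overall, which is what the large positive residual for text 3 reflects.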

6.4 Write-up

For each time period, a chi-squared test was done (with all expected frequencies being greater than 5). For the early data, the texts did not differ significantly (chi-squared=0.1294, df=2, p=0.9373, Cramer’s V=0.03), but for the late time period, there was a significant effect (chi-squared=11.858, df=2, p=0.0027, Cramer’s V=0.204): inversion was notably overrepresented in text 3, but underrepresented in the other two texts. [Show plot for late data.]

7 Exercise 7

A panel of teachers is asked to grade the reading and writing abilities of 15 children on a scale from 1 (very poor) to 7 (excellent), with the results given below. You want to calculate an appropriate correlation coefficient and test it for significance.

7.1 Hypotheses

The

  • dependent/response variable is one of the two sets of scores (READING or WRITING);
  • independent/predictor variable is the other set of scores.

What are the hypotheses?

  • text hypotheses:
    • H1: there is a correlation between the reading and the writing scores;
    • H0: there is no correlation between the reading and the writing scores;
  • statistical hypotheses:
    • H1: some correlation coefficient is ≠0;
    • H0: some correlation coefficient is 0.

We create the data:

READING <- setNames(
   c(3,4,5,6,3,4,3,5,2,4,7,5,6,2,3), 1:15)
WRITING <- setNames(
   c(6,4,3,7,4,2,5,6,3,5,3,4,4,3,2), 1:15)

You could also put these two columns into a data frame d, but here, nothing much is gained from that; you would still use the column names just like you use the vector names.

7.2 Descriptive stats/visualization

Let’s plot the data:

par(mfrow=c(1, 2)) # define two plotting panels
plot(WRITING ~ READING); grid()     # plot WRITING (y-axis) as a function of READING (x-axis)
   lines(lowess(WRITING ~ READING)) # add a locally-weighted smoother
plot(READING ~ WRITING); grid()     # plot READING (y-axis) as a function of WRITING (x-axis)
   lines(lowess(READING ~ WRITING)) # add a locally-weighted smoother
par(mfrow=c(1, 1)) # define one plotting panel

7.3 Statistical testing

Since this is a correlation scenario that probably involves ordinal variables – “to grade the reading and writing abilities of 15 children” – we compute two rank correlation coefficients:

cor.test(READING, WRITING,  # compute a correlation coefficient
         method="spearman") # namely Spearman's rho

    Spearman's rank correlation rho

data:  READING and WRITING
S = 436.77, p-value = 0.4307
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.2200566 
cor.test(READING, WRITING, # compute a correlation coefficient
         method="kendall") # namely Kendall's tau

    Kendall's rank correlation tau

data:  READING and WRITING
z = 0.77764, p-value = 0.4368
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau
0.1657484 
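
As an aside, Spearman's ρ is nothing but Pearson's r computed on the ranks of the values (with tied values receiving mean ranks), which can be verified quickly. A minimal sketch with the ad-hoc names rho_manual and rho_builtin:

```r
READING <- c(3,4,5,6,3,4,3,5,2,4,7,5,6,2,3)
WRITING <- c(6,4,3,7,4,2,5,6,3,5,3,4,4,3,2)

rho_manual  <- cor(rank(READING), rank(WRITING))      # Pearson's r of the ranks
rho_builtin <- cor(READING, WRITING, method="spearman") # Spearman's rho
all.equal(rho_manual, rho_builtin) # the two are the same
```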

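It might have been reasonable to adopt a directional H1 here (a positive correlation), but that makes no difference to the conclusion: the one-tailed p-values are half the two-tailed ones and still far from significant: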

cor.test(READING, WRITING,      # compute a correlation coefficient
         method="spearman",     # namely Spearman's rho
         alternative="greater") # expect a positive corr coeff

    Spearman's rank correlation rho

data:  READING and WRITING
S = 436.77, p-value = 0.2153
alternative hypothesis: true rho is greater than 0
sample estimates:
      rho
0.2200566 
cor.test(READING, WRITING,      # compute a correlation coefficient
         method="kendall",      # namely Kendall's tau
         alternative="greater") # expect a positive corr coeff

    Kendall's rank correlation tau

data:  READING and WRITING
z = 0.77764, p-value = 0.2184
alternative hypothesis: true tau is greater than 0
sample estimates:
      tau
0.1657484 

7.4 Write-up

[Show plot(s).] To test whether the reading and writing abilities of 15 children are correlated, Spearman’s ρ and Kendall’s τ were computed, but neither was significant (ρ=0.22, S=436.77, p=0.431 and τ=0.166, z=0.778, p=0.437).

8 Homework

Analyze the Chevrolet promotion data so that you can present your code and results in class. The question is whether the data in _input/chevyprom1.csv (see _input/chevyprom1.r) support the claim that Chevrolet discriminated against women when it came to promoting staff.

9 Session info

sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  compiler  methods
[8] base

other attached packages:
[1] STGmisc_1.0    Rcpp_1.0.14    magrittr_2.0.3

loaded via a namespace (and not attached):
 [1] digest_0.6.37     fastmap_1.2.0     xfun_0.51         nortest_1.0-4
 [5] knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29    cli_3.6.4
 [9] rstudioapi_0.17.1 tools_4.4.3       evaluate_1.0.3    yaml_2.3.10
[13] rlang_1.1.5       jsonlite_1.9.1    htmlwidgets_1.6.4 MASS_7.3-65