Ling 104, session 04: descr. stats 2 (key)
1 Exercise 14
Ten bilingual students (English/German) took one dictation in English and one in German. They made the following numbers of mistakes in English and German respectively:
- Compute a measure of correlation to quantify the association between the numbers of errors.
- Illustrate the correlation in a graph and interpret the results (in one sentence).
Note: the assignment says “the correlation”, which is symmetric: cor(a, b) is the same as cor(b, a). For plotting, however, you need to decide which variable goes on the y-axis, which is traditionally reserved for the response variable. Since the exercise does not single out a response, the safest way is to plot both directions:
op <- par(mar=c(4, 4, 2, 1)) # customize the plotting margins
par(mfrow=c(1,2)) # make the plotting window have 1 row & 2 columns
plot(ENGLISH ~ GERMAN, # plot English as a function of German mistakes
xlab="German dictation" , xlim=c(0, 35), # x-axis stuff
ylab="English dictation", ylim=c(0, 35)) # y-axis stuff
abline(lm(ENGLISH ~ GERMAN), col="blue"); grid() # add regression line in blue & a grid
plot(GERMAN ~ ENGLISH, # plot German as a function of English mistakes
xlab="English dictation", xlim=c(0, 35), # x-axis stuff
ylab="German dictation" , ylim=c(0, 35)) # y-axis stuff
abline(lm(GERMAN ~ ENGLISH), col="blue"); grid() # add regression line in blue & a grid
par(op) # reset to defaults
There is a moderate negative correlation (r≈-0.461): students who made more mistakes in one dictation tended to make fewer mistakes in the other.
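The vectors ENGLISH and GERMAN used in the plots are not defined anywhere in this key. Here is a minimal sketch of how the correlation could be computed. It assumes the values below, which were back-calculated from the standardized scores in Exercise 17, the group means in Exercise 16, and the prediction in Exercise 15, so treat them as a reconstruction rather than the original table.

```r
# ASSUMED data: reconstructed from outputs shown elsewhere in this key,
# not copied from the original exercise table
ENGLISH <- c(29, 20, 10, 16, 12, 15, 25, 22, 20, 23)
GERMAN  <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28)

r_pearson <- cor(ENGLISH, GERMAN) # Pearson's r (cor's default method)
r_reverse <- cor(GERMAN, ENGLISH) # same value: the correlation is symmetric
round(r_pearson, 3)               # compare with the r reported above
```

With only ten data points, a rank-based alternative such as cor(ENGLISH, GERMAN, method="kendall") might also be worth a look.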
2 Exercise 15
Compute the number of mistakes expected from a student in the German dictation, if that student made 12 mistakes in the English dictation.
1
25.14358
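The code that produced this prediction is not shown. One way to obtain it, sketched under the assumption that the data are the reconstructed values below (back-calculated from other outputs in this key, not taken from the original table), is to fit a linear model and predict from it:

```r
# ASSUMED data: reconstructed from outputs shown elsewhere in this key
ENGLISH <- c(29, 20, 10, 16, 12, 15, 25, 22, 20, 23)
GERMAN  <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28)

m <- lm(GERMAN ~ ENGLISH) # German mistakes as a function of English mistakes
pred <- predict(m, newdata=data.frame(ENGLISH=12)) # prediction for 12 English mistakes

# the same prediction by hand: intercept + slope * 12
by_hand <- unname(coef(m)[1] + coef(m)[2] * 12)
pred
```

The name "1" above the predicted value is just the row name of the one-row data frame passed to newdata.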
plot(GERMAN ~ ENGLISH, # plot German as a function of English mistakes
xlab="English dictation", xlim=c(0, 35), # x-axis stuff
ylab="German dictation" , ylim=c(0, 35)) # y-axis stuff
abline(lm(GERMAN ~ ENGLISH), col="blue"); grid() # add regression line in blue & a grid
# draw the vertical & horizontal dashed lines
segments(12, par("usr")[3], 12, 25.14, col="blue", lty=2) # vertical: from the bottom edge up to the line
segments(12, 25.14, par("usr")[1], 25.14, col="blue", lty=2) # horizontal: from the line to the left edge
3 Exercise 16
Suppose you have now also obtained the sexes of the students: students 2 to 6 were girls, the rest were boys.
- Enter this into R.
- Compute the average numbers of errors in the German dictation for boys and girls.
f m
23.8 21.4
- Represent the numbers of mistakes in the German dictation as a function of the sex of the students graphically.
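The three steps of this exercise could be sketched as follows. The GERMAN values are an assumption (reconstructed from other outputs in this key), and the names SEX and means are chosen here for illustration only.

```r
# ASSUMED data: reconstructed from outputs shown elsewhere in this key
GERMAN <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28)

# student 1 is a boy, students 2 to 6 are girls, students 7 to 10 are boys
SEX <- factor(rep(c("m", "f", "m"), times=c(1, 5, 4)))

means <- tapply(GERMAN, SEX, mean) # mean number of German mistakes per sex
means

# a numeric response as a function of a categorical predictor: a boxplot,
# with the individual data points added on top because n is so small
boxplot(GERMAN ~ SEX, xlab="Sex", ylab="Mistakes in the German dictation")
stripchart(GERMAN ~ SEX, vertical=TRUE, method="jitter", pch=16, add=TRUE)
```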
4 Exercise 17
Standardize the numbers of errors in both dictations.
[1] 1.6497080 0.1346700 -1.5487055 -0.5386802 -1.2120304 -0.7070177
[7] 0.9763578 0.4713451 0.1346700 0.6396827
[1] -0.3515752 -0.7910443 1.1865664 1.1865664 0.7470974 -1.0107788
[7] -1.4502479 -0.1318407 -0.5713098 1.1865664
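Standardizing means subtracting the mean and dividing by the standard deviation, which is what scale does by default. A minimal sketch, assuming the reconstructed data below (with these values, the first output vector above corresponds to the English dictation and the second to the German one):

```r
# ASSUMED data: reconstructed from outputs shown elsewhere in this key
ENGLISH <- c(29, 20, 10, 16, 12, 15, 25, 22, 20, 23)
GERMAN  <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28)

# scale() returns a one-column matrix, so convert it back to a plain vector
ENGLISH_z <- as.numeric(scale(ENGLISH))
GERMAN_z  <- as.numeric(scale(GERMAN))

# the same by hand: (value - mean) / standard deviation
by_hand <- (ENGLISH - mean(ENGLISH)) / sd(ENGLISH)

# Pearson's r is the sum of the products of the z-scores divided by n-1
r_from_z <- sum(ENGLISH_z * GERMAN_z) / (length(ENGLISH) - 1)
```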
5 Exercise 18
50 students took a statistics exam, 80% passed. What is the 95%-confidence interval for this result?
round(binom.test( # round the probabilities of a binomial test for
x=40, # this number of successes, here passes
n=50, # this number of trials, here the 50 students
conf.level=0.95)$ # the confidence level we want; 0.95 = default
conf.int, 3) # return only the confidence interval, round to 3 decimals
[1] 0.663 0.900
attr(,"conf.level")
[1] 0.95
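If you are wondering where these limits come from: binom.test reports the exact (Clopper-Pearson) interval, whose limits are quantiles of beta distributions. The following sketch recomputes them by hand; the object names are chosen here for illustration.

```r
x <- 40       # number of passes
n <- 50       # number of students
alpha <- 0.05 # 1 - confidence level

# Clopper-Pearson limits as beta quantiles
ci_exact <- c(qbeta(alpha/2, x, n - x + 1),     # lower limit
              qbeta(1 - alpha/2, x + 1, n - x)) # upper limit

ci_binom <- as.numeric(binom.test(x=x, n=n)$conf.int) # for comparison
round(ci_exact, 3)
```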
But of course you could also use a (percentile) bootstrapping approach, which returns results that are fairly comparable to the binom.test results (and that are also fairly comparable to those of a more advanced bootstrapping approach):
collector <- rep(NA, 2000) # set up a collector vector
set.seed(123); for (i in 1:2000) { # do something 2K times, namely
RESULTS_sampled <- sample(
c("fail", "pass"),
prob=c( 0.2 , 0.8),
size=50, replace=TRUE)
collector[i] <- sum(RESULTS_sampled=="pass")/50
}
quantile(collector, probs=c(0.025, 0.975)) # extract 'CI'
 2.5% 97.5%
 0.68  0.90
6 Exercise 19
Load the file _input/partplacement.csv into a data frame d. This file contains data from a corpus study on the alternation of particle placement that was introduced in Section 1.3; you can find information about this data set in _input/partplacement.r.
      CASE          CONSTRUCTION     MEDIUM       DO_COMPLX      DO_LENSYLL
 Min.   :  1.00   v_do_prt:100   spoken :100   clausmod:  6   Min.   : 1.00
 1st Qu.: 50.75   v_prt_do:100   written:100   phrasmod: 67   1st Qu.: 2.00
 Median :100.50                                simple  :127   Median : 3.00
 Mean   :100.50                                               Mean   : 4.72
 3rd Qu.:150.25                                               3rd Qu.: 6.00
 Max.   :200.00                                               Max.   :31.00
      DO_ANIM         DO_CONC       PP
 animate  : 27   abstract: 95   no :167
 inanimate:173   concrete:105   yes: 33
7 Exercise 20
Represent the correlation between the choice of construction and the complexity of the direct object graphically.
op <- par(mar=c(4, 4, 2, 1)) # customize the plotting margins
mosaicplot( # show a mosaic plot
main="", # w/out a main heading
x=table( # of the table of
d$DO_COMPLX, # DO_COMPLX &
d$CONSTRUCTION), # CONSTRUCTION
col=c("grey35", "grey75")) # w/ these colors
par(op) # reset to defaults
# plot(d$DO_COMPLX, d$CONSTRUCTION) # or
# plot(table(d$DO_COMPLX, d$CONSTRUCTION)) # or
# plot(d$CONSTRUCTION ~ d$DO_COMPLX)
8 Exercise 21
Create a table representing the correlation between the choice of construction and the complexity of the direct object and briefly summarize the result.
v_do_prt v_prt_do
clausmod 1 5
phrasmod 9 58
simple 90 37
CONSTRUCTION
DO_COMPLX v_do_prt v_prt_do
clausmod 1 5
phrasmod 9 58
simple 90 37
Simple direct objects prefer the construction where the particle follows the direct object, whereas phrasally and clausally modified direct objects prefer the construction where the particle precedes the direct object. (Note: This is not yet a significance test, so it is not yet clear whether this preference is in fact significant.)
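Row proportions make this summary easier to verify than raw counts. The sketch below re-enters the counts from the table above by hand (so that it runs without the data file) and computes, for each level of DO_COMPLX, the share of each construction; the object names are chosen here for illustration.

```r
# counts copied from the table above, filled in column by column
tab <- matrix(c(1, 9, 90,  # v_do_prt
                5, 58, 37), # v_prt_do
   ncol=2,
   dimnames=list(DO_COMPLX=c("clausmod", "phrasmod", "simple"),
                 CONSTRUCTION=c("v_do_prt", "v_prt_do")))

row_props <- prop.table(tab, margin=1) # proportions within each row
round(row_props, 3)
```

With the data frame d loaded, the same result should come from prop.table(table(d$DO_COMPLX, d$CONSTRUCTION), margin=1).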
9 Exercise 22
Represent the correlation between the choice of construction and the length of the direct object graphically and briefly summarize the result.
This would be the best solution:
The shorter the direct object, the more likely the particle is to follow the direct object. (Note: This is not yet a significance test, so it is not yet clear whether this tendency is in fact significant.)
10 Exercise 23
Investigate whether the choice of construction depends on the animacy of the referent of the direct objects and the presence/absence of a directional prepositional phrase.
This question is in fact ambiguous:
1. did I mean ‘whether CONSTRUCTION varies as
   a) a function of DO_ANIM and, separately,
   b) a function of PP’?
2. or did I mean ‘whether CONSTRUCTION varies as a function of DO_ANIM and PP together’?
The answers to these questions are different; this is what you would do for 1a) and 1b):
animate inanimate
v_do_prt 19 81
v_prt_do 8 92
no yes
v_do_prt 75 25
v_prt_do 92 8
And this is what you would do for 2):
, , = no
animate inanimate
v_do_prt 13 62
v_prt_do 7 85
, , = yes
animate inanimate
v_do_prt 6 19
v_prt_do 1 7
Although this would be better:
v_do_prt v_prt_do
animate no 13 7
yes 6 1
inanimate no 62 85
yes 19 7
And this might be even better:
v_do_prt v_prt_do
animate no 0.650 0.350
yes 0.857 0.143
inanimate no 0.422 0.578
yes 0.731 0.269
But this one would actually benefit from the pipe, I think, both in terms of brevity/readability and in terms of the output:
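The piped code itself is not shown in this key. One possible version is sketched below with R's native pipe |> (available from R 4.1 on; whether the original used it or magrittr's %>% is an assumption). So that the sketch runs without the data file, it first builds a stand-in for d from the counts in the table above; counts and res are names chosen here for illustration.

```r
# stand-in for d: one row per observation, built from the counts shown above
counts <- expand.grid(CONSTRUCTION=c("v_do_prt", "v_prt_do"),
                      PP=c("no", "yes"),
                      DO_ANIM=c("animate", "inanimate"))
counts$n <- c(13, 7, 6, 1, 62, 85, 19, 7)
d <- counts[rep(seq_len(nrow(counts)), counts$n), 1:3]

res <- d[, c("DO_ANIM", "PP", "CONSTRUCTION")] |> # pick the variables, then
   ftable() |>              # make a flat table (last variable = columns), then
   prop.table(margin=1) |>  # convert to row proportions, then
   round(3)                 # round to 3 decimals
res
```

Read from top to bottom, the pipe mirrors the order in which the steps happen, which is easier to follow than the nested equivalent round(prop.table(ftable(...), margin=1), 3).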
11 Homework
To prepare for next week, read (and work through!) SFLWR3: Section 4.1.
12 Session info
R version 4.5.2 Patched (2025-11-09 r88994 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets compiler methods
[8] base
other attached packages:
[1] STGmisc_1.06 Rcpp_1.1.1 magrittr_2.0.4
loaded via a namespace (and not attached):
[1] digest_0.6.39 fastmap_1.2.0 xfun_0.57 knitr_1.51
[5] htmltools_0.5.9 rmarkdown_2.30 cli_3.6.5 rstudioapi_0.18.0
[9] tools_4.5.2 evaluate_1.0.5 yaml_2.3.12 otel_0.2.0
[13] htmlwidgets_1.6.4 rlang_1.1.7 jsonlite_2.0.0 MASS_7.3-65