      CASE             CONSTRUCTION    V_CHANGPOSS     AGENT_ACT
 Min.   :  1.0   ditransitive:200   change   :252   Min.   :0.00
 1st Qu.:100.8   prep_dative :200   no_change:146   1st Qu.:2.00
 Median :200.5                      NA's     :  2   Median :4.00
 Mean   :200.5                                      Mean   :4.38
 3rd Qu.:300.2                                      3rd Qu.:7.00
 Max.   :400.0                                      Max.   :9.00
    REC_ACT        PAT_ACT
 Min.   :0.00   Min.   :0.000
 1st Qu.:2.00   1st Qu.:2.000
 Median :5.00   Median :4.000
 Mean   :4.63   Mean   :4.407
 3rd Qu.:7.00   3rd Qu.:7.000
 Max.   :9.00   Max.   :9.000
Assignment files for modeling case studies
1 <datives.csv>
This data set is concerned with the dative alternation in English, i.e. whether a speaker says [NP_Agent John] gave [NP_Recipient Mary] [NP_Patient a dead cat] (ditransitive) or [NP_Agent John] gave [NP_Patient a dead cat] [PP to [NP_Recipient Mary]] (prepositional dative); the question is which factors govern this linguistic/structural choice. This is the corpus-based data set:
These are the variables in this data set (see here for the roxygenized comments):
- CASE: a case number (can be ignored);
- CONSTRUCTION: the response variable encoding which construction a speaker used: ditransitive or prep_dative;
- V_CHANGPOSS: a predictor encoding whether the verb in the clause encodes a change of possession of the patient from the agent to the recipient (change, e.g., give or hand) or not (no_change, e.g., promise);
- AGENT_ACT: a predictor encoding how discourse-given the referent of the agent (John in the above example) is:
  - 0 means ‘the referent of the agent is completely new to the conversation’;
  - 9 means ‘the referent of the agent was mentioned in the immediately preceding clause’;
- REC_ACT: a predictor encoding how discourse-given the referent of the recipient (Mary in the above example) is:
  - 0 means ‘the referent of the recipient is completely new to the conversation’;
  - 9 means ‘the referent of the recipient was mentioned in the immediately preceding clause’;
- PAT_ACT: a predictor encoding how discourse-given the referent of the patient (a dead cat in the above example) is:
  - 0 means ‘the referent of the patient is completely new to the conversation’;
  - 9 means ‘the referent of the patient was mentioned in the immediately preceding clause’.
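Because CONSTRUCTION is a two-level response, one plausible starting point (not necessarily the analysis the assignment expects) is a binary logistic regression. The sketch below does not read datives.csv; it builds a simulated stand-in data frame `d` with the same column names and levels, so every value in it is made up for illustration. It shows two things worth checking before modeling: which level `glm()` treats as the predicted outcome, and how many cases remain once the missing values in V_CHANGPOSS are dropped.

```r
# Simulated stand-in for datives.csv (assumption: NOT the real data).
set.seed(1)
n <- 400
d <- data.frame(
  CASE         = seq_len(n),
  CONSTRUCTION = factor(rep(c("ditransitive", "prep_dative"), each = n / 2)),
  V_CHANGPOSS  = factor(sample(c("change", "no_change"), n, replace = TRUE)),
  AGENT_ACT    = sample(0:9, n, replace = TRUE),
  REC_ACT      = sample(0:9, n, replace = TRUE),
  PAT_ACT      = sample(0:9, n, replace = TRUE)
)
d$V_CHANGPOSS[c(5, 17)] <- NA  # mimic the 2 NA's reported for V_CHANGPOSS

# glm() with family = binomial predicts the SECOND factor level,
# so here the model predicts prep_dative.
levels(d$CONSTRUCTION)

# Rows with an NA in any model variable are dropped by default.
m <- glm(CONSTRUCTION ~ V_CHANGPOSS + AGENT_ACT + REC_ACT + PAT_ACT,
         data = d, family = binomial)
nobs(m)  # 398: the two cases with missing V_CHANGPOSS are excluded
```

With the real file, only the construction of `d` would change; the level check and the `nobs()` check apply unchanged.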
2 <toingpriming.csv>
This data set is concerned with the to-/-ing alternation in English, i.e. whether a speaker says I like to swim or I like swimming. In an experiment, native speakers of English were presented with two sentences: a prime/context sentence that already involved a to-/-ing alternation and a target sentence involving another to-/-ing alternation, which the subjects were supposed to rate with regard to its acceptability on a 7-point scale from -3 to +3. This is the experimental data set:
rm(list=ls(all.names=TRUE))
summary(x <- read.delim("_input/toingpriming.csv", stringsAsFactors=TRUE))
      CASE           RATING        CXPREV    CXNOW     VNOW_PREF
 Min.   :  1.0   Min.   :-3.0000   ing:278   ing:270   ing:280
 1st Qu.:139.8   1st Qu.:-1.0000   to :278   to :286   to :276
 Median :278.5   Median : 0.0000
 Mean   :278.5   Mean   : 0.3705
 3rd Qu.:417.2   3rd Qu.: 2.0000
 Max.   :556.0   Max.   : 3.0000
These are the variables in this data set (see here for the roxygenized comments):
- CASE: a case number (can be ignored);
- CXPREV: a predictor encoding whether the prime sentence was a to or an ing construction;
- CXNOW: a predictor encoding whether the target sentence to be rated was a to or an ing construction;
- VNOW_PREF: a predictor encoding whether the target sentence contained a verb that is known to prefer to-constructions or a verb that is known to prefer ing-constructions;
- RATING: the response variable encoding a native speaker's acceptability judgment of the target sentence:
  - -3 means ‘the speaker considered the sentence completely unacceptable’;
  - 0 means ‘the speaker considered the sentence intermediately (un)acceptable’;
  - +3 means ‘the speaker considered the sentence perfectly acceptable’.
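A priming effect in this design would show up as an interaction: a target construction should be rated better when the prime used the same construction. The sketch below illustrates that idea using the column names shown in the summary output (RATING, CXPREV, CXNOW, VNOW_PREF). It does not read toingpriming.csv; the data frame `x` is simulated, and treating the 7-point rating as numeric in a linear model is an assumption made for simplicity rather than a recommendation.

```r
# Simulated stand-in for toingpriming.csv (assumption: NOT the real data).
set.seed(2)
n <- 556
x <- data.frame(
  CASE      = seq_len(n),
  RATING    = sample(-3:3, n, replace = TRUE),
  CXPREV    = factor(sample(c("ing", "to"), n, replace = TRUE)),
  CXNOW     = factor(sample(c("ing", "to"), n, replace = TRUE)),
  VNOW_PREF = factor(sample(c("ing", "to"), n, replace = TRUE))
)

# Mean rating for each prime/target combination (rows: prime, columns: target).
means <- tapply(x$RATING, list(PRIME = x$CXPREV, TARGET = x$CXNOW), mean)
means

# CXPREV * CXNOW expands to both main effects plus their interaction;
# the interaction coefficient is the one that speaks to priming.
m <- lm(RATING ~ CXPREV * CXNOW + VNOW_PREF, data = x)
names(coef(m))
```

Since the ratings are ordinal, an ordinal regression model would be a reasonable alternative to `lm()`.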
3 <thirdpers.csv>
This data set is concerned with the third-person singular suffix in diachronic English, specifically whether speakers wrote the old form (giveth) or the newer and now contemporary form (gives). The question is what made letter writers choose which form (because speakers were not consistently using one and the same form even in the same letter). This is the corpus-based data set (based on letters written between 1400 and 1700):
 VARIANT    AUTH_GEND     REC_SAME_GEND  CLOSE_FAM   VNCPERIOD   FIN_SYB
 es:1524   female: 784   no :1210       no :1917    P1: 505     no :3953
 th:2619   male  :3359   yes:2933       yes:2226    P2:  99     yes: 190
                                                    P3:1508
                                                    P4:1096
                                                    P5: 935
  FOL_FRIC       GRAM
 es   : 189   no :2867
 other:3666   yes:1276
 th   : 288
These are the variables in this data set (see here for the roxygenized comments):
- VARIANT: the response variable: th vs. es;
- AUTH_GEND: a predictor encoding the sex of the writer of the letter (female vs. male);
- REC_SAME_GEND: a predictor encoding the sex of the recipient of the letter: the same as that of the writer (yes) or not (no);
- CLOSE_FAM: a predictor encoding whether the recipient of the letter was a close family member of the writer of the letter (no vs. yes);
- VNCPERIOD: a predictor encoding the time period: lower numbers indicate earlier times (P1 begins at 1400) and higher numbers indicate later times (P5 ends at 1700);
- FIN_SYB: a predictor encoding whether the stem of the verb used with a third person singular ends in a sibilant: yes (as in promises) vs. no (as in does);
- FOL_FRIC: a predictor encoding whether the word after the verb used with a third person singular begins with an s (es, as in promises sleeping cats for everyone), a th (th, as in promises three cats for everyone), or something else (other, as in promises many cats for everyone);
- GRAM: a predictor encoding whether the verb used with a third person singular is a grammatical verb (yes, i.e. is a form of be, do, or have) or not (no, i.e. a lexical verb).
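Since VNCPERIOD orders the letters in time, a natural first look at this data set is the share of each variant per period. The sketch below shows that tabulation on a simulated data frame `y`. It does not read thirdpers.csv, it includes only two of the eight columns, and its values are made up for illustration.

```r
# Simulated stand-in for two columns of thirdpers.csv (assumption: NOT the real data).
set.seed(3)
n <- 500
y <- data.frame(
  VARIANT   = factor(sample(c("es", "th"), n, replace = TRUE)),
  VNCPERIOD = factor(sample(paste0("P", 1:5), n, replace = TRUE))
)

# Cross-tabulate period by variant, then convert counts to row proportions
# so that each period's es/th shares sum to 1.
tab   <- table(PERIOD = y$VNCPERIOD, VARIANT = y$VARIANT)
props <- prop.table(tab, margin = 1)
round(props, 2)

# The factor levels P1..P5 sort chronologically, so their integer codes
# can serve as a numeric time index (1 = earliest, 5 = latest).
y$PERIOD_NUM <- as.integer(y$VNCPERIOD)
```

On the real data, a falling th column across P1 to P5 would reflect the replacement of giveth-type forms by gives-type forms; the simulated values here show no such trend by construction.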