The following is a list of course-final assignments. Choose one and send your answers, together with the code you used to produce them, to Albert by 24 March 2023, 15:00 PST.

1 Assignment 1: Twistin’ the night away

Jackendoff (1997) investigates what he calls the ‘time’-away construction, which is exemplified by utterances such as ‘Bill slept the afternoon away’, ‘We’re twisting the night away’, or ‘The two of them would drink the week away’. Try to write a regular expression that retrieves such instances from the SGML-annotated version of the BNC. To simplify things, look for sequences of a verb, followed by the (with its tag), followed by a singular or plural form of any of the nouns morning, noon, afternoon, evening, night, day, week, month, or year, followed by away tagged as an adverb.

How many instances are there? What is the distribution of the time nouns, what time scale do most of them refer to, and what is the semantic implication of this construction?
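
As a starting point, the search described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: it presumes tokens of the form `<w TAG>word ` with CLAWS5 tags (V.. for verbs, AT0 for the article, NN1/NN2 for singular/plural nouns, AV0 for general adverbs and AVP for adverb particles), and the names TIME_AWAY and find_time_away are made up for illustration. Check the tag format in your own files, and drop AVP if you want to follow the task description strictly.

```python
import re
from collections import Counter

# Assumption: tokens look like "<w TAG>word " and carry CLAWS5 tags.
TIME_NOUNS = "morning|noon|afternoon|evening|night|day|week|month|year"
AMBIG = r"(?:-[A-Z0-9]{3})?"  # optional ambiguity tag such as VVD-VVN

TIME_AWAY = re.compile(
    rf"<w V[A-Z0-9]{{2}}{AMBIG}>([^<]+?)\s*"   # group 1: any verb form
    rf"<w AT0>the\s*"                           # the definite article
    rf"<w NN[12]{AMBIG}>({TIME_NOUNS})s?\s*"    # group 2: time noun (singular or plural)
    rf"<w AV[0P]{AMBIG}>away\b",                # 'away' tagged as adverb or particle
    re.IGNORECASE,
)

def find_time_away(text):
    """Return (verb form, time noun) pairs for every match in text."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in TIME_AWAY.finditer(text)]

# Hypothetical mini-sample in the assumed format, not real corpus data.
sample = (
    '<s n="1"><w NP0>Bill <w VVD>slept <w AT0>the <w NN1>afternoon <w AV0>away<c PUN>.'
    '<s n="2"><w PNP>We <w VVD>twisted <w AT0>the <w NN1>night <w AV0>away<c PUN>.'
    '<s n="3"><w PNP>They <w VVD>threw <w AT0>the <w NN1>ball <w AV0>away<c PUN>.'
    '<s n="4"><w PNP>She <w VVD>dreamed <w AT0>the <w NN2>days <w AV0>away<c PUN>.'
)

hits = find_time_away(sample)
noun_distribution = Counter(noun for _, noun in hits)
print(len(hits), noun_distribution)
```

The second capture group keeps the noun without its plural -s, so the Counter gives the distribution of the time nouns directly.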

2 Assignment 2: VERB into VERBing

Another partially lexically-filled argument structure construction in English is the so-called into-causative:

  1. He_X tricked [me]_Y into [buying something]_Z.
  2. He_X talked [the others]_Y into [voting for them]_Z.
  3. SUBJ_X V OBJ_Y into V-ing_Z

How many occurrences of the into-causative can you find in the SGML files representing the British National Corpus? Make use of the tags: I recommend you look for into (tagged as a preposition) followed by a word that ends in ing (tagged as a verb). Also, answer the following questions:

  • Does this construction have a positive or negative semantic ‘touch’?
  • Is this construction associated with particular semantic frames or typical situations?
  • What are the kinds of verbs that occur as verb 1 (trick in the above example (1))? Give examples and, if possible, group them into classes.
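
One possible way to set up the search is sketched below in Python. It rests on assumptions that you should verify against the files: tokens of the form `<w TAG>word `, CLAWS5 tags (PRP for prepositions, VVG/VBG/VDG/VHG for -ing verb forms, VV. for lexical verbs), and an object of one to five non-verb words between verb 1 and into. The window of five and the names INTO_CAUSATIVE and find_into_causatives are illustrative choices, not part of the task.

```python
import re
from collections import Counter

AMBIG = r"(?:-[A-Z0-9]{3})?"  # optional ambiguity tag such as VVG-NN1

INTO_CAUSATIVE = re.compile(
    rf"<w VV[A-Z0-9]{AMBIG}>([^<]+?)\s*"         # group 1: verb 1 (lexical verb)
    r"(?:<w (?!V)[A-Z0-9-]+>[^<]*){1,5}?"        # object: 1 to 5 non-verb words
    r"<w PRP>into\s*"                            # 'into' tagged as a preposition
    rf"<w V[A-Z]G{AMBIG}>([^<\s]+ing)\b",        # group 2: -ing form tagged as a verb
    re.IGNORECASE,
)

def find_into_causatives(text):
    """Return (verb 1, -ing form) pairs for every candidate match in text."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in INTO_CAUSATIVE.finditer(text)]

# Hypothetical mini-sample in the assumed format, not real corpus data.
sample = (
    '<s n="1"><w PNP>He <w VVD>tricked <w PNP>me <w PRP>into <w VVG>buying '
    '<w PNI>something<c PUN>.'
    '<s n="2"><w PNP>He <w VVD>talked <w AT0>the <w NN2>others <w PRP>into '
    '<w VVG>voting <w PRP>for <w PNP>them<c PUN>.'
    '<s n="3"><w PNP>She <w VVD>walked <w PRP>into <w AT0>the <w NN1>building<c PUN>.'
)

hits = find_into_causatives(sample)
verb1_counts = Counter(verb for verb, _ in hits)
print(hits)
```

Capturing verb 1 together with the -ing form makes it easier to group the verb 1 types into classes afterwards. Expect false positives and misses with a pattern like this, so inspect the matches manually.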

3 Assignment 3: CAUSE

What is the semantic prosody of the (verb and/or noun) lemma CAUSE? Specifically, does CAUSE go with largely positive, neutral, or negative collocates, and with which ones in particular? Answer this question by exploring and discussing only the noun and verb collocates of CAUSE (excluding forms of BE, DO, HAVE, and modal verbs) that occur in close proximity to CAUSE (4 words to the left and right) in the 4 BNC SGML files.
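
The collocate extraction could be approached along the lines of the following Python sketch. It assumes tokens of the form `<w TAG>word ` with CLAWS5 tags, in which case forms of BE, DO, HAVE and the modals carry VB., VD., VH. and VM0 tags and can be excluded simply by keeping only NN. and VV. tags. It also assumes the text has already been split into sentences and that punctuation (`<c ...>`) does not count towards the 4-word span. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: "<w TAG>word " tokens; only the first tag of an ambiguity tag is kept.
TOKEN = re.compile(r"<w ([A-Z0-9]{3})(?:-[A-Z0-9]{3})?>([^<]+)")
CAUSE_FORMS = {"cause", "causes", "caused", "causing"}
CONTENT_TAGS = ("NN", "VV")  # common nouns and lexical verbs only

def tokens(sentence):
    """Turn one tagged sentence into a list of (tag, lowercased word) pairs."""
    return [(tag, word.strip().lower()) for tag, word in TOKEN.findall(sentence)]

def cause_collocates(sentences, span=4):
    """Count noun and lexical-verb collocates within `span` words of CAUSE."""
    counts = Counter()
    for sentence in sentences:
        toks = tokens(sentence)
        for i, (tag, word) in enumerate(toks):
            if word in CAUSE_FORMS and tag[:2] in CONTENT_TAGS:
                window = toks[max(0, i - span):i] + toks[i + 1:i + 1 + span]
                counts.update(w for t, w in window if t[:2] in CONTENT_TAGS)
    return counts

# Hypothetical mini-sample in the assumed format, not real corpus data.
sample = [
    '<w AT0>The <w NN1>accident <w VVD>caused <w AJ0>serious <w NN1>damage '
    '<w PRP>to <w AT0>the <w NN1>car<c PUN>.',
    '<w PNP>It <w VM0>may <w VHI>have <w VBN>been <w AT0>the <w NN1>cause '
    '<w PRF>of <w AT0>the <w NN1>problem<c PUN>.',
]

print(cause_collocates(sample).most_common())
```

The counts are for word forms, not lemmas, so you may want to merge forms such as problem and problems before interpreting the list.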

4 Assignment 4: Thematic concentration

It has been proposed that the ‘thematic concentration’ of a corpus of files, or of just one file, can be quantified using the logic underlying the h-index, which is often used to quantify academics’ scholarly ‘productivity and impact’. The ‘thematic concentration’ of a file can be computed by

  • generating a frequency list of the file;
  • assigning frequency-based ranks to the word types such that
    • the most frequent word is ranked 1,
    • the second most frequent word is ranked 2, etc.
    • ties are broken with the average;
  • determining for each frequency list the proportion of word types in the corpus file whose frequency is greater than or equal to their ranks.

(By analogy, for academics’ scholarly ‘productivity and impact’, Google Scholar defines the h-index as the largest number h such that h publications have at least h citations.)
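
The steps listed above can be sketched in Python as follows. This is a minimal illustration, assuming the file has already been reduced to a list of word tokens (how you tokenize ICE-GB is up to you); the function names average_ranks and thematic_concentration are made up for this example.

```python
from collections import Counter

def average_ranks(freqs):
    """Map each frequency to its rank (1 = most frequent); ties get the mean rank."""
    ordered = sorted(freqs, reverse=True)
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        while end < len(ordered) and ordered[end] == ordered[start]:
            end += 1
        # Positions start+1 .. end share one frequency, so they share the mean rank.
        ranks[ordered[start]] = (start + 1 + end) / 2
        start = end
    return ranks

def thematic_concentration(words):
    """Proportion of word types whose frequency is >= their (tie-averaged) rank."""
    freq = Counter(words)
    if not freq:
        return 0.0
    ranks = average_ranks(freq.values())
    hits = sum(1 for f in freq.values() if f >= ranks[f])
    return hits / len(freq)

# Toy example: the=4, cat=3, sat=2, on=1, mat=1
words = "the the the the cat cat cat sat sat on mat".split()
print(thematic_concentration(words))  # 0.4
```

In the toy example the ranks are 1, 2, 3, 4.5 and 4.5; only the (4 >= 1) and cat (3 >= 2) qualify, which gives 2 of 5 types.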

Your task is to look at the ICE-GB corpus and find the spoken corpus file and the written corpus file with the highest thematic concentration, i.e. the files with the largest proportion of word types that have a frequency greater than or equal to their ranks.