Stefan Th. Gries
Last updated: 29 April 2024

Teaching at the University of California, Santa Barbara


Ling 105: Predictive modeling in linguistics (Spring 2024)

Syllabus and overview


This course is a selective introduction to predictive modeling applications in linguistics. We start with a one-session introduction to predictive modeling with an emphasis on regression modeling, which will provide an overview of several fundamental aspects of model formulation, model selection, multifactoriality, and validation. We then work our way through a variety of regression modeling applications: linear regression, binary logistic regression, and multinomial and ordinal regression models. After that, one session will be devoted to model diagnostics and, perhaps, model validation. Finally, there will be a brief part on predictive modeling with classification and regression trees. Like its prerequisite course, Ling 104, this course is based on the third edition of my textbook Statistics for linguistics with R: a practical introduction (2021) and uses the open-source programming language R.


Downloads for class sessions
(files will be available as appropriate)



Folder for the whole course

Additional files to be added to that folder for each session:
For session 01: PDF of slides
For session 02: HTML
For session 03: HTML
For session 04: HTML
For session 05: Markdown/Quarto file and the Google doc
For session 06: Markdown/Quarto file and the Google doc
For session 07: Markdown/Quarto file and the Google doc
For session 08: Markdown/Quarto file and the Google doc
For session 09: Markdown/Quarto file and the Google doc
For session 10: Markdown/Quarto file and the Google doc


Graded assignments



Attendance is not required and will not be monitored. Choose and complete as many assignments from this page as you need for the sum of their difficulty levels to add up to at least 5 points, and send them to the TA by 14 June 2024, 23:59 PDT (no extensions!). Your assignments can be submitted as R scripts, as R Markdown or Quarto documents, or as R reports, and must follow this file name structure: <105_lastname_assignment##.html> (as in <105_smith_assignment02.html>). The assignments will be graded on (i) whether your statistical analysis 'makes sense' (Does the code work? Did you explore and prepare the data? Did you choose the right method? Did you visualize the results properly? Did you summarize the findings properly in a short paragraph?) and (ii) the form in which you submit it (on a scale from a haphazardly formatted R script to a nicely formatted HTML file rendered from Quarto). Students' preparation of the assignments must comply with UCSB's academic integrity principles.


Software



R (from CRAN) (required, ideally at least version 4.3.3). Also, make sure that (i) the packages car, effects, magrittr, multcomp, plotly, rgl, and tree are installed and (ii) all your packages are up to date.
RStudio (required, ideally at least version 2023.12.1+402)
LibreOffice (optional, ideally at least version 24.2)