CPSC 65/LING 20: Spring 2022


Introduction

This course will introduce you to a broad range of topics in natural language processing, including language modeling, part-of-speech tagging, machine translation, syntactic parsing, vector semantics, and text classification, as well as the application of computational tools to cognitive modeling and psycholinguistics.

Course Goals

By the end of the course you will:

Class Information

Professor: Spencer Caplan
Office: Science Center 262A
Phone: (610) 957-6257
Office Hours:

Lecture time: Tuesdays and Thursdays 11:20am -- 12:35pm
Lecture location: Singer 222

Lab A time: Tuesdays 1:05 -- 2:35pm
Lab B time: Tuesdays 2:45 -- 4:15pm
Lab location (both A and B): Science Center 240

Class discussion board: EdStem

Note to enrolled students: please don't hesitate to contact Prof. Caplan if you are having trouble accessing any of the course resources (Ed, GitHub, etc.).

Textbook

You should not purchase any textbooks this semester. All readings (both required and optional) will be posted to the course website. Many readings come from:

Weekly Schedule

WEEK | DAY | ANNOUNCEMENTS | TOPIC & READING | LABS
1

Jan 18

Asynchronous Prep week

Prep Week

Required Reading

Optional


Lab 0
(Welcome)

Jan 20

2

Jan 25

Synchronous Zoom class

Lab 0 (Welcome) due

Introduction

  • history of NLP
  • Unix tools
  • regular expressions
  • tokenization and normalization

Required Reading

Lecture Slides

Optional


Lab 1
(Counts)

Jan 27

Synchronous Zoom class

3

Feb 01

First in-person class!

Lab 1 (Counts) due

Language Modeling

  • probability
  • n-grams
  • smoothing
  • train-dev-test

Required Reading

Lecture Slides

Optional


Lab 2
(Lang Mod)

Feb 03

Add/Drop Deadline (Feb 04)

4

Feb 08

 

Noisy Channel

  • edit distance
  • spelling correction
  • phonology and speech processing
  • cognitive modeling

Required Reading

Lecture Slides

Optional

Feb 10

 
5

Feb 15

Lab 2 (Lang Mod) due

Quiz 1 (in lab)

Guest Lecture: Ryan Budnick (UPenn)

  • Languages other than English

Lecture Slides


Quiz 1
(in lab)

Feb 17

 

Machine Translation

  • word-to-word and noisy-channel approaches
  • evaluation: BLEU
  • EM algorithm
  • decoding

Required Reading

Lecture Slides

Optional

6

Feb 22

 


Lab 3 Part-1
(MT Pset)

Feb 24

 
7

Mar 01

Lab 3 Part-1 (MT Pset) due

Vector Semantics

  • lexical semantics
  • co-occurrence matrices

Lecture Slides

Optional


Lab 3 Part-2
(MT project)

Mar 03

 

Guest Lecture: Jonathan Washington (Swarthmore)

  • Language technology for marginalised languages
 

Mar 08

Spring Break

Mar 10

8

Mar 15

 

Vector Semantics

  • vector comparison
  • re-weighting embeddings
  • sparse and dense embeddings (word2vec)

Required Reading

Lecture Slides

Optional

Mar 17

Lab 3 Part-2 (MT project) due (Mar 18)

POS Tagging

  • word classes and POS
  • HMMs
  • Viterbi

Required Reading

Lecture Slides

9

Mar 22

 


Lab 4
(Word Vectors)

Mar 24

Withdrawal and CR/NC Deadline (Mar 25)

Classification

  • naive Bayes
  • logistic regression
  • perceptrons and neural networks

Required Reading

Lecture Slides

10

Mar 29

Lab 4 (Word Vectors) due


Lab 5
(POS)

Mar 31

 
11

Apr 05

 

Guest Lecture: Jordan Kodner (Stony Brook)

  • Deep Learning as Clever Hans

Lecture Slides

Apr 07

 

Parsing

  • constituency structure
  • CFGs and treebanks
  • top-down, bottom-up and CKY parsing
  • PCFGs and statistical parsing
  • lexicalized parsing and reranking

Required Reading

Lecture Slides

12

Apr 12

Lab 5 (POS) due


Lab 6
(Spam Filter)

Apr 14

 
13

Apr 19

Lab 6 (Spam Filter) due

Quiz 2 (in lab)

 

Apr 21

 

Morphology

  • tolerance principle

Lecture Slides

 

Apr 26

Oral Final Exams

Apr 28

Grading

Your overall grade in the course will be determined as follows:

55% Labs and projects
25% Final exam (oral)
15% Quizzes
5% Participation and Attendance
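
As an illustration of how these weights combine, here is a minimal sketch in Python. The component scores are made up, the function name `overall_grade` is hypothetical, and the sketch assumes that each component is scored on a 0-100 scale, which the breakdown above does not specify.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "labs": 0.55,
    "final_exam": 0.25,
    "quizzes": 0.15,
    "participation": 0.05,
}

def overall_grade(scores):
    """Return the weighted average of component scores (assumed 0-100 each)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores, for illustration only.
example = {"labs": 90, "final_exam": 80, "quizzes": 70, "participation": 100}
print(round(overall_grade(example), 2))
```

Because labs carry more than half of the weight, a change in the lab score moves the overall grade more than the same change in any other component.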

Labs and Projects

This course features regular lab projects that account for the largest component of your course grade. Lab attendance is expected of all students, unless you have already completed and submitted the lab assignment for the week.

Lab assignments will be released during the lab sections on Tuesday (I will typically post things online ahead of time) and will generally be due by 11:00am on Tuesday of the following week (or two weeks later for some labs). Most of the labs involve substantial programming and development, so you are strongly encouraged to start early. Do not underestimate how long it may take you to complete the lab projects!

Even if you do not fully complete an assignment, you should submit what you have done to receive partial credit.

How to plan code and think about debugging

You will do substantially more programming in this class than you did during the intro and intermediate sequence. Many labs will involve writing several hundred lines of code (with significantly less starter code than you may be used to). Be prepared, plan out the top-down design ahead of time, and make sure that you understand the concepts from class before you jump right in!

Since many of the labs involve writing NLP tools from scratch, it is very important that you understand how the system is supposed to work before you start. I will typically provide some benchmark to help you figure out whether your code is completed and working as it should be (e.g. the predicted accuracy on some test data). However, questions like "Our code isn't working right!" or "Why am I not getting the accuracy that I expected?" are not useful prompts for me to help you get to the root of the problem. In order to help you grow into independent programmers, I will require that you first implement at least one unit test relating to the problem whose output you can describe to me before you ask me for debugging help.
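
A unit test of the kind described above can be very small. The sketch below uses Python's standard `unittest` module on a hypothetical helper, `count_bigrams`, which is not part of any lab's starter code. It only illustrates the idea of testing one small function on an input whose correct output you can work out by hand.

```python
import unittest
from collections import Counter

def count_bigrams(tokens):
    """Count adjacent token pairs (hypothetical helper, for illustration)."""
    return Counter(zip(tokens, tokens[1:]))

class TestCountBigrams(unittest.TestCase):
    def test_repeated_bigram(self):
        # Worked out by hand: ("the", "cat") occurs twice, ("cat", "saw") once.
        counts = count_bigrams(["the", "cat", "saw", "the", "cat"])
        self.assertEqual(counts[("the", "cat")], 2)
        self.assertEqual(counts[("cat", "saw")], 1)

    def test_too_short_for_bigrams(self):
        # A single token has no adjacent pair.
        self.assertEqual(count_bigrams(["hello"]), Counter())
```

If this were saved in a file such as `test_bigrams.py` (an assumed name), it could be run with `python3 -m unittest test_bigrams`. A failing assertion then gives you a concrete expected-versus-actual output to describe when asking for help.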

Programming language

Assignments will presuppose knowledge of python3. You may end up learning some bash scripting and other tools along the way, but you are not expected to know this ahead of time.

Please make sure that each program you turn in has:

I expect that you will be using python3 for all assignments. If you would like to use something different, you need to ask me about it prior to starting the lab.

Remote Collaboration

Since we are forced to operate remotely at the beginning of the Spring semester, you may need to set up remote collaboration with your lab partner. I am not requiring any particular technology for remote collaboration, so feel free to use whatever configuration works best for everyone involved, although I link to some potential resources below.

Pair Programming

You will work together with a pair-programming partner on most lab assignments. You may choose your own programming partner, but I encourage you to think about whether your working styles are mutually compatible before beginning a partnership. If you would rather not choose your own partner, Prof. Caplan will assign a partner for you. Partnerships are a per-assignment commitment, so you can switch partners between labs if you choose to do so. Grades for each project are assigned to the partnership as a whole (i.e. each student gets the same grade for work submitted together).

When working in a programming partnership, you should follow these guidelines:

Policies

Assignment Submissions

You must submit your assignments electronically by pushing to your assigned git repository. You may push your assignment multiple times, and a history of previous submissions will be saved. You are encouraged to push your work regularly.

Extension and Late-Day Policy

To help with cases of minor illnesses, network issues, competing deadlines, transient laziness (🙃), or other short-term time limitations, all students start the course with three “late assignment days” to be used at your discretion, with no questions asked. To use your extra time, you must email Prof. Caplan after you have completed the lab and pushed to your repository. You do not need to inform anyone ahead of time. When you use late time, you should still expect to work on the newly released lab during the following lab section meeting, since I will need to prioritize answering questions related to the current lab assignment. Please be careful in how you use your late days! Once you use them all up, further extensions cannot be granted, even for issues that are not your fault.

Your late days will be counted at the granularity of full days and will be tracked on a per-student (NOT per-partnership) basis. That is, if you turn in an assignment five minutes after the deadline, it counts as using one day. Using a late day counts towards the late days of each student in the programming partnership. In the case in which only one partner has unused late days, that partner’s late days may be used, barring a consistent pattern of abuse (as determined by the professor).
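
The full-day granularity can be sketched in Python as follows. This is an illustration only: the function name `late_days_used` is hypothetical, and the sketch assumes that a "day" means a 24-hour period measured from the deadline, which the policy above does not spell out.

```python
import math
from datetime import datetime, timedelta

def late_days_used(deadline, submitted):
    """Return whole late days used; any lateness rounds up to a full day."""
    if submitted <= deadline:
        return 0
    # Dividing two timedeltas gives a float number of days.
    return math.ceil((submitted - deadline) / timedelta(days=1))

# Illustrative deadline: a Tuesday at 11:00am.
deadline = datetime(2022, 2, 15, 11, 0)
print(late_days_used(deadline, deadline + timedelta(minutes=5)))  # 1
print(late_days_used(deadline, deadline + timedelta(hours=25)))   # 2
```

Under this assumption, a submission five minutes late and one a full 24 hours late both cost one day, and the count applies to each student in the partnership.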

If you feel that you need an extension on an assignment or that you are unable to attend class for two or more meetings due to a medical condition (e.g., extended illness, concussion, hospitalization) or other emergency, you must contact the dean’s office and your instructors. Faculty will coordinate with the deans to determine and provide the appropriate accommodations.

Academic Integrity

Academic honesty is required in all your work. Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone other than your lab partner; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment or make them publicly available anywhere (e.g. public GitHub repository).

Discussing ideas and approaches to problems with others on a general level is fine (in fact, I encourage you to discuss general strategies with each other), but you should never read anyone else’s code or let anyone else read your code. All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the assigned readings for the course, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment should be attributed to which sources.

Failure to abide by these rules constitutes academic dishonesty and will lead to a hearing of the College Judiciary Committee. According to the Faculty Handbook: “Because plagiarism is considered to be so serious a transgression, it is the opinion of the faculty that for the first offense, failure in the course and, as appropriate, suspension for a semester or deprivation of the degree in that year is suitable; for a second offense, the penalty should normally be expulsion.”

The spirit of this policy applies to all course work, including code, homework solutions (e.g., proofs, analysis, written reports), and exams. Please contact me if you have any questions about what is permissible in this course.

Academic integrity is an issue of personal and moral integrity. Don't do something you will regret. This is not just a boilerplate warning: every semester I catch someone cheating. Let's make this be the semester that breaks free of that streak!

Academic Accommodations

If you believe you need accommodations for a disability or a chronic medical condition, please contact Student Disability Services (Parrish 113W, 123W) via e-mail at studentdisabilityservices@swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, the office will issue students with documented disabilities or medical conditions a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact Student Disability Services as soon as possible. For details about the accommodations process, visit the Student Disability Services website.

You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged, in advance, through Student Disability Services.



Last updated: April 24, 2022