CPSC 65/LING 20: Fall 2020


Introduction

This course will introduce you to a broad range of topics in natural language processing, including language modeling, part-of-speech tagging, syntactic parsing, vector semantics, text classification, and machine translation, as well as applications to cognitive modeling and psycholinguistics.

Course Goals

By the end of the course you will:

Class Information

Professor: Spencer Caplan
Office: Science Center 258
Phone: (610) 328-8272
Office Hours: T 3:30 -- 4:30pm; T 6:30 -- 7:30pm; W 9:00 -- 10:00am; or by appointment
OH location: Zoom

Lecture time: Tuesday and Thursday 2:00 -- 3:15pm
Lecture location: Zoom

Lab time: Wednesday 1:15 -- 2:45pm and 3:00 -- 4:30pm
Lab location: Start on Zoom and move to Slack

Class discussion board: Piazza

Note to enrolled students: please don't hesitate to contact Prof. Caplan if you are having trouble accessing any of the course resources (Slack, Piazza, etc.)

Textbook

You should not purchase any textbooks this semester. Assigned readings will be posted to the course website and come primarily from:

Tentative schedule

| WEEK | DAY | ANNOUNCEMENTS | TOPIC & READING | LABS |
| 1 | Sep 08 | | Introduction: History of NLP; Unix tools; regular expressions; tokenization and normalization (Required Reading; Lecture Slides; Optional) | Lab 1 (Counts) |
| | Sep 10 | Add/Drop ends (Sep 14) | | |
| 2 | Sep 15 | Lab 1 (Counts) due (Sep 16) | Language Modeling: probability; n-grams; smoothing; train-dev-test (Required Reading; Lecture Slides; Optional) | Lab 2 (Lang Mod) |
| | Sep 17 | | | |
| 3 | Sep 22 | | Noisy Channel: edit distance; spelling correction; phonology and speech processing; cognitive modeling (Required Reading; Lecture Slides; Optional) | |
| | Sep 24 | | | |
| 4 | Sep 29 | Lab 2 (Lang Mod) due (Sep 30) | Vector Semantics: lexical semantics; co-occurrence matrices; vector comparison; re-weighting embeddings; sparse and dense embeddings (word2vec) (Required Reading; Lecture Slides; Optional) | Lab 3 (Word Vectors) |
| | Oct 01 | | | |
| 5 | Oct 06 | Lab 3 (Word Vectors) due (Oct 07) | POS Tagging: word classes and POS; HMMs; Viterbi (Required Reading; Lecture Slides) | Lab 4 (POS) |
| | Oct 08 | | | |
| 6 | Oct 13 | | Classification: naive Bayes; logistic regression; perceptrons and SVMs; neural networks (Required Reading; Optional; Lecture Slides) | |
| | Oct 15 | | | |
| 7 | Oct 20 | Lab 4 (POS) due (Oct 21) | | Lab 5 (Spam Filter) |
| | Oct 22 | | Parsing: constituency structure; CFGs and treebanks; top-down, bottom-up and CKY parsing; PCFGs and statistical parsing (Required Reading; Lecture Slides) | |
| 8 | Oct 27 | Lab 5 (Spam Filter) due (Oct 28) | | Lab 6 (Parsing) |
| | Oct 29 | | | |
| 9 | Nov 03 | Election Day (no class, go vote) | | |
| | Nov 05 | Lab 6 (Parsing) due; CR/NC/W Deadline (Nov 06) | Machine Translation (Required Reading; Optional) | Final Projects |
| 10 | Nov 10 | | | |
| | Nov 12 | | Guest Lecture: Stephen Mayhew (Duolingo) | |
| 11 | Nov 17 | | TBD | |
| | Nov 19 | Last "on campus" day (Nov 20) | Guest Lecture: Jordan Kodner (Stony Brook) (Optional Reading) | |
| | Nov 24 | Thanksgiving | | |
| | Nov 26 | | | |
| 12 | Dec 01 | | Computational Psycholinguistics | |
| | Dec 03 | Last day of lectures (Dec 04) | | |
| | Dec 08 | Final Presentations | | |
| | Dec 10 | | | |
| | Dec 15 | Absolute last day to hand in final projects | | |

Grading

Your overall grade in the course will be determined as follows:

70% Labs and projects
25% Final project / paper
5% Participation and Attendance
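
As an illustration of how these weights combine, here is a minimal python3 sketch. It assumes each component is scored on a 0-100 scale; the function name and the example scores are hypothetical and are not part of any official grading script.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"labs": 0.70, "final_project": 0.25, "participation": 0.05}


def overall_grade(scores):
    """Return the weighted course grade for component scores on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical example scores, for illustration only.
example = {"labs": 90.0, "final_project": 80.0, "participation": 100.0}
print(round(overall_grade(example), 2))
```

With these example scores the three components contribute 63, 20, and 5 points respectively, for an overall grade of 88.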

Labs and Projects

This course features regular lab projects that account for the largest component of your course grade. Lab attendance is expected of all students, unless you have already completed and submitted the lab assignment for the week.

Lab assignments will typically be assigned during the lab sections on Wednesday and will generally be due by noon on Wednesday of the following week (or two weeks for some labs). Many of the labs involve substantial programming and development, so you are strongly encouraged to start early. Do not underestimate how long it may take you to complete the lab projects!

Even if you do not fully complete an assignment, you should submit what you have done to receive partial credit.

Programming language

Assignments will presuppose knowledge of python3. You may end up learning some bash scripting and other tools along the way, but you are not expected to know these ahead of time.

Please make sure that each program you turn in has:

I expect that you will be using python3 for all assignments. If you would like to use something different, you need to ask me about it prior to starting the lab.

Remote Collaboration

Since CS65 is operating remotely and most students will not have access to the department computer labs, you will need to coordinate some setup for remote collaboration between you and your lab partner / team members. I am not requiring any particular technology for remote collaboration, so feel free to use whatever configuration works best for everyone involved, although I link to some potential resources below.

Pair Programming

Students will work together with a pair-programming partner on all lab assignments. You may choose your own programming partner, but you are strongly encouraged to form a partnership with someone in the same or a nearby timezone. If you opt out of selecting your own partner, Prof. Caplan will assign a partner to you via random script. Partnerships are a per-assignment commitment, so you can switch partners between labs if you choose to do so. Grades for each project are assigned to the partnership as a whole (i.e., each student gets the same grade for work submitted together).

When working in a programming partnership, you should follow these guidelines:

International Student Lab Accommodations

While the general expectation is that students participate synchronously in lectures and labs, I acknowledge that some students are in significantly different timezones from Swarthmore. If your lab section time occurs between 11pm and 8am in your local timezone, you may alternatively attend an additional block of office hours (Wednesdays 9:00 -- 10:00am) designed to accommodate such timezone issues. If you believe this accommodation applies to you, you need to contact Prof. Caplan.

Policies

Assignment Submissions

You must submit your assignments electronically by pushing to your assigned git repository. You may push your assignment multiple times, and a history of previous submissions will be saved. You are encouraged to push your work regularly.

Extension and Late-Day Policy

To help with cases of minor illnesses, network issues, or other short-term time limitations, all students start the course with three “late assignment days” to be used at your discretion, with no questions asked. To use your extra time, you must email Prof. Caplan after you have completed the lab and pushed to your repository. You do not need to inform anyone ahead of time. When you use late time, you should still expect to work on the newly released lab during the following lab section meeting. The professor will always prioritize answering questions related to the current lab assignment. Please be careful in how you use your late days! Once you use them all up, further extensions cannot be granted, even for issues that are not your fault.

Your late days will be counted at the granularity of full days and will be tracked on a per-student (NOT per-partnership) basis. That is, if you turn in an assignment five minutes after the deadline, it counts as using one day. Using a late day counts towards the late days of each student in the programming partnership. In the case in which only one partner has unused late days, that partner’s late days may be used, barring a consistent pattern of abuse (as determined by the professor).
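
The full-day counting rule can be sketched in python3 as follows. This is an illustration under stated assumptions, not the official tracking script: the function names are hypothetical, the timestamps are examples, and it assumes a submission exactly 24 hours late uses one day.

```python
import math
from datetime import datetime, timedelta

LATE_DAYS_PER_STUDENT = 3  # each student starts the course with three late days


def late_days_used(deadline, submitted):
    """Whole late days consumed by one submission; any lateness rounds up."""
    if submitted <= deadline:
        return 0
    return math.ceil((submitted - deadline) / timedelta(days=1))


def remaining_late_days(used_so_far, deadline, submitted):
    """Late days left after this submission; a negative value means the budget is exceeded."""
    return LATE_DAYS_PER_STUDENT - used_so_far - late_days_used(deadline, submitted)


# Illustrative timestamps: a noon deadline and a push five minutes later.
deadline = datetime(2020, 9, 16, 12, 0)
submitted = datetime(2020, 9, 16, 12, 5)
print(late_days_used(deadline, submitted))
print(remaining_late_days(0, deadline, submitted))
```

In this example the push five minutes after the deadline uses one late day, leaving two. Because late days are tracked per student, the same charge would apply to each partner in the partnership.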

If you feel that you need an extension on an assignment or that you are unable to attend class for two or more meetings due to a medical condition (e.g., extended illness, concussion, hospitalization) or other emergency, you must contact the dean’s office and your instructors. Faculty will coordinate with the deans to determine and provide the appropriate accommodations.

Academic Integrity

Academic honesty is required in all your work. Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment or make them publicly available anywhere (e.g., in a public GitHub repository).

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else’s code or let anyone else read your code. All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate on which parts of the assignment you received help, and what your sources were.

Failure to abide by these rules constitutes academic dishonesty and will lead to a hearing of the College Judiciary Committee. According to the Faculty Handbook: “Because plagiarism is considered to be so serious a transgression, it is the opinion of the faculty that for the first offense, failure in the course and, as appropriate, suspension for a semester or deprivation of the degree in that year is suitable; for a second offense, the penalty should normally be expulsion.”

The spirit of this policy applies to all course work, including code, homework solutions (e.g., proofs, analysis, written reports), and exams. Please contact me if you have any questions about what is permissible in this course.

Academic Accommodations

If you believe you need accommodations for a disability or a chronic medical condition, please contact Student Disability Services (Parrish 113W, 123W) via e-mail at studentdisabilityservices@swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, the office will issue students with documented disabilities or medical conditions a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact Student Disability Services as soon as possible. For details about the accommodations process, visit the Student Disability Services website.

You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged, in advance, through Student Disability Services.



Last updated: Sunday, October 11, 2020