CPSC 65/LING 20:
Natural Language Processing



This course will introduce you to a broad range of topics in natural language processing, including language modeling, part-of-speech tagging, spelling correction, morphology, syntactic parsing, semantics, and machine translation. If time permits, we may also cover speech recognition, natural language generation, or discourse systems.

Course Goals

By the end of the course you will:

Class information

Professor: Richard Wicentowski
Office: Science Center 290 (in the Chemistry hallway)
Phone: (610) 690-5643
Office hours: Tuesday 1:00-4:00 pm; Monday, Wednesday and Friday by appointment

Room: Science Center 199
Class Time: Tuesday, Thursday 9:55am–11:10am

Lab Room: Science Center 256
Lab Times: Thursday 1:05pm–2:35pm; Thursday 2:45pm–4:15pm


(Tentative) Schedule


Sep 02

  Introduction, Basic Probability, Regular Expressions
Choose one of the next two:
• M&S Ch.2 (pp. 40-48: probability background), Ch.3 (linguistics background: skim as needed)
• J&M Ch.1 (history of NLP) and Section 2.1 (regular expressions; see also Mertz below)
Both of these:
• BK&L Ch.1 (intro to NLTK; Python refresher)
• Mertz, D., 2003. Text Processing in Python, Ch.3 (reference for regular expressions)
Lab 1

Sep 04


Sep 09

  Words and N-Grams
Choose one of the next two:
• J&M §3.9, Ch.4
• M&S Ch.6
Lab 2

Sep 11

Drop/Add ends (Sep 12)


Sep 16

  Part of Speech Tagging
Pick one of the following two:
• J&M §5.1-5.4, §5.6-§5.7
• M&S §10.0-10.1, §10.2.1, §10.4
All of these:
• NLTK chapter 5 (especially §5.3-§5.8)
• NLTK Reference on tag.brill
• Brants, T. TnT: A Statistical Part-of-Speech Tagger (skip §2.5)
Lab 3

Sep 18


Sep 23

  Morphological Analysis
• J&M §4.10 (entropy)
• J&M §11.5
• Harris (1955, 1967), Hafer and Weiss (1974)
Labs 4/5

Sep 25


Sep 30


Oct 02

  Sequence tagging using Hidden Markov Models
Pick one of the following two:
• J&M §2.2 (FSA), §3.4 (FST), §5.5 (HMM POS tagging), §6.1-6.5 (HMMs)
• M&S Chapter 9 (Note that some variable names differ between the two texts: e.g., M&S uses X for the states, whereas J&M uses Q, as we did in class. Apart from these notational differences, the material is the same.)
The ice cream HMM in Excel:
Jason Eisner's Excel spreadsheet demonstrating the forward-backward training algorithm.

Oct 07


Oct 09


Oct 14

Fall Break

Oct 16


Oct 21

First exam

Oct 23

  Lexical Semantics
• J&M §19.1-19.3, §20.1-20.6
• Naïve Bayes: MR&S Ch.13 through §13.4 (skim §13.3, skip §13.4.1)
Lab 6

Oct 28

  Guest lecture: Gideon Mann, Head of Data Science, Bloomberg LP

Oct 30

  Lexical Semantics (continued)
• McCarthy, D. et al., 2004. Finding Predominant Word Senses in Untagged Text
Lab 7

Nov 04

Office hours Wednesday 2:30-5:30 this week only (Nov 05)

Sentiment Classification
• Pang, B. et al., 2002. Thumbs up? Sentiment classification using machine learning techniques.
• O'Connor, B. et al., 2010. From tweets to polls: Linking text sentiment to public opinion time series.
• Riloff, E., Wiebe, J., and Wilson, T., 2003. Learning subjective nouns using extraction pattern bootstrapping.

Nov 06


Nov 11

• CFGs and Treebanks: J&M §12.1&
• Top-down, Bottom-up and CKY Parsing: J&M §13.1&
• PCFGs and Statistical Parsing: J&M §14.1&

Nov 13

  Lab 8

Nov 18

• MR&S Chapter 16: §16.3-§16.4
• MR&S Chapter 17: §17.1-§17.4, §17.7
• MR&S Chapter 14: §14.3

Nov 20

  Final Project

Nov 25

  Machine Translation
• J&M Ch. 25
• Knight, K., 1997. Automating Knowledge Acquisition for Machine Translation. AI Magazine, Volume 18, No. 4 (§1-2 only).
• Knight, K., 1999. A Statistical MT Tutorial Workbook. Prepared for the 1999 JHU Summer Workshop.

Nov 27



Dec 02

  Machine Translation (continued)

Dec 04


Dec 08

Second exam on Monday 12/8 from 7-9pm in SC 199

Dec 09

  Speech recognition and generation
Donuts and our very own Adam Lammert!

Dec 18

Final project due at NOON


Grading

Your overall grade in the course will be determined as follows:
40% Labs and projects
5% Class participation and attendance

Turning in Assignments

You will submit your assignments electronically using the handin65 program. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Normally, late assignments will not be accepted; however, special exceptions can be made if you contact me well in advance of the deadline. Even if you do not fully complete an assignment, submit what you have done in order to receive partial credit.

Some assignments may take a considerable amount of time. You are strongly encouraged to begin working on assignments well before the due date.

Programming Language

Assignments will presuppose knowledge of Python. You will almost certainly end up learning some Perl and bash scripting along the way, but you are not expected to know either in advance.

Please make sure that each program/script you turn in has:

Academic Integrity

Academic honesty is required in all work you submit to be graded. With the exception of your partner on assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work.

You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.

All code you submit must be your own, with the following permissible exceptions: code distributed by me as part of the class, code found in the course textbook, and code worked on with your assignment partner. You should always include detailed comments that indicate which parts of the assignment you received help on and what your sources were. If you believe a particular third-party library would be useful in solving the problem at hand, please ask me first to make sure it's OK to use it.

Please see me if there are any questions about what is permissible.

Academic Accommodations

If you believe that you need accommodations for a disability, please contact Leslie Hempling in the Office of Student Disability Services (Parrish 113) or email lhempli1@swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, she will issue students with documented disabilities a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact her as soon as possible. For details about the accommodations process, visit the Student Disability Services website. You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged through the Office of Student Disability Services.

Links

Manning, Raghavan and Schütze, Introduction to Information Retrieval, 2008
Mertz, Text Processing in Python, 2003
The ACL Anthology
Python Documentation
Think Python: How To Think Like a Computer Scientist