CS65: Natural Language Processing


This course will introduce you to a broad range of topics in natural language processing, including language modeling, part-of-speech tagging, spelling correction, morphology, syntactic parsing, semantics, and machine translation. If time permits, we may also cover speech recognition, natural language generation, or discourse systems.

Class information

Professor: Richard Wicentowski
Office: Science Center 251
Phone: (610) 690-5643
Office hours: Tuesday 2:30-4:00 pm and by appointment

Room: Science Center 128
Class Time: Tuesday, Thursday 1:15pm–2:30pm
Lab Times: Wednesday 4:00pm–5:00pm; Thursday 2:30pm–4:00pm
Text: Jurafsky and Martin, Speech and Language Processing, 2nd edition


Schedule (somewhat tentative)
1 Aug 31   * J&M, Chapters 1-2: Introduction, Regular Expressions
* Lee, L., 2004. "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001 (2up). Computer Science: Reflections on the Field, Reflections from the Field, pp. 111-118.
* (Reference) Mertz, D., 2003. Text Processing in Python, Chapter 3.
Lab 1
Sep 02  
2 Sep 07   * J&M, Chapter 4: Maximum Likelihood Estimation (MLE), N-gram models for generation and prediction, smoothing, Good-Turing, Kneser-Ney
Lab 2
Sep 09 Drop/Add ends (Sep 10)
3 Sep 14   * Jurafsky and Martin, Chapter 5 sections 5.1-5.4, 5.6
* Klein, S. and Simmons, R., 1963. A computational approach to grammatical coding of English words (2up). Journal of the Association for Computing Machinery 10, pp. 334-347.
Lab 3
Sep 16   * (READ SECTIONS 2 AND 4, SKIPPING 4.4) Brill, E., 1995. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging (2up). Computational Linguistics 21:4, pp. 1-37
4 Sep 21   Sequence tagging using Hidden Markov Models
* J&M, Section 2.2 (FSA), Section 3.4 (FST), Section 5.5 (HMM POS Tagging), Chapter 6 (HMMs, up to and including 6.5), Minimum Edit Distance
* M&S, Chapter 9 (Markov Models) and Chapter 10 (Part of Speech tagging) if you want a second perspective. (errata for M&S)
Lab 4
Sep 23  
5 Sep 28   Midterm Project
Sep 30   Unsupervised Morphological Analysis
* Harris (1955, 1967); Hafer and Weiss (1974); DéJean (1998)
6 Oct 05
Oct 07   Exam #1
Oct 12   October Holiday
Oct 14
7 Oct 19   Supervised Morphological Analysis
* Yarowsky, D. and Wicentowski, R., 2000. Minimally Supervised Morphological Analysis by Multimodal Alignment (2up)
* Schone, P. and Jurafsky, D., 2000. Knowledge-Free Induction of Morphology Using Latent Semantic Analysis (2up)
* Schone, P. and Jurafsky, D., 2001. Knowledge-Free Induction of Inflectional Morphologies (2up)
Oct 21   Lab 5
8 Oct 26  
Oct 28   Lexical Semantics/Word Sense Disambiguation
* J&M, Chapter 19.1-19.3, 20.1-20.6
Lab 6
9 Nov 02   * McCarthy, D. et al, 2004. Finding Predominant Word Senses in Untagged Text
Nov 04 Last day to declare CR/NC or withdraw with a W (Nov 05)
* O'Connor, B. et al, 2010. From tweets to polls: Linking text sentiment to public opinion time series
Lab 7
10 Nov 09   * Riloff, E., Wiebe, J., and Wilson, T. 2003. Learning subjective nouns using extraction pattern bootstrapping.
* Riloff, E. and Wiebe, J. 2003. Learning extraction patterns for subjective expressions.
Nov 11   * Pang, B. et al. 2002. Thumbs up? sentiment classification using machine learning techniques.
* J&M, Sections 6.6 and 6.7 (Maximum Entropy Models)
Lab 8
11 Nov 16   * Knight, K., 1997. Automating Knowledge Acquisition for Machine Translation. AI Magazine, Volume 18, No. 4, 1997.
(Sections 3 and 4 are optional)
Nov 18   * Knight, K., 1999. A Statistical MT Tutorial Workbook. Prepared for the 1999 JHU Summer Workshop.
Lab 9
12 Nov 23  

Nov 25


13 Nov 30   Parsing
* J&M, Chapter 12 (through 12.4)
* J&M, Section 13.4.1 (CKY Parsing)
Lab 10
Dec 02  

Dec 07   Exam #2
Final project due (Dec 14)


Your overall grade in the course will be determined as follows:
50% Labs and midterm project
20% Final project
5% Class participation and attendance

Policy on Programming Assignments

You will submit your assignments electronically using the handin65 program. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Normally, late assignments will not be accepted; however, special exceptions can be made if you contact me well in advance of the deadline. Even if you do not fully complete an assignment, you may submit what you have done to receive partial credit.

Some assignments may take a considerable amount of time, so you are strongly encouraged to begin working on assignments well before the due date.

Programming Language

Assignments will presuppose knowledge of Python. You will almost certainly end up learning some basic Perl and bash scripting, but you are not expected to know either yet.

Please make sure that each program you turn in has:

Academic Integrity

Academic honesty is required in all work you submit to be graded. With the exception of your partner on assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work.

You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.

All code you submit must be your own, with the following permissible exceptions: code distributed by me as part of the class, code found in the course textbook, and code worked on with your assignment partner. You should always include detailed comments that indicate which parts of the assignment you received help on, and what your sources were.

Please see me if there are any questions about what is permissible.

Academic Accommodations

Academic accommodations are available for students with disabilities who are registered with Student Disability Services in the Dean's office. Students in need of disability accommodations should schedule an appointment with me early in the semester to discuss accommodations for this course that have been approved by the Dean's office. All requests must come through an accommodation letter from the Dean's office. To receive an accommodation for a course activity, your meeting with me must be at least one week prior to the activity.

Contact Tracey Rush at the Dean's office and follow these steps for obtaining accommodations.

Jurafsky and Martin, Speech and Language Processing (2/e), 2008
Manning and Schütze, Foundations of Statistical Natural Language Processing, 1999
Mertz, Text Processing in Python, 2003
NLTK: Natural Language Toolkit
The ACL Anthology
Python Documentation
How To Think Like a Computer Scientist: Learning with Python