CPSC 65/LING 20:
Natural Language Processing

Schedule | Grading | Assignments and Labs | Integrity | Accommodations | Links

Introduction

This course will introduce you to a broad range of topics in natural language processing, including language modeling, part-of-speech tagging, spelling correction, morphology, syntactic parsing, semantics, and machine translation. If time permits, we may also cover speech recognition, natural language generation, or discourse systems.

Class information

Professor: Richard Wicentowski
Office: Science Center 251
Phone: (610) 690-5643
Office hours: Monday 1:00-3:30 pm and by appointment

Room: Science Center 181
Class Time: Tuesday, Thursday 2:40pm–3:55pm
Lab Times: Wednesday 2:40pm–3:55pm; Wednesday 1:00pm–2:30pm
Text: Jurafsky and Martin, Speech and Language Processing, 2nd edition (Recommended)

Schedule

(tentative)
WEEK DAY ANNOUNCEMENTS TOPIC & READING LABS & PROJECTS   
1 Sep 04   Introduction, Regular Expressions
• Lee, L., 2004. "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. Computer Science: Reflections on the Field, Reflections from the Field, pp. 111-118.
• J&M, Ch. 1, Ch. 2
• Mertz, D., 2003. Text Processing in Python, Ch. 3. (Regular Expression reference)
 
Sep 06   Lab 1
Sep 11   Words and N-Grams
• J&M §3.9, Ch. 4
Sep 13 Drop/Add ends (Sep 14) Lab 2
Sep 18   Part of Speech Tagging
• J&M §5.1-5.4, §5.6-5.7
• NLTK Reference on TBL
Sep 20   Lab 3
Sep 25   Unsupervised Morphological Analysis
• J&M §4.10 (entropy)
• J&M §11.5
• Papers included in midterm project
Sep 27   Midterm Project
(Labs 4 & 5)
Oct 02   Morphological Induction
• Schone, P. and Jurafsky, D., 2000. Knowledge-Free Induction of Morphology Using Latent Semantic Analysis
• (Schone, P. and Jurafsky, D., 2001. Knowledge-Free Induction of Inflectional Morphologies)
Oct 04  
Oct 09   Sequence tagging using Hidden Markov Models
• J&M §2.2 (FSA); §3.4 (FST), §5.5 (HMM POS Tagging), §6.1-6.5 (HMMs)

Oct 11

Exam #1

Oct 16

Fall Break

Oct 18

Oct 23   Sequence tagging using Hidden Markov Models (continued)
• J&M §2.2 (FSA); §3.4 (FST), §5.5 (HMM POS Tagging), §6.1-6.5 (HMMs)
 
Oct 25   Lab 6
Oct 30   Classes cancelled due to weather
Nov 01   Lexical Semantics
• J&M §19.1-19.3, §20.1-20.6
• McCarthy, D. et al., 2004. Finding Predominant Word Senses in Untagged Text
• Naïve Bayes: MR&S Ch. 13 up to 13.4 (skim 13.3, skip 13.4.1)
 
Nov 06    
Nov 08 Last day to declare CR/NC or withdraw with a W (Nov 09) Sentiment Classification
• Maximum Entropy: J&M §6.6-6.7
• Pang, B. et al., 2002. Thumbs up? Sentiment classification using machine learning techniques.
• O'Connor, B. et al., 2010. From tweets to polls: Linking text sentiment to public opinion time series
• Riloff, E., Wiebe, J., and Wilson, T. 2003. Learning subjective nouns using extraction pattern bootstrapping.
Lab 7
Nov 13  
Nov 15   Machine Translation
• J&M Ch. 25
• Knight, K., 1997. Automating Knowledge Acquisition for Machine Translation. AI Magazine, Volume 18, No. 4. (§1-2 only)
• Knight, K., 1999. A Statistical MT Tutorial Workbook. Prepared for the 1999 JHU Summer Workshop.
Final Project (Labs 8-10)
Nov 20  

Nov 22

Thanksgiving

Nov 27   Clustering
• MR&S Ch. 16 (excluding §16.5)
Nov 29  
Dec 04   Parsing
• CFGs and Treebanks: J&M §12.1‑§12.4.2
• Top-down, Bottom-up and CKY Parsing: J&M §13.1‑§13.4
• PCFGs and Statistical Parsing: J&M §14.1‑§14.3
Dec 06  

Dec 11

Exam #2

Dec 18

Final project due

Grading

Your overall grade in the course will be determined as follows:
65% Labs and projects
30% Exams
5% Class participation and attendance

Turning in Assignments

You will submit your assignments electronically using the handin65 program. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Normally, late assignments will not be accepted; however, special exceptions can be made if you contact me well in advance of the deadline. Even if you do not fully complete an assignment, submit what you have done in order to receive partial credit.

Some assignments may take a considerable amount of time. You are strongly encouraged to begin working on assignments well before the due date.

Programming Language

Assignments presuppose knowledge of Python. You will almost certainly pick up some Perl and bash scripting along the way, but you are not expected to know either (yet).

Please make sure that each program/script you turn in has:

Academic Integrity

Academic honesty is required in all work you submit to be graded. With the exception of your partner on assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work.

You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.

All code you submit must be your own, with the following permissible exceptions: code distributed by me as part of the class, code found in the course textbook, and code worked on with your assignment partner. You should always include detailed comments that indicate which parts of the assignment you received help on and what your sources were. If you believe a particular third-party library would be useful in solving the problem at hand, please ask me to be sure it's OK to proceed.

Please see me if there are any questions about what is permissible.

Academic Accommodations

If you believe that you need accommodations for a disability, please contact Leslie Hempling in the Office of Student Disability Services (Parrish 130) or e-mail lhempli1@swarthmore.edu to set up an appointment to discuss your needs. Leslie Hempling is responsible for reviewing and approving disability-related accommodation requests. As appropriate, she will issue students with documented disabilities an Accommodation Authorization Letter. Since accommodations require early planning and are not retroactive, please contact her as soon as possible. For details about Student Disability Services and the accommodations process, visit their website. You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged through Leslie Hempling in the Office of Student Disability Services.

Links

Jurafsky and Martin, Speech and Language Processing (2/e), 2008
Manning, Raghavan and Schütze, Introduction to Information Retrieval, 2008
Manning and Schütze, Foundations of Statistical Natural Language Processing, 1999
Mertz, Text Processing in Python, 2003
NLTK: Natural Language Toolkit
The ACL Anthology
Python Documentation
Think Python: How To Think Like a Computer Scientist