2nd midterm exam topics
- Morphology
- Goldsmith (2001)
- Signatures
- MDL
- Segmentation only (no analysis) / suffix only
- Schone and Jurafsky (2000) and (2001)
- Generate PPMVs
- Use wide context for semantic similarity
- Use narrow context for syntactic similarity
- Transitive links
- Segmentation only (no analysis) / suffix, prefix and circumfix
- Yarowsky and Wicentowski (2000)
- "Wide" context for semantic similarity
- Levenshtein for orthographic similarity
- Frequency for distributional similarity
- EM to learn parameters
- Segmentation / regular suffixes / irregular forms
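For review, the orthographic-similarity topic listed under Yarowsky and Wicentowski (2000) can be sketched with the standard dynamic-programming Levenshtein distance. This is a minimal sketch that assumes unit costs for insertion, deletion, and substitution; any cost weighting used in the paper is not reproduced here, and the function name `levenshtein` is only illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b.

    Assumption: every edit operation costs 1 (unit costs).
    """
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string is i deletions
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                      # delete ca
                curr[j - 1] + 1,                  # insert cb
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


# An irregular verb pair that differs by one vowel change.
print(levenshtein("sing", "sang"))
```

A small distance between a candidate root and an inflected form (as with "sing" and "sang") is one signal that the two are morphologically related; in the outline above it is listed alongside context and frequency as separate similarity measures.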
- Lexical Semantics (Ch 19, Ch 20)
- Chapter 19
- polysemy, hypernym/hyponym, WordNet
- Chapter 20
- supervised word sense disambiguation
- vector space model
- Naive Bayes classifier
- baseline and upper-limit performance
- Lesk: choose the sense whose dictionary definition shares the most words with the context
- Minimally supervised WSD (bootstrapping) - Yarowsky
- McCarthy et al. (2004)
- Find predominant sense
- Predominant sense is a good baseline and fallback
- Can find predominant sense for individual corpora
- Purandare and Pedersen (2004) with movie and slides
- Link above is slow; links below are local and should provide faster downloads.
- First and second-order context vectors (part 1 [1.3 GB], 57:30-1:15:00)
- Latent Semantic Analysis, Singular Value Decomposition (part 2 [700 MB], 2:30-16:00)
- Clustering (part 2 [700 MB], 21:00-31:00)
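For review, the Lesk topic listed under Chapter 20 can be sketched as a word-overlap count between the context and each sense's definition. This is a simplified sketch, not the textbook's exact formulation: the sense labels, glosses, and stopword list below are made-up toy data, and `simplified_lesk` is a hypothetical helper name.

```python
def simplified_lesk(context_words, sense_glosses, stopwords=frozenset()):
    """Return the sense whose gloss shares the most words with the context.

    context_words: iterable of tokens surrounding the ambiguous word
    sense_glosses: dict mapping a sense label to its definition string
    stopwords:     words ignored when counting overlap
    Ties are broken in favor of the sense listed first.
    """
    context = {w.lower() for w in context_words} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = {w.lower() for w in gloss.split()} - stopwords
        overlap = len(context & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Toy glosses for "bank" (illustrative only, not taken from a real dictionary).
GLOSSES = {
    "bank_financial": "an institution that accepts deposits and lends money",
    "bank_river": "sloping land beside a body of water such as a river",
}
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "on", "that", "as", "such", "at"})

sentence = "she sat on the bank of the river and watched the water".split()
print(simplified_lesk(sentence, GLOSSES, STOPWORDS))
```

Because the method only counts exact word matches, short definitions often produce zero overlap, which is one reason the predominant-sense fallback listed under McCarthy et al. matters in practice.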
- Machine Translation
- Literature Reviews
- Be familiar with at least one paper that is not your own