Maria Kelly and Rachel Lee compete in SemEval-2010
This semester Swarthmore students Maria Kelly and Rachel Lee, working with Professor Richard Wicentowski, competed in SemEval-2010.
SemEval-2010 is the fifth semantic evaluation workshop, and the third in which Swarthmore College has participated. The workshops, held every three years during the annual meeting of the Association for Computational Linguistics, give researchers the opportunity to design solutions to current problems in computational semantics and to have those solutions compared against the work of other researchers from around the world.
Maria, Rachel, and Professor Wicentowski participated in SemEval-2010 Task 2: Cross-Lingual Lexical Substitution. In this task, participants were given a sentence in English and a target word in that sentence, and asked to provide the best possible translation (or translations) of that target word into Spanish.
For example, suppose the target word is "rough" in the two sentences below:
- A very rough way of gauging your so-called preferred weight is to use the Body Mass Index.
- It was a very poor area and Stanway Street was very overcrowded and rough.
In the first sentence, the Spanish word impreciso ("imprecise") would be one of the most appropriate translations, but in the second sentence, peligroso ("dangerous") would be more appropriate. Neither translation would work well in the other sentence.
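To make the task concrete, here is a minimal Python sketch of one naive way a program might choose between the two translations. It is only an illustration and not the approach the Swarthmore team used: the cue-word lists, the `CANDIDATES` table, and the `rank_translations` function are all hypothetical and were written by hand for this example.

```python
import re

# Hypothetical, hand-written cue words for each Spanish candidate of "rough".
# These lists are illustrative assumptions, not data from the SemEval task.
CANDIDATES = {
    "impreciso": {"way", "gauging", "estimate", "guess", "index", "weight"},
    "peligroso": {"area", "street", "poor", "overcrowded", "neighborhood"},
}


def tokenize(sentence):
    """Lowercase the sentence and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def rank_translations(sentence, candidates=CANDIDATES):
    """Rank candidates by how many of their cue words appear in the sentence.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    words = tokenize(sentence)
    scores = {t: len(words & cues) for t, cues in candidates.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))


sentence_1 = ("A very rough way of gauging your so-called preferred weight "
              "is to use the Body Mass Index.")
sentence_2 = ("It was a very poor area and Stanway Street was very "
              "overcrowded and rough.")

print(rank_translations(sentence_1)[0])
print(rank_translations(sentence_2)[0])
```

A real system could not rely on hand-picked cue words like these. It would need to learn which contexts favor which translation from large amounts of text, which is part of what makes the task difficult.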
This task is one of many being studied by computer scientists who are interested in computational linguistics. If a computer program could perform this task reliably, it would be useful in many applications, such as improving the quality of automatic translation software, assisting human translators, or even helping people who are trying to learn a new language.
Maria, Rachel, and Professor Wicentowski implemented two different systems, both written in Python, and both did very well in the competition. Two different metrics were used to score the participating teams. Under the first metric, their systems ranked 5th and 10th (out of 14). Under the second metric, however, they placed 1st and 2nd, and did so by a wide margin.