Last week, I landed on another PhD-worthy research project.
Given a very large corpus of sentences, such as a digitized version of the Library of Congress or a less noisy version of the Internet, how can you automatically generate a thesaurus?
At first I thought the problem should be fairly easy, but [...]
