Graph of the English Language

Dictionaries are great tools, but they can only go so far. The best ones (the OED, for instance) give the reader a real ‘feel’ for a word: a sense of the connotations that go beyond the straightforward definition. A thesaurus can also be useful, especially when you’re searching for a specific word but can only remember associated ones. Both tools, however, lack visualization. This is where technology can help, by creating a more immersive, exploratory environment for our words.

The thesaurus, through its simple listing of words related to other words, has an interesting feature: directionality. Sometimes you find wordA in the listing for wordB but not vice versa. Using these listings, we could build one giant directed graph of the English language. But what would we expect to find?

  1. The graph is probably not acyclic.
  2. It is very likely not planar, and will therefore be hard to draw.
  3. There will be clusters of words (probably short ones) around certain concepts.
  4. These clusters will probably center on descriptive features of our world (such as the mythical Eskimo words for snow).
  5. It will reveal interesting conceptual connections between words (sounds, cheese, and knives can all be ‘sharp’).
  6. Those connections probably relate to our internal models of the world (synesthesia).
  7. The emphasis on these connections probably varies by culture (but anyone familiar with idioms from more than one language already knows this).
  8. There will be gaps and holes in the language that show up as empty areas between conceptual clusters.
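A minimal sketch of the graph-building idea, in Python, assuming the thesaurus is available as a mapping from each headword to the words listed under it. The toy `THESAURUS` below is hypothetical, not real thesaurus data, and the helper names are illustrative. It draws an edge wordB → wordA whenever wordA appears in wordB’s listing, reports the pairs that only go one way, and tests prediction 1 with Kahn’s topological sort: if some nodes can never be ordered, the graph contains a cycle.

```python
from collections import deque

# Hypothetical toy thesaurus (an assumption for illustration):
# each headword maps to the words listed under it.
THESAURUS = {
    "sharp": ["keen", "acute", "pungent"],
    "keen": ["sharp", "eager"],
    "acute": ["sharp", "severe"],
    "pungent": ["tangy"],
    "eager": ["keen"],
}


def build_graph(listings):
    """Directed graph as adjacency sets: edge a -> b means b is listed under a."""
    graph = {}
    for head, related in listings.items():
        graph.setdefault(head, set())
        for word in related:
            graph[head].add(word)
            # Words that never appear as headwords still become (sink) nodes.
            graph.setdefault(word, set())
    return graph


def one_directional(graph):
    """Pairs (a, b) where b is listed under a but a is not listed under b."""
    return sorted(
        (a, b) for a, targets in graph.items() for b in targets if a not in graph[b]
    )


def has_cycle(graph):
    """Kahn's algorithm: a cycle exists iff some nodes never reach in-degree 0."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for b in targets:
            indegree[b] += 1
    queue = deque(node for node, d in indegree.items() if d == 0)
    ordered = 0
    while queue:
        node = queue.popleft()
        ordered += 1
        for b in graph[node]:
            indegree[b] -= 1
            if indegree[b] == 0:
                queue.append(b)
    return ordered < len(graph)


if __name__ == "__main__":
    g = build_graph(THESAURUS)
    print(one_directional(g))  # [('acute', 'severe'), ('pungent', 'tangy'), ('sharp', 'pungent')]
    print(has_cycle(g))        # True
```

Note that any two words listing each other (here “sharp” and “keen”) already form a two-node cycle, so a single symmetric entry is enough to make the graph cyclic. Planarity (prediction 2) is a separate and harder test that this sketch does not attempt.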

I’m primarily interested in seeing how this type of visualization can help us understand the tangled relationship between language and cognition. A combination of laziness and that second item has kept me from writing this kind of software (which would be a great exercise in graph visualization techniques). But then again, others have foreseen my vision (as usual).