(German) Language Processing for Lucene

This paper introduces an open-source Java package called German Language Processing for Lucene (glp4lucene). Although it was originally developed to work with German texts, it is to a large degree language-independent.

Language or Hoax: Some statistical features of the Voynich manuscript

It has been claimed that the Voynich manuscript is an encrypted text, possibly an encrypted (natural?) human language. Others take it for a fraud, an educated hoax.

Recently I have been wondering about the statistical features of the Voynich manuscript, and about statistical features of human language texts that distinguish them from gibberish. In the following I will look at the word distribution and the entropy of the Voynich manuscript compared to Latin, modern German, and English.
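To make the two measures concrete, here is a minimal sketch of how a word distribution and a word-level entropy could be computed for a transcribed text. It assumes plain whitespace tokenization and a simple unigram Shannon entropy; the function names `word_entropy` and `rank_frequency` are made up for illustration, and this is not necessarily the procedure used in the full post.

```python
import math
from collections import Counter


def word_entropy(text):
    """Shannon entropy in bits per word of the unigram word distribution.

    Assumption: words are separated by whitespace and case is ignored.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    # H = -sum(p * log2(p)) over all distinct words
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print(word_entropy(sample))
    for row in rank_frequency(sample):
        print(row)
```

Plotting rank against count on log-log axes gives the usual picture for natural language texts, where the curve is roughly a straight line (Zipf's law), which is one way to compare a candidate text against Latin, German, or English samples.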


Connexor XML output to bracket tree structure and image

If you have ever worked with Connexor, you know how it encodes the dependency structure in XML. It's a little tricky not to lose oneself in it, with all its links to other entities and words.

Bracket tree structure to image converter using graphviz

I tested some syntax and dependency parsers the other day and was looking for an easy way to produce human-readable output, i.e. graphics. I found this very interesting link to a piece of software that produces Graphviz dot files (you need to have Graphviz installed!) from Stanford Parser trees. I modified the program only a little to make it work with other parsers, such as the Connexor parser.
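The general idea of such a converter can be sketched in a few lines. The following is a minimal illustration, not the linked program: it assumes Penn-Treebank-style bracket notation such as `(S (NP I) (VP (V saw) (NP it)))`, and the function names `parse_bracket_tree` and `tree_to_dot` are hypothetical.

```python
import re


def parse_bracket_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # skip "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(parse())
            pos += 1                      # skip ")"
            return (label, children)
        word = tokens[pos]                # a bare token is a leaf
        pos += 1
        return (word, [])

    return parse()


def tree_to_dot(tree):
    """Render a (label, children) tree as Graphviz DOT source."""
    lines = ["digraph tree {", "  node [shape=plaintext];"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = counter
        counter += 1
        label = node[0].replace('"', '\\"')  # escape quotes for DOT
        lines.append(f'  n{node_id} [label="{label}"];')
        for child in node[1]:
            child_id = visit(child)
            lines.append(f"  n{node_id} -> n{child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(tree_to_dot(parse_bracket_tree("(S (NP I) (VP (V saw) (NP it)))")))
```

If the output is saved as `tree.dot`, Graphviz can turn it into an image with `dot -Tpng tree.dot -o tree.png`.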

Another piece of the puzzle: The Language Instinct Debate

The small difference between our genes and those of other primates, especially apes, seems to make the big difference. You could ask why humans can talk while other hominids can't (well, some can say a few words or sign, but they are not much of a storyteller).

This month the article Dennis et al., Evolution of Human-Specific Neural SRGAP2 Genes by Incomplete Segmental Duplication, Cell (2012), doi:10.1016/j.cell.2012.03.033 was published. It states that during our evolution one gene was (incompletely) duplicated. Actually, this happened twice: once around the time when the genus Homo and the hominids split, and then again when Homo and Australopithecus split.

Experiments with mice showed that the mutated and duplicated genes eventually lead to a different, more human-like brain structure. This implies that the duplicated genes slow down the development of particular parts of the brain and thereby enable the part of the brain that is responsible for, among other things, our ability to talk to develop further. It seems that one of the reasons humankind was able to produce language lies in this mutation of the SRGAP2 genes.

