Der DocBlog

Promotion in eigener Sache

GermaNet network extractor

I wrote this little tool to convert the GermaNet data set (XML files) into formats that network analysis software such as Pajek, igraph, or SNAP can read.
The tool takes the GermaNet XML files and converts their content to one of four formats: a node list ("node node"), CSV ("node,edge,node"), a FANMOD file (each node is represented by an integer: "int int"), or a Pajek .net file.
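To give an idea of what can be done with the CSV output, here is a minimal Python sketch that reads "node,edge,node" lines and counts how often each relation type occurs. It assumes one triple per line, no header row, and no commas inside node names; the sample rows and the function name `read_triples` are made up for illustration and are not taken from GermaNet or from the tool.

```python
import csv
import io
from collections import Counter


def read_triples(handle):
    """Yield (source, relation, target) from lines shaped like 'node,edge,node'."""
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        source, relation, target = (field.strip() for field in row)
        yield source, relation, target


# Made-up sample rows; real node and relation names come from the GermaNet data.
sample = io.StringIO(
    "Hund,hypernym,Tier\n"
    "Katze,hypernym,Tier\n"
    "Hund,antonym,Katze\n"
)
triples = list(read_triples(sample))
relation_counts = Counter(relation for _, relation, _ in triples)
print(relation_counts)
```

For a real run, replace the `io.StringIO` sample with `open("output.csv", encoding="utf-8", newline="")`.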

./germanet.jar
GermaNetWorkExtractor (Beta)Version 1.0 for GermaNet5.2
by bastian.entrup@zmi.uni-giessen.de
Usage: GermaNetWorkExtractor [options] [GermaNet-Files] [gn_relations.xml]*
Options:
--lexical:	Extract lexical relations from GermaNet
--conceptual	Extract conceptual relations from GermaNet
--all		Extract lexical AND conceptual relations from GermaNet (default)
--nodeList	Output: list of nodes, directed (node node)
--csv		Output: comma separated values (node,edge,node)
--fanmod	Output: output for fanmod (int int)
--net		Output: pajek-net-file

If you pass only some of the XML files, the process rate will be below 100%. Some nodes are defined in other XML files; if those files are not available to the tool, relations containing these nodes are skipped!
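The skipping behaviour described above can be pictured roughly like this. This is only a Python sketch of the idea, not the tool's actual Java code, and it assumes that the process rate is simply the percentage of relations whose two nodes are both known; the node IDs are invented.

```python
def filter_relations(relations, known_nodes):
    """Keep relations whose endpoints are both defined and report the share kept."""
    kept = [(a, b) for a, b in relations if a in known_nodes and b in known_nodes]
    # An empty input counts as fully processed.
    rate = 100.0 * len(kept) / len(relations) if relations else 100.0
    return kept, rate


# Invented IDs: "v7" and "a4" stand for nodes defined in files that were not passed in.
known = {"n1", "n2", "n3"}
relations = [("n1", "n2"), ("n2", "n3"), ("n3", "v7"), ("n1", "a4")]
kept, rate = filter_relations(relations, known)
print(f"kept {len(kept)} of {len(relations)} relations ({rate:.0f}%)")
```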

Download: germanetEx.tar.gz

2 Comments

  1. Your tool might be very useful for lexical semanticists.
    I appreciate your work.

    Would you send me the tool?

    The gz-file doesn’t contain the file named GermaNetWorkExtractor.

    Thank you in advance.

  2. bastjan

    1. August 2013 at 05:36

    Hello,

    The file "germanet.jar" is a Java archive and contains the tool. Under Linux, run it with "java -jar germanet.jar [options] [GermaNet-Files] [gn_relations.xml]*" (this should also work in the Windows command prompt).

    The options are:
    --lexical Extract lexical relations from GermaNet
    --conceptual Extract conceptual relations from GermaNet
    --all Extract lexical AND conceptual relations from GermaNet (default)
    --nodeList Output: list of nodes, directed (node node)
    --csv Output: comma separated values (node,edge,node)
    --fanmod Output: output for fanmod (int int)
    --net Output: pajek-net-file

    The FANMOD mode can also be used to import the network into the C igraph library. The R igraph library also supports importing the node-list format (save the file as CSV and use R's CSV import). The FANMOD mode replaces the name of every node in the network with a number (int), but it does not yet print a list of the form "number node-name", a function I should probably add.
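Until the tool prints the "number node-name" list itself, the mapping can be rebuilt from the node-list output. The following Python sketch is a workaround under assumptions, not part of the tool: it assumes each line holds exactly two whitespace-separated node names (so names must not contain spaces), and it numbers nodes from 0 in order of first appearance, which need not match the numbering the tool's own FANMOD mode produces. The function name `number_nodes` and the sample lines are invented.

```python
import io


def number_nodes(edge_lines):
    """Map node names to integers and return (name -> id, list of integer edges)."""
    ids = {}
    edges = []
    for line in edge_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and lines that are not "node node"
        source, target = parts
        for name in (source, target):
            ids.setdefault(name, len(ids))  # next free number on first sight
        edges.append((ids[source], ids[target]))
    return ids, edges


# Invented sample in the "node node" format.
sample = io.StringIO("Hund Tier\nKatze Tier\n")
ids, edges = number_nodes(sample)

for source_id, target_id in edges:
    print(source_id, target_id)   # "int int" lines
for name, node_id in ids.items():
    print(node_id, name)          # "number node-name" lines
```

Writing both lists from the same pass keeps the integers and the names consistent with each other.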

    You need a GermaNet license and data set (available here: http://www.sfs.uni-tuebingen.de/lsd/). The [GermaNet-Files] parameter should contain the path to the GermaNet files, and the [gn_relations.xml] parameter the path to exactly that file. If I remember correctly, you can leave out the gn_relations.xml file as long as it is in the same folder as the other XML files.

    Here is an example that processes only nouns:

    "java -jar germanet.jar --nodeList --all /path/to/germanet//V52_UTF/nomen*.xml /path/to/germanet//V52_UTF/gn_relations.xml > output.csv"

    To process all files, you can use /path/to/germanet//V52_UTF/*.xml instead.
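If you want to call the tool from a script, the command line can be assembled in Python along these lines. This is a sketch under assumptions: the helper name `build_command` is invented, the directory is a placeholder, and only the options listed above are used. It passes gn_relations.xml exactly once at the end, even when the pattern "*.xml" would also match it; whether the tool minds a duplicate is something I have not checked here.

```python
import glob
import os


def build_command(data_dir, pattern="*.xml", relations="--all",
                  output="--nodeList", jar="germanet.jar"):
    """Return the argument list for one run of the extractor."""
    relations_file = os.path.join(data_dir, "gn_relations.xml")
    matches = sorted(glob.glob(os.path.join(data_dir, pattern)))
    # Leave the relations file out of the data files; it is appended last.
    data_files = [path for path in matches if path != relations_file]
    return ["java", "-jar", jar, output, relations, *data_files, relations_file]


# Placeholder directory; point this at your own GermaNet folder.
command = build_command("/path/to/germanet/V52_UTF", pattern="nomen*.xml")
print(" ".join(command))

# To actually run it and capture the output (requires Java and the data set):
# import subprocess
# with open("output.csv", "w", encoding="utf-8") as out:
#     subprocess.run(command, stdout=out, check=True)
```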

    Since I only have a license and data set for version 5.2 of GermaNet, I cannot test the tool with other versions.

    I will also elaborate on the usage of the tool on the blog to make it easier to understand.

    If you have any further questions, don’t hesitate to contact me!

    Best regards,
    Bastian

© 2017 Der DocBlog
