After extracting network files from WordNet and/or GermaNet, we turn to analyzing the network structure. This can be done in different ways, using different software. Since the resulting graphs of WordNet and GermaNet can be very large, easy-to-use software such as Gephi or Cytoscape (which is good for visualizing data and working with relatively small graphs) is not the best choice.

In this short tutorial we will look at the igraph package in R. In later posts we will do the same in C (using igraph) and C++ (using SNAP).

I assume we have a CSV graph file with lines of the form "node,node", ignoring the type of edge between the two nodes. First we have to load the CSV file into R:

`csv_file<-read.csv("[path]/wordnet.csv", header=F)`

`csv_file` will then contain the data. `header=F` (short for `FALSE`) indicates that there is no header in the file (i.e. no first line containing a heading for each column).
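To see what the `header` argument does, here is a minimal sketch in base R. The three-line sample is made up for illustration and stands in for the real `wordnet.csv`; the variable names are hypothetical as well.

```r
# Hypothetical three-edge sample standing in for the real wordnet.csv
sample_csv <- "a,b\nb,c\na,c\n"

# header = FALSE: every line is data, columns get the default names V1 and V2
edges <- read.csv(text = sample_csv, header = FALSE, stringsAsFactors = FALSE)
print(names(edges))
print(nrow(edges))

# header = TRUE would treat the first edge as column headings and lose it
edges_with_header <- read.csv(text = sample_csv, header = TRUE,
                              stringsAsFactors = FALSE)
print(nrow(edges_with_header))
```

With `header = FALSE` all three edges survive; with `header = TRUE` the first edge would silently disappear into the column names.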

Now we load igraph with `library(igraph)`; if the package is not installed yet, `install.packages("igraph")` fetches it from CRAN. Then we can transform the CSV data into a graph:

`g<-graph.data.frame(csv_file,directed=T)`

`directed=T` (short for `TRUE`) indicates that the graph is directed, i.e. the first node in a line points to the second node in the line.
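The effect of `directed` is easiest to check on a tiny graph. The following is only a sketch: the three edges are invented, and it uses `graph_from_data_frame`, which is the name newer igraph releases use for `graph.data.frame`.

```r
library(igraph)

# Hypothetical toy edge list: a -> b, b -> c, a -> c
edges <- data.frame(from = c("a", "b", "a"),
                    to   = c("b", "c", "c"),
                    stringsAsFactors = FALSE)

g_toy <- graph_from_data_frame(edges, directed = TRUE)

print(is_directed(g_toy))
print(V(g_toy)$name)

# Out-neighbours follow the direction first column -> second column
out_a <- neighbors(g_toy, "a", mode = "out")$name
out_c <- neighbors(g_toy, "c", mode = "out")$name
print(out_a)  # nodes that a points to
print(out_c)  # empty: c only receives edges
```

Node `a` points to both other nodes, while `c` has no outgoing edges at all, which is exactly the asymmetry an undirected graph would lose.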

Now we can calculate some basic properties of our graph:

    # count the nodes of the graph
    > vcount(g)
    [1] 324602
    # count the edges of the graph
    > ecount(g)
    [1] 584528
    # diameter: the length of the longest shortest path
    > diameter(g)
    [1] 27
    # the diameter together with the IDs of its two end nodes
    > farthest.nodes(g)
    [1] 116111 298643 27
    # average path length
    > average.path.length(g, directed=TRUE, unconnected=TRUE)
    [1] 9.717564
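These measures can be verified by hand on a small graph. The sketch below uses an invented four-node chain and the newer function names `farthest_vertices` and `mean_distance` (the successors of `farthest.nodes` and `average.path.length`); note that `farthest_vertices` returns a list rather than a plain vector.

```r
library(igraph)

# Hypothetical toy graph: a directed chain a -> b -> c -> d
g_toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "c"),
             to   = c("b", "c", "d"),
             stringsAsFactors = FALSE),
  directed = TRUE
)

# The longest shortest path runs from a to d and has three edges
d <- diameter(g_toy, directed = TRUE)
print(d)

# End points and length of that path
fv <- farthest_vertices(g_toy, directed = TRUE)
print(as_ids(fv$vertices))
print(fv$distance)

# unconnected = TRUE averages only over pairs that are actually connected:
# here the six reachable pairs have lengths 1, 2, 3, 1, 2, 1
apl <- mean_distance(g_toy, directed = TRUE, unconnected = TRUE)
print(apl)
```

Because the chain is directed, pairs such as (d, a) have no path; with `unconnected = TRUE` they are left out of the average instead of being counted with an artificial length.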

Communities might be of interest, so we will detect them using the spinglass algorithm:

    # sgc will hold all communities and the corresponding nodes
    > sgc <- spinglass.community(g)
    > V(g)$membership <- sgc$membership
    > max(sgc$membership)
    [1] 25
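A small sketch of the same step on a toy graph, using `cluster_spinglass` (the name newer igraph releases use for the spinglass community detection). The graph, the seed and the value of `spins` are assumptions made for illustration. The `spins` argument is an upper limit on the number of communities and defaults to 25, so the 25 communities reported above may simply reflect that cap.

```r
library(igraph)

# The algorithm is randomised; fixing a seed makes runs repeatable
set.seed(42)

# Hypothetical toy graph: two triangles joined by a single bridge edge c - d
g_toy <- graph_from_literal(a-b, b-c, c-a, d-e, e-f, f-d, c-d)

# spins is the maximum number of communities the algorithm may return
sgc <- cluster_spinglass(g_toy, spins = 5)

# One community ID per node
m <- membership(sgc)
print(m)

# Number of communities actually found
print(length(sgc))

# Store the IDs as a vertex attribute, as in the text above
V(g_toy)$membership <- as.integer(m)
```

If the number of communities found equals `spins`, it may be worth rerunning with a larger value to check whether the result changes.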

One might be interested in the betweenness of the nodes. There are several betweenness and related centrality measures we can use:

    # vertex betweenness; directed = TRUE, weights = NULL, nobigint = TRUE, normalized = FALSE
    > betweenness <- betweenness(g, V(g), TRUE, NULL, TRUE, FALSE)
    # edge betweenness; directed = TRUE, weights = NULL
    > eb <- edge.betweenness(g, e=E(g), TRUE, NULL)
    # print the node with the highest betweenness
    > which.max(betweenness)
    [SID-08860123-N] 16252
    # closeness centrality measures how many steps are required to reach every other vertex from a given vertex
    # mode selects which edge directions are followed ("out", "in", "all", "total"); weights = NULL, normalized = FALSE
    > c <- closeness(g, V(g), mode = "all", NULL, FALSE)
    # eigenvector centrality
    # defaults: directed = FALSE, scale = TRUE, weights = NULL, options = igraph.arpack.default
    > evkashubsc <- evcent(g)
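The calls above pass most arguments by position, which only works as long as the parameter order matches the installed igraph version; as far as I know, newer releases no longer have the `nobigint` parameter, so that is worth checking. A sketch with named arguments on an invented three-node chain:

```r
library(igraph)

# Hypothetical toy graph: a directed chain a -> b -> c
g_toy <- graph_from_data_frame(
  data.frame(from = c("a", "b"),
             to   = c("b", "c"),
             stringsAsFactors = FALSE),
  directed = TRUE
)

# Named arguments keep the call independent of the parameter order
bt <- betweenness(g_toy, v = V(g_toy), directed = TRUE, normalized = FALSE)
print(bt)

# Only b lies on a shortest path between two other nodes (a -> b -> c)
top_bt <- names(which.max(bt))
print(top_bt)

# mode = "all" ignores edge direction when measuring distances
cl <- closeness(g_toy, vids = V(g_toy), mode = "all", normalized = FALSE)
top_cl <- names(which.max(cl))
print(top_cl)
```

On this chain both measures single out the middle node `b`, which makes it easy to confirm that the arguments mean what the comments claim.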

For more information on these functions, see the igraph documentation on betweenness, closeness, eigenvector centrality and Kleinberg's centrality.

What more? Well, what about the degree distribution?

    # calculate the degree distribution over all nodes in g
    dd <- degree.distribution(g)
    # plot the degree distribution
    plot(dd)
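One detail that is easy to miss: the first element of the returned vector belongs to degree 0, so the vector index is shifted by one relative to the degree. A sketch on an invented star graph, using `degree_distribution` (the newer name for `degree.distribution`); the log-log plot is a common choice for large networks, not something prescribed here.

```r
library(igraph)

# Hypothetical toy graph: one hub connected to four leaves
g_toy <- make_star(5, mode = "undirected")

# Relative frequency of each degree; element i belongs to degree i - 1
dd <- degree_distribution(g_toy)
deg <- seq_along(dd) - 1
print(data.frame(degree = deg, fraction = dd))

# Drop zero entries, which cannot be shown on logarithmic axes
keep <- deg > 0 & dd > 0
plot(deg[keep], dd[keep], log = "xy",
     xlab = "degree", ylab = "fraction of nodes")
```

Four of the five nodes have degree 1 and the hub has degree 4, so the table shows 0.8 at degree 1 and 0.2 at degree 4.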

For more information, see the igraph documentation on the degree distribution.
