Datasets

From Gephi:Wiki

Gephi sample datasets in various formats (GEXF, GDF, GML, NET, GraphML, DL, DOT). Feel free to add new datasets, and be sure to cite the original authors.

Supported graph formats are described here.

Note that Gephi can open these files directly, without unzipping them first.
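Outside Gephi, a zipped file can also be read in place. The sketch below is a minimal illustration, assuming Python 3.9+ and an archive that contains at least one `.gexf` member: the standard-library `zipfile` and `xml.etree.ElementTree` modules are enough to count the nodes and edges of a GEXF file without extracting it to disk. The helper names `count_gexf_in_zip` and `local_name` are hypothetical. Elements are matched by their local name, so the GEXF namespace version declared in the file should not matter.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://www.gexf.net/1.2draft}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def count_gexf_in_zip(zip_source) -> tuple[int, int]:
    """Return (node_count, edge_count) for the first .gexf member of a zip archive.

    zip_source may be a path or a binary file-like object. The member is
    streamed straight out of the archive; nothing is extracted to disk.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: the archive holds at least one file ending in .gexf.
        name = next(n for n in archive.namelist() if n.lower().endswith(".gexf"))
        with archive.open(name) as handle:
            root = ET.parse(handle).getroot()
    # Count <node> and <edge> elements; the <nodes>/<edges> containers do not match.
    nodes = sum(1 for el in root.iter() if local_name(el.tag) == "node")
    edges = sum(1 for el in root.iter() if local_name(el.tag) == "edge")
    return nodes, edges


# Usage: build a tiny zipped GEXF file in memory (hypothetical sample data).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes><node id="0" label="A"/><node id="1" label="B"/><node id="2" label="C"/></nodes>
    <edges><edge id="0" source="0" target="1"/><edge id="1" source="1" target="2"/></edges>
  </graph>
</gexf>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.gexf", SAMPLE)

print(count_gexf_in_zip(buffer))  # (3, 2)
```

Matching on local names instead of a fixed namespace URI is a deliberate simplification; a full GEXF reader would also handle attributes, dynamic spells, and hierarchical (nested) nodes, which this sketch simply counts like any other node.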

Web and Internet

[GEXF] EuroSiS web mapping study: Mapping interactions between Science in Society actors on the Web of 12 European countries. Original report and data can be found here.

[GML] Internet: a symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from BGP tables posted by the University of Oregon Route Views Project. This snapshot was created by Mark Newman from data for July 22, 2006 and has not been previously published.

Social networks

[GML] Les Miserables: weighted network of character coappearances in the novel Les Miserables. D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA (1993).

[GEXF] Hypertext 2009 dynamic contact network: contact network during the Hypertext 2009 conference. Source: Sociopatterns.org

[GML] Zachary's karate club: social network of friendships between 34 members of a karate club at a US university in the 1970s. W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).

[GML] Coauthorships in network science: coauthorship network of scientists working on network theory and experiment, as compiled by M. Newman in May 2006. A figure depicting the largest component of this network can be found here. M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).

[GEXF] CPAN authors: CPAN Explorer is a visualization project that analyzes the relationships between the developers and the packages of the Perl language, which are organized around CPAN (the Comprehensive Perl Archive Network). This snapshot was created by Linkfluence in July 2009. This file contains the network of developers, linked when they use the same Perl module. Original data can be found here.

[GEXF] CPAN distributions: CPAN Explorer is a visualization project that analyzes the relationships between the developers and the packages of the Perl language, which are organized around CPAN (the Comprehensive Perl Archive Network). This snapshot was created by Linkfluence in July 2009. This file contains the network of dependencies between Perl modules. Original data can be found here.

[NET] Jazz musicians network: edge list of the network of jazz musicians. P. Gleiser and L. Danon, Adv. Complex Syst. 6, 565 (2003).
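Pajek .net files such as this one are plain text: a `*Vertices` section optionally lists vertex ids with quoted labels, and `*Edges` (undirected) or `*Arcs` (directed) sections list one link per line as two vertex ids and an optional weight. The following is a minimal Python sketch of a parser for that subset only, assuming the file uses just these three sections; other Pajek sections (for example `*Arcslist` or `*Matrix`) are ignored. The function name `parse_pajek` and the sample data are hypothetical.

```python
import shlex


def parse_pajek(text: str):
    """Parse a minimal subset of the Pajek .net format.

    Returns (labels, edges): labels maps vertex id -> label, and edges is a
    list of (source, target, weight, directed) tuples.
    """
    labels: dict[int, str] = {}
    edges: list[tuple[int, int, float, bool]] = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("*"):
            # Section header such as '*Vertices 198', '*Arcs' or '*Edges'.
            section = line.split()[0].lower()
            continue
        parts = shlex.split(line)  # keeps quoted labels with spaces together
        if section == "*vertices":
            vid = int(parts[0])
            # Fall back to the id when no label is given; trailing coordinates are ignored.
            labels[vid] = parts[1] if len(parts) > 1 else str(vid)
        elif section in ("*arcs", "*edges"):
            src, dst = int(parts[0]), int(parts[1])
            weight = float(parts[2]) if len(parts) > 2 else 1.0  # default weight 1
            edges.append((src, dst, weight, section == "*arcs"))
        # Lines in any other section are skipped.
    return labels, edges


# Usage with a tiny hypothetical file.
SAMPLE = """*Vertices 3
1 "Player One"
2 "Player Two"
3 "Player Three"
*Edges
1 2 1
1 3
"""

labels, edges = parse_pajek(SAMPLE)
print(labels[2])  # Player Two
print(edges)      # [(1, 2, 1.0, False), (1, 3, 1.0, False)]
```

Using `shlex.split` is a convenience for quoted labels; an unquoted label containing an apostrophe would confuse it, so a production reader would need a more careful tokenizer.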

[TGZ] GitHub open source developers. See http://lumberjaph.net/blog/index.php/2010/03/25/github-explorer/

[DL] Online social network (1899 nodes). Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163.

[GEPHI] The Marvel Social Network: network of superheroes, constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. Collected by Infochimps and transformed and enhanced by Kai Chang.

[GDF] Comic and Hero Network Data: same as above, but also including the comics in which each hero appears.

[DOT] Twitter: mentions and retweets within part of the Twitter network. The file is updated from time to time.

[GEXF] Contact networks in a primary school: SocioPatterns team, 2011.

Biological networks

[GEXF] Diseasome: A network of disorders and disease genes linked by known disorder–gene associations, indicating the common genetic origin of many diseases. Genes associated with similar disorders show both higher likelihood of physical interactions between their products and higher expression profiling similarity for their transcripts, supporting the existence of distinct disease-specific functional modules. The original dataset can be found here: The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007), Proc Natl Acad Sci USA 104:8685-8690

[GEXF] C. elegans neural network: A directed, weighted network representing the neural network of C. elegans. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998). Original experimental data taken from J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314, 1-340 (1986).

[GEXF] Yeast: Protein-Protein interaction network in yeast. Original data can be found here.

Infrastructure networks

[GML] Power grid: An undirected, unweighted network representing the topology of the Western States Power Grid of the United States. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).

[GRAPHML] Airlines: unknown source.

[NET] US Air97: North American Transportation Atlas Data (NORTAD). Original data can be found here.


Other networks

[GEXF] Java code: source code structure of a Java program, by S. Heymann & J. Palmier, 2008.

[GEXF] Dynamic Java code: source code structure of a Java program as it evolves over its SVN commits, by S. Heymann & J. Bilcke, 2008.

[GML] Word adjacencies: adjacency network of common adjectives and nouns in the novel David Copperfield by Charles Dickens. Please cite M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).

[NET] WordNet English dictionary: unknown source.

[DOT] Abstract mesh: 331 nodes.

Sources

Some of the above datasets are from:

Other network data repositories