Google Summer Of Code 2010

From Gephi:Wiki
Jump to: navigation, search
Google Summer of Code 2010 Logo

This document compiles official Gephi Google Summer of Code 2010 proposals. One can propose other ideas by going on the forum and start the discussion.

And also see Introduction Post

The Google Summer of Code 2010 is now closed.

Shader Engine

We propose to develop a network visualization engine using GPU Shaders technology to improve performances and scalability.

 Difficulty: Hard
 Required skills: Java, OpenGL, GLSL
 Assigned Mentor: Mathieu Bastian

The aim is to create a basic 3D engine using suggested technologies and use it for drawing as many nodes and edges as possible. Moreover the engine should guarantee the possibility to add an interaction layer to develop the network exploration.

Existing Gephi 3D engine is intended to be used with as much hardware as possible and thus doesn’t profit from GPU so much. The particularity of a network visualization 3D engine is the need to draw a huge amount of very basic shapes (sphere or cube for nodes and lines for edges). For a traditional "direct" engine, drawing millions of objects, even if it is the same object costs millions of loop iterations and CPU-to-GPU transfers. When using shaders, graphics are directly processed by the GPU and therefore speed-up rendering.

We suggest this draft roadmap:

  • Get started with loading and executing vertex and fragment shaders within a JOGL routine in a sample application.
  • Code impostors for nodes, work on sphere appearance with ambient occlusion technique. The molecular visualization can be a great source of inspiration.
  • Code basic geometry and camera positioning to allow displacement and zoom.
  • Build the architecture to manage node updates (position update, select, color, size).
  • Try to draw edges with GPU.
  • Propose nice post-processing effects. Try to make a glow effect around spheres.


Related discussion: forum.

Dynamic attributes and statistics

The aim of this proposal is to go further in dynamic network analysis (DNA) in Gephi.

 Difficulty: Medium
 Required skills: Java
 Assigned Mentor: Julian Bilcke

The current 0.7 version can import dynamic graphs written in the GEXF syntax, then filter them using the Timeline component. However, it only filters the graph topology (hide nodes or edges).

The next step is thus to handle dynamic changes of graph attributes (eg. user-defined columns, and visualization-related attributes, like color). Finally, we need a framework to build and query a dynamic graph from a proper API. This API could be used by other modules to facilitates algorithms developments and integration with external sources.

Current Gephi Statistics package computes a set of metrics for a given graph, like Clustering Coefficient or Degree Distribution. Computing these metrics for each time interval in the time series has a great interest to analyze graphs within time. The Dynamic API could be queried to get the graph for a given time interval and therefore speed-up these algorithms development.

We suggest the following tasks:

  • Code a data structure to host dynamic attributes efficiently. Look at interval/segment structures, like Range Trees.
  • Create a Dynamic API which has the following features:
    • Return a Dynamic Graph Decorator, that wraps the graph and a time interval
    • Return static graph copy for given time interval
    • Return attributes values array for a given node and time interval
  • Adapt Metrics framework to use Dynamic API to propose dynamic versions of existing metrics.
  • Propose idea for dynamic visualization of attributes (color, size, label color, label size, edge weight).
  • If time remains, adapt the Ranking module to support dynamic attributes -> dynamic visualization attributes transformation.


Related discussion: forum.

Web-based network visualization engine with WebGL

This proposal offers to start a new project, by developing an efficient network visualization library for the web using WebGL.

 Difficulty: Medium
 Required skills: Javascript, OpenGL
 Assigned Mentor: Mathieu Bastian

Network visualization on the web is an active topic, one can find many graph libraries examples developed in Flash/Flex, Javascript or Processing that can visualize networks and offer interaction. These libraries often offer nifty visualization but poor performances. Visualizing thousands of nodes and edges somewhere else than on the desktop remains a challenge. OpenGL brought to the browser, and WebGL in particular is a major advance for that topic.

We suggest the following roadmap:

  • Get started with WebGL and development environment
  • Draw a graph in a 3d space. Draw nodes as sphere, edges and labels with WebGL.
  • Add interaction layer with displacement and mouse movement.
  • Create API to define a graph structure and all visual attributes that can be touched.
  • Document and publish the API in a library. Write a getting started.
  • Use high-level optimization like Octree structure to optimize rendering.

NodeXL team will be invited to participate to specification. O3D API alternative may also be considered, as WebGL performance needs to be measured.

An important aspect of this project is to deliver a final version with little features but exposed by a proper documented API.

This project would let the student get experience in API design, WebGL, high-level optimization and data visualization. Good communication skills are required to promote the project, write documentation and blog posts.


Related discussion: forum.

Force-Directed Edge Bundling

This proposal offers to implement the Edge bundling algorithm in Gephi.

 Difficulty: Easy
 Required skills: Processing
 Assigned Mentor: Mathieu Bastian

Edge bundling is based on the principle of visually bundling adjacency edges together, analogous to the way electrical wires and network cables are merged into bundles along their joint paths, and then fanned out again at the end, in order to make an otherwise tangled web of wires and cables more manageable. When applied to the field of data visualization, this technique can be used in conjunction with several existing data mapping techniques to significantly reduce visual clutter.

Danny Holten and Prof. van Wijk have recently succeeded to merge the concept of edge bundling with force-directed network graphs, also known as node-link graphs, in their work "Force-Directed Edge Bundling for Graph Visualization". Here, edges have been modeled as flexible springs that are able to attract each other. The resulting network visualizations show significantly less clutter while making high-level edge patterns more visible.

The Vectorial Preview module in Gephi is a processing-based graph rendering component. It's aim is to allow the last graphical touch before exporting the network as SVG or PDF. It is separated from the OpenGL visualization engine, which is built for performance. The algorithm will therefore be implemented in this module.

In addition of data visualization skills, the student has the opportunity to continue the work on the Preview module and participate to it's next refactoring with his mentor. The aim is to improve modularity of this module.


Related discussion: forum.

Data Laboratory

This proposal focuses on data manipulation and transformation within the data table component.

 Difficulty: Easy
 Required skills: Java, Swing
 Assigned Mentor: Sebastien Heymann

Gephi's aim is to cover the complete process of a network analysis task. Because network data are often dirty, malformed or just incomplete, one need to manipulate these data to fix issues or refine it. A data table tool is an appropriate method to let users search, manipulate and transform network data within Gephi. The current "Data Laboratory" module is a proof of concept and has limited features. However it is a good starting point for this proposal.

Features developed in the data laboratory should be accessible through a well-defined API.

The Data Laboratory should have the following features, in addition of the data table view:

  • Attributes manipulation:
    • Add/Remove/Clean/Duplicate column
    • Merge columns, with configurable merge strategy.
    • Split text with separators in columns
    • Formula-driven operations (sqrt, min, max, avg, bound).
    • Create column from a regex on other column
  • Import/Export columns from/to CSV
  • Search on one or all columns. Regex based search. Autocomplete if possible.
  • Symmetrizing (directed -> undirected)
  • Dynamic attributes view

Additional visualization could be added to enhance user experience:

  • Scatter Plot
  • Dynamical attributes view with sparklines
  • Parallel Coordinates dealing with any number of dimensions
  • Look what infovis can offer

The existing module will help to get started with the technologies and the environment. The APIs this module will depend on are well-defined and documented. Moreover mentorship will be offered to use them in the best way. In addition to software design, the student would gain experience in Human Computer Interaction (HCI) and UI design.


Related discussion: forum.

Graph Streaming API

This proposal's target is to build a unified framework for graph streaming and apply it with a Web test case or a socket server.

 Difficulty: Medium
 Required skills: Java
 Assigned Mentor: Mathieu Bastian

Gephi's data structure and visualization engine has been built with the idea that a graph is not static and might change continuously. By connecting Gephi with external data-sources, we leverage its power to visualize and monitor complex systems or enterprise data in real-time. The way these data-sources push graph data to Gephi needs to be controlled and monitored. The Graph Streaming API will be responsible for this job and will have the following features:

  • Robust incremental graph construction. Work to properly manage updates.
  • Can define rules to accept or reject graph elements.
  • Thread-safe graph container.
  • Source traceability. For each element, keep trace of who added it first and who touched the element.
  • Dynamic graphs support. Current timestamps can be set to arriving elements.
  • On-demand container, new graph elements can be queued and wait for the user's flush order.
  • Basic user interface to monitor the container.

In order to test this API, the proposal suggest to either develop a multi-threaded socket server or to connect Gephi to a Web API like Twitter. The socket server will be a key feature for Gephi interoperability. Any other system could push graph data to Gephi through the network. Java provides API for socket connection and a lot of documentation could be found to build a reliable socket server. On the Web API side, the idea is to connect Gephi to a Web API that can return graph data periodically. Twitter of course can be investigated but other trails could be followed.

Gephi currently imports data through ImportAPI, which already separates importing from processing data to the main data structure by using a container: nodes and edges are fully loaded into a container, then this container is checked for errors and finally appended to the main data structure. This API could be extended and/or reused for this proposal.


Related discussion: forum.

Direct Social Networks Import

Twitter, NYT, Flickr, Facebook,, emails... are considered by sociologists (and others) as valuable data sources. This proposal aims to make Gephi's import process easier by connecting it to the proper API, and making available direct queries on these social networks.

 Difficulty: Easy
 Required skills: Java, XML
 Assigned Mentor: Sebastien Heymann

Social Networks insights with Gephi must currently follow these steps:

  1. Get data like a "screenshot" or a photography.
  2. Pre-process data.
  3. Generate graph file (e.g. GEXF or GraphML).
  4. Import this file in Gephi.
  5. Start visualizing and analyzing data.

We could be more flexible by making Gephi able to connect to their API directly and grab data centered on individuals or complete queries. The process would be transformed to:

  1. Select a network to connect.
  2. Write the query.
  3. Gephi gets the data and generates the network on screen.
  4. The user expands and refines the results by interacting with the network (graph window). Data Lab interactions will be created in the future.

The code will be implemented as a Gephi plugin. It will use or extend ImportAPI, which separates importing from processing data to the main data structure by using a container: nodes and edges are fully loaded into a container, then this container is checked for errors and finally appended to the main data structure.

We are interested by the following data-sources, prioritized:

  • Emails
  • Twitter
  • New York Times
  • Flickr
  • Youtube
  • Facebook (optional)
  • (optional)

If time remains, the student could add new tools to Gephi Toolbar and propose interactive data source exploration. For instance by cliking on a node, the Twitter tool would expand the graph by adding followers. Another example with a tool that would expand a NYT article with related keywords.


In Facebook, it is more interesting to map who replies on someone's and with which frequency, than just to get the friends list. And it could evolve according to the cutting-edge related researches, or on how people use these networks. So it's on the developer's responsibility to provide a configurable import, and to let the user process both simple and complex queries with an easy-to-use GUI. On each network, he must identify which data could take the role of node, what associated data attach to them, and which one could define an edge.

While getting data from Flickr, for example, one could:

   * Set people and photos as nodes, and set comments as edges. Associated node data: # of friends for people, license and all tags for photos. Associated edge data: comment's full text.
   * Set people as nodes, and identical tags on photos as edges. Associated node data: # of photos for people.
   * Set set tags as nodes, and photos as people as edges. Associated node data: nothing. Associated edge data: license for photos...

An important feature is to get the timestamps when it's available: a dynamic network would then be created.


Related discussion: forum.

Adding support for Neo4j in Gephi

The project aims at enabling Gephi to visualize graphs stored using the Neo4j graph database.

 Difficulty: Medium
 Required skills: Java
 Assigned Mentor: Tobias Ivarsson, Mathieu Bastian

A graph model coupling based on Neo4j could be implemented as a plugin for Gephi. This would enable users to work with a Neo4j graph from within Gephi. This is useful for debugging data of an application, generating graphical reports, and domain data design. From the other side of things it would also enable Gephi to work with arbitrarily large graphs, much larger than what can fit into RAM.

Gephi stores its graph in RAM with the help of GraphAPI and AttributesAPI. All modules that read or manipulate the graph or the attributes interact with these APIs. The Neo4j plugin would be responsible to control these APIs to map Neo4j data and perform operations.

Goals of this project

  • Enable application developers using Neo4j to use Gephi for data debugging and introspection.
  • Enable Gephi to work with larger graphs than can be fitted into memory.

Suggested roadmap

  • Get started with Neo4j, write some code using it to get a feel for it.
  • Write a new Gephi module on top of the Neo4j graphdb API and commanding Graph and Attributes APIs
    • Map the Neo4j data model to Gephi GraphModel: nodes, relationships, relationship types and node/relationship properties.
    • Map Neo4j data retrieval to Gephi GraphView: traversers - filter on relationship types and relationship directions, set depth etc.
    • Work with unit tests to verify progress.


Related discussion: forum.

Related resources