Specification - Data Laboratory

These are some of the current ideas and thoughts about the needs of the Data Laboratory.

Data Laboratory

This project covers the implementation of basic but very important and useful operations on graph data in tabular form, which would enhance the Gephi user experience. Less basic but still important operations and data visualization features are also among the project goals.

The first new Data Laboratory features will center on data manipulation, since this is the area where the Data Laboratory is most lacking.

Non-basic operations such as column merging, advanced search and a dynamic attributes view will each need a study of good user interface design, so that the feature is useful without being tedious or complex for the user.

The second part of this project is the creation of data visualization utilities based on plots, which will require creativity, investigation of different plot types and the use of external libraries like JFreeChart (already used in Gephi).

Expected Features

Data-lab-overview.png

Overview of data laboratory features placement.

General features

General-features.png

Add node/edge:

A dialog has to be shown to the user asking for an id and a label for the node or edge and, when adding an edge, the ids of the source and target nodes.

I think that this dialog has to be simple (only ask for the label for nodes, and for the source and target for edges), assigning an automatic id, and then let the user edit the attributes of that node or edge in the table once it has been added.

Clear graph and clear edges:

These are simple but common actions.

Clear graph just deletes all elements in the graph while clear edges only removes all edges. Clear edges will allow the user to choose what types of edges to delete.

Import/Export to CSV:

Nodes and edges could be imported from CSV together with their attributes. When importing, it will be necessary to check that the nodes or edges are not already in the graph and that ids are provided and not already in use; otherwise, new ids could be created, informing the user about it.

Also, when importing edges, at least the source and target node columns have to be provided, with valid node ids. The edge type and weight can be given default values.
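The id check described above could be sketched as follows. This is only an illustration under assumptions: the class name ImportIdResolver, the "n" prefix for generated ids and the message texts are hypothetical and are not part of Gephi.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for the import wizard; not part of the Gephi API. */
class ImportIdResolver {
    private final Set<String> usedIds;
    private final List<String> messages = new ArrayList<>();
    private int counter = 0;

    ImportIdResolver(Set<String> idsAlreadyInGraph) {
        this.usedIds = new HashSet<>(idsAlreadyInGraph);
    }

    /** Keeps a provided id when it is free; otherwise creates a new one and records a message. */
    String resolve(String providedId) {
        String id = providedId;
        if (id == null || id.isEmpty()) {
            id = nextFreeId();
            messages.add("No id provided, created id " + id);
        } else if (usedIds.contains(id)) {
            id = nextFreeId();
            messages.add("Id " + providedId + " already in use, created id " + id);
        }
        usedIds.add(id);
        return id;
    }

    /** Generates ids with an assumed "n" prefix until one is not in use. */
    private String nextFreeId() {
        String candidate;
        do {
            candidate = "n" + counter++;
        } while (usedIds.contains(candidate));
        return candidate;
    }

    /** Messages that could be shown to inform the user about created ids. */
    List<String> getMessages() {
        return messages;
    }
}
```

Ids resolved earlier in the same import are also treated as used, so two rows with the same id in one file do not collide.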

Exporting to CSV should be simpler than importing. When exporting, the user will select the columns to export. Visible rows will be exported (see search feature).

Both CSV export and import will allow the user to choose a value separator (comma by default) and a charset.
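A minimal sketch of how a line could be read with a user-chosen separator and charset is shown next. CsvLineSketch is a hypothetical helper, and the quoting rules (double quotes around fields, "" as an escaped quote) are an assumption, since the text does not fix them.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of the Gephi API. */
class CsvLineSketch {

    /** Splits one CSV line on the given separator, honouring double-quoted fields. */
    static List<String> splitLine(String line, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // separators inside quotes are plain text
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field, possibly empty
        return fields;
    }

    /** Decodes the raw bytes of a line with the chosen charset before splitting. */
    static List<String> splitLine(byte[] rawLine, Charset charset, char separator) {
        return splitLine(new String(rawLine, charset), separator);
    }
}
```

For example, with ';' as separator the line n1;"Node; one";3 yields the three fields n1, Node; one and 3.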

Since importing needs some more conditions and should help the user, it is done with a wizard in 2 steps:

Import1.png

Step 1 of CSV import wizard

Import2.png

Step 2 of CSV import wizard

Idea: detecting a copy-paste on clipboard from Excel to Gephi?

Search/Replace:

The idea is to make a common search/replace dialog that enables normal and regex search over any number of columns, and also normal and regex replacement of occurrences.

Search-replace.png

The dialog has useful options like:

  • Normal and regex search
  • Normal and regex replace mode
  • Only match whole value
  • Case-sensitive or case-insensitive search
  • Highlight matching group of a value and select the row in the table view
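These options map fairly directly onto java.util.regex. The following sketch shows one possible mapping; the class SearchOptionsSketch and its method names are hypothetical, and only the Pattern and Matcher calls come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper showing how the dialog options could map to java.util.regex. */
class SearchOptionsSketch {

    /** Normal search treats the query literally; regex search compiles it as a pattern. */
    static Pattern buildPattern(String query, boolean regexSearch, boolean caseInsensitive) {
        int flags = 0;
        if (!regexSearch) {
            flags |= Pattern.LITERAL;
        }
        if (caseInsensitive) {
            flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        }
        return Pattern.compile(query, flags);
    }

    /** "Only match whole value" uses matches(); otherwise any occurrence is enough. */
    static boolean isMatch(Pattern pattern, String value, boolean wholeValue) {
        Matcher matcher = pattern.matcher(value);
        return wholeValue ? matcher.matches() : matcher.find();
    }

    /** Normal replace mode quotes the replacement so that $1 and backslashes stay literal. */
    static String replaceAll(Pattern pattern, String value, String replacement, boolean regexReplace) {
        String effective = regexReplace ? replacement : Matcher.quoteReplacement(replacement);
        return pattern.matcher(value).replaceAll(effective);
    }
}
```

When a match is found, Matcher.start() and Matcher.end() give the range that the dialog could highlight in the cell.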

Editing of attributes:

The table view must correctly allow editing of the attributes that are editable. Values of all data types are parsed from the input string that the user types in the cell.
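As a rough sketch of parsing a cell's input string according to the column type: the CellType enum below is a hypothetical, much reduced stand-in for the real column types, and rejecting invalid input by returning an empty result is an assumption.

```java
import java.util.Optional;

/** Hypothetical, simplified stand-in for the column types; not Gephi's AttributeType. */
enum CellType { INTEGER, DOUBLE, BOOLEAN, STRING }

/** Hypothetical helper; not part of the Gephi API. */
class CellParserSketch {

    /** Returns the parsed value, or empty when the text is not valid for the type. */
    static Optional<Object> parse(String text, CellType type) {
        String trimmed = text.trim();
        try {
            switch (type) {
                case INTEGER:
                    return Optional.<Object>of(Integer.valueOf(trimmed));
                case DOUBLE:
                    return Optional.<Object>of(Double.valueOf(trimmed));
                case BOOLEAN:
                    if (trimmed.equalsIgnoreCase("true")) {
                        return Optional.<Object>of(Boolean.TRUE);
                    }
                    if (trimmed.equalsIgnoreCase("false")) {
                        return Optional.<Object>of(Boolean.FALSE);
                    }
                    return Optional.empty(); // anything else is rejected
                default:
                    return Optional.<Object>of(text); // strings are kept as typed
            }
        } catch (NumberFormatException e) {
            return Optional.empty(); // invalid number: the cell keeps its old value
        }
    }
}
```

An empty result lets the table keep the previous cell value instead of storing a wrongly typed one.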

Dynamic attributes view:

Dynamic number attributes (and also lists of numbers) can be shown graphically in a tabular view using sparklines, which are simple and easy-to-interpret graphics. Plots can also show graphically how attributes change over time.

Node manipulation features (on right click)

These are actions shown on right click on one or more nodes selected in the nodes table. Each of them is a use case implemented as a ServiceProvider of the node manipulation Service (see class diagrams).

Node-right-click.png

Operation | One node | Various nodes | Special conditions to be shown
Edit properties | | |
Select on graph | | |
Delete | | |
Group | | | All selected nodes must have the same parent node
Ungroup | | | At least one selected node must be a group of nodes
Ungroup recursively | | | At least one selected node must be a group of nodes
Remove from group | | | At least one selected node must be in a group of nodes
Move to group | | | At least one group has to be available for all selected node(s)
Settle | | | The node must not be settled
Free | | | The node must be settled
Link nodes (create edge) | | |
Clear node(s) data | | |
Copy node data | | |
Select neighbour nodes on table | | |
Select edges using the node on edges table | | |
Copy node(s) | | |


Edit Properties:

Shows a dialog to edit the properties (size, position, color...) of one node.

Select on graph:

This action changes the graph view in the overview section so that the node is shown at the center of the graph view.

Delete:

Deletes the selected nodes (one or more). The user will be asked to confirm the action.

Group:

Groups the selected nodes (more than one) into a single node that contains them hierarchically.

All the nodes must have the same parent node, or all of them must have no parent.

Future development: grouping groups of nodes in a higher level of hierarchy

Ungroup:

Breaks up the groups formed by the selected nodes (only those that are groups). At least one node must be a group of nodes in order to show this option.

Ungroup recursively:

Breaks up the groups formed by the selected nodes (only those that are groups) recursively, also breaking up the groups formed by their child nodes. At least one node must be a group of nodes in order to show this option.

Remove from group:

Removes the selected nodes from the groups they are in (only those that are in one), leaving them at the level above. Also, if the last node of a group is removed, the group is broken up. At least one node must be in a group of nodes in order to show this option.

Move to group:

Shows available groups to move the selected nodes and moves them to that group.

Settle:

Blocks the position of the selected nodes.

The behaviour of the action should be: First, check that the node that is right clicked is not settled. Once executed, all nodes will be settled and nodes that are already settled will be kept settled, allowing the user to settle several nodes in one action.

Free:

Unblocks the position of the selected nodes.

The behaviour of the action should be: First, check that the node that is right clicked is settled. Once executed, all nodes will be free and nodes that are already free will be kept free, allowing the user to free several nodes in one action.

Set node(s) size:

Sets the indicated size to all the selected nodes.

Link nodes (create edge):

Creates edges between nodes to link them. At least 2 nodes must be selected to show this option.

Once executed, the user will select the origin node. Then one or more edges will be created between this node and all of the other nodes.

Select neighbour nodes on table:

This action is executed with one node and selects its neighbour nodes in the nodes table. In order to do this selection, the Data Laboratory API for the controller will be used.

Select edges using the node on edges table:

Selects the edges using the node in the edges table.

Clear node(s) data:

Clears the attributes of the selected nodes except id and computed attributes.

Copy node data:

Copies the node data of the selected node except id and computed attributes to the other selected nodes.

Copy nodes:

Creates a copy of the selected nodes with the same attributes and properties.

Edge manipulation features (on right click)

These are actions shown on right click on one or more edges selected in the edges table. Each of them is a use case implemented as a ServiceProvider of the edge manipulation Service (see class diagrams).

Edge-right-click.png

Operation | One edge | Various edges
Select source node on graph | |
Select target node on graph | |
Select the nodes that the edge has on nodes table | |
Delete | |
Delete edge(s) and their nodes | |
Clear edge data | |
Edit properties | |

Select source node on graph:

This action changes the graph view in the overview section so that the source node of the edge is shown at the center of the graph view.

Select target node on graph:

This action changes the graph view in the overview section so that the target node of the edge is shown at the center of the graph view.

Select the nodes that the edge has on nodes table:

Selects the source and target nodes of the edge in the nodes table.

Delete:

Deletes the selected edges (one or more). The user will be asked to confirm the action.

Delete edge(s) and their nodes:

Deletes the selected edges (one or more) and also the nodes that the edges connect. The user will be asked to confirm the action.

Clear edge data:

Clears the attributes of the selected edges except id and computed attributes.

Edit Properties:

Shows a dialog to edit the properties (size, color...) of one edge.

Attribute columns manipulation features (for nodes and edges table)

These features are centered on the use of the Attributes API for both nodes and edges. They should not be very complicated, and some should be doable with a few clicks (like duplicating a column).

Add column:

This is a column related use case but it is not a column manipulator because of its simplicity and the fact that it does not manipulate any existing column. Adding a column to nodes or edges needs some input from the user beyond column name. In a dialog, the user will select the type of the column (see AttributeType enumeration) and a title (that has to be validated) for it. The attribute origin will be set to DATA (see AttributeOrigin enum). Once the column is added, it will be possible to change its value for each row.

Attributecolums.png

Operation | Manipulable columns
Delete column | DATA or COMPUTED columns
Copy data to other column | Any
Fill column with value | Any DATA column or node/edge label
Clear column data | Any DATA column or node/edge label
Duplicate column | Any
Column values frequencies report | Any
Number column statistics report | Number/Number list column
Create boolean column with value regular expression matching result | Any
Create column with list of regular expression matching groups | Any
Negate boolean column values | Boolean or boolean list column

Delete column:

Deletes any column that is not a PROPERTY column.

Copy data to other column:

Copies data from a source column to another target column. When the columns have different data types, the string representation of each value is parsed and converted to the target column type if possible.

Fill column with value:

Fills all rows of a column with a given value.

Clear column data:

Clears all values of a column. The column can't be COMPUTED or the id of a node/edge.

Duplicate column:

Makes a copy of a column of any type and origin to a new column, with the type that the user chooses. When the columns have different data types, the string representation of each value is parsed and converted to the target column type if possible.

Column values frequencies report:

Shows an HTML report that contains all the different values in a column, with their absolute frequency of appearance and their percentage of the total.

The report also includes a configurable pie chart if the number of different values does not exceed a maximum (100).
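The counting behind such a report could look like the following sketch. FrequencySketch and its method names are hypothetical; the HTML rendering and the chart itself are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of the Gephi API. */
class FrequencySketch {

    /** Counts how often each distinct value appears, in order of first appearance. */
    static Map<Object, Integer> absoluteFrequencies(List<?> columnValues) {
        Map<Object, Integer> counts = new LinkedHashMap<>();
        for (Object value : columnValues) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    /** Percentage of the total number of rows that a count represents. */
    static double percentage(int count, int totalRows) {
        return 100.0 * count / totalRows;
    }

    /** The pie chart is only included when the distinct values do not exceed the maximum. */
    static boolean pieChartAllowed(Map<?, ?> counts, int maximumDistinctValues) {
        return counts.size() <= maximumDistinctValues;
    }
}
```

A column holding a, b, a, c, a gives the counts a=3, b=1, c=1, so a accounts for 60% of the rows.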

Number column statistics report:

Shows an HTML report with statistics obtained from a numeric column.

It also always includes various configurable charts:

  • A box-plot
  • A scatter plot that can use dots or lines and include linear regression of the data
  • A histogram with a configurable number of intervals

Create boolean column with value regular expression matching result:

Evaluates the values of a column to obtain a new boolean column using a regular expression provided by the user. The values of the new column indicate if the corresponding value of the source column matches the given regular expression.

Create column with list of regular expression matching groups:

Evaluates the values of a column to obtain a new string list column using a regular expression provided by the user. The values of the new column contain all the groups of the corresponding source column value that match the given regular expression.
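Both regular expression manipulators could be sketched as below. Two points are assumptions, since the text leaves them open: the boolean column uses a whole-value match, and "matching groups" is read as every occurrence of the expression in the value. RegexColumnSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the Gephi API. */
class RegexColumnSketch {

    /** New boolean column: true when the whole source value matches (null counts as no match). */
    static List<Boolean> matchColumn(List<String> sourceColumn, Pattern pattern) {
        List<Boolean> result = new ArrayList<>();
        for (String value : sourceColumn) {
            result.add(value != null && pattern.matcher(value).matches());
        }
        return result;
    }

    /** New string list column: every occurrence of the pattern found in each source value. */
    static List<List<String>> matchingGroupsColumn(List<String> sourceColumn, Pattern pattern) {
        List<List<String>> result = new ArrayList<>();
        for (String value : sourceColumn) {
            List<String> groups = new ArrayList<>();
            if (value != null) {
                Matcher matcher = pattern.matcher(value);
                while (matcher.find()) {
                    groups.add(matcher.group());
                }
            }
            result.add(groups);
        }
        return result;
    }
}
```

With the expression \d+, the value a1b22 produces the list 1, 22, and a value without digits produces an empty list.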

Negate boolean column values:

Negates all the boolean values of a boolean or boolean list column.

Merge strategies

These strategies define ways to manipulate various columns to merge them in some order into a new column.

Merge-strategies-cases.png

Merge strategy | Manipulable columns
Join with separator | Any
Join number columns | Numeric columns
Calculate average value | Numeric columns
Calculate first quartile | Numeric columns
Calculate median | Numeric columns
Calculate third quartile | Numeric columns
Calculate interquartile range | Numeric columns
Calculate sum | Numeric columns
Calculate minimum | Numeric columns
Calculate maximum | Numeric columns
Boolean logic operations | Boolean columns


Join with separator:

Joins the string representation of the values of each row with the chosen separator and creates a new string column with the result.
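A minimal sketch of this strategy follows. JoinSketch is a hypothetical name, and representing empty cells (null) as empty strings is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper; not part of the Gephi API. */
class JoinSketch {

    /** Each inner list holds the values of one row, in the column order chosen by the user. */
    static List<String> joinRows(List<? extends List<?>> rows, String separator) {
        List<String> newColumn = new ArrayList<>();
        for (List<?> row : rows) {
            String joined = row.stream()
                    .map(value -> value == null ? "" : value.toString()) // empty cell: empty string
                    .collect(Collectors.joining(separator));
            newColumn.add(joined);
        }
        return newColumn;
    }
}
```

For a row with the values 1, a and true and the separator "-", the new column holds 1-a-true.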

Join number columns:

Joins all the numbers contained in each row into a new column with the resulting lists of numbers.

Calculate average value:

Uses the numbers contained in each row to calculate the average value and put it in a new column.

Calculate first quartile:

Uses the numbers contained in each row to calculate the first quartile and put it in a new column.

Calculate median:

Uses the numbers contained in each row to calculate the median and put it in a new column.

Calculate third quartile:

Uses the numbers contained in each row to calculate the third quartile and put it in a new column.

Calculate interquartile range:

Uses the numbers contained in each row to calculate the interquartile range and put it in a new column.

Calculate sum:

Uses the numbers contained in each row to calculate the sum and put it in a new column.

Calculate minimum:

Uses the numbers contained in each row to calculate the minimum value and put it in a new column.

Calculate maximum:

Uses the numbers contained in each row to calculate the maximum value and put it in a new column.
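The numeric strategies above all reduce the numbers of one row to a single value. The sketch below shows one way to compute them. Note that several quartile conventions exist; the one used here (median of the lower and upper halves, leaving out the middle value when the count is odd) is an assumption, as is the class name RowStatsSketch.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of the Gephi API. */
class RowStatsSketch {

    /** Returns a sorted copy, so the caller's row is never modified. */
    private static double[] sortedCopy(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one number is required");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    /** Median of the sorted slice [from, to), which must not be empty. */
    private static double medianOfSlice(double[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        return n % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    static double sum(double[] values) {
        return Arrays.stream(values).sum();
    }

    static double average(double[] values) {
        return sum(values) / sortedCopy(values).length;
    }

    static double minimum(double[] values) {
        return sortedCopy(values)[0];
    }

    static double maximum(double[] values) {
        double[] sorted = sortedCopy(values);
        return sorted[sorted.length - 1];
    }

    static double median(double[] values) {
        double[] sorted = sortedCopy(values);
        return medianOfSlice(sorted, 0, sorted.length);
    }

    /** Median of the lower half; a single value is its own quartile. */
    static double firstQuartile(double[] values) {
        double[] sorted = sortedCopy(values);
        return sorted.length == 1 ? sorted[0] : medianOfSlice(sorted, 0, sorted.length / 2);
    }

    /** Median of the upper half; a single value is its own quartile. */
    static double thirdQuartile(double[] values) {
        double[] sorted = sortedCopy(values);
        return sorted.length == 1 ? sorted[0]
                : medianOfSlice(sorted, (sorted.length + 1) / 2, sorted.length);
    }

    static double interquartileRange(double[] values) {
        return thirdQuartile(values) - firstQuartile(values);
    }
}
```

For the row 7, 1, 3, 5, 9 this convention gives a median of 5, a first quartile of 2, a third quartile of 8 and an interquartile range of 6.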

Boolean logic operations:

Uses the boolean values of each row of the given columns in the specified order to apply the boolean operations that the user chooses between each boolean value to get a row result to put in a new column.
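One possible reading of this strategy is a left-to-right evaluation with one operator between each pair of consecutive values, without operator precedence. That reading, the BoolOp enum and the set of operators it offers are assumptions of this sketch.

```java
import java.util.List;

/** Hypothetical set of operations the user could choose from. */
enum BoolOp {
    AND, OR, XOR;

    boolean apply(boolean left, boolean right) {
        switch (this) {
            case AND:
                return left && right;
            case OR:
                return left || right;
            default:
                return left ^ right;
        }
    }
}

/** Hypothetical helper; not part of the Gephi API. */
class BooleanMergeSketch {

    /** Folds the row from left to right; operations.get(i) joins value i and value i + 1. */
    static boolean mergeRow(List<Boolean> rowValues, List<BoolOp> operations) {
        if (rowValues.isEmpty() || operations.size() != rowValues.size() - 1) {
            throw new IllegalArgumentException("need exactly one operation between each pair of values");
        }
        boolean result = rowValues.get(0);
        for (int i = 0; i < operations.size(); i++) {
            result = operations.get(i).apply(result, rowValues.get(i + 1));
        }
        return result;
    }
}
```

Under this reading the row true, true, false with OR then AND evaluates as (true OR true) AND false, which is false.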

Attribute values manipulators

These manipulators interact with single attribute values (cells in a table).

They appear in right click context menu.

Attribute-values.png

Manipulator | Manipulable attribute values
Clear value | Any
Number list statistics report | Number lists or dynamic numbers

Clear value:

Clears the data contained in the value.

Number list statistics report:

Shows an HTML report with statistics obtained from a number list or dynamic number.

It also always includes various configurable charts:

  • A box-plot
  • A scatter plot that can use dots or lines and include linear regression of the data
  • A histogram with a configurable number of intervals

Classes and interfaces

Nodes and edges manipulators Services interfaces

The following interfaces allow the creation of ServiceProviders for the manipulation of nodes and edges.

They are contained in the package org.gephi.datalaboratory.spi of the module Data Laboratory API.

Manipulators.png

An interface (Manipulator) with all common needs of the manipulation Services is defined, and then nodes and edges manipulators are specified.

These common needs are:

  • Provide an execution method: void execute()
  • Provide the name of the operation to show with the current data and conditions: String getName()
  • Provide a description of the operation: String getDescription()
  • Indicate if the operation can be executed with the current data and conditions (for example it can count the number of nodes or check if they are grouped): boolean canExecute()
  • Optionally provide an UI, returning a ManipulatorUI instance or null: ManipulatorUI getUI()
  • Declare the type of the operation, returning an integer. Types are used to separate operations into groups; groups with a lower type value appear first: int getType()
  • Declare the position of this operation within its group in the options menu, returning an integer. Lower values appear higher in the menu: int getPosition()
  • Optionally, indicate the icon to show with the operation: Icon getIcon()

Then, NodesManipulator and EdgesManipulator each provide a specific method (setup) to configure their data (nodes and edges). This method will always be called before any other method of the previous general interface.
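The interfaces described above could be written roughly as follows, with the Settle action as an example implementation. The method signatures of Manipulator are the ones listed above. Everything else is an assumption made for illustration: the parameters of setup, the SketchNode class, the use of javax.swing.Icon, the empty ManipulatorUI placeholder and the values returned by SettleSketch.

```java
import javax.swing.Icon;

/** Placeholder for the UI type named in the text; its members are not specified here. */
interface ManipulatorUI { }

/** Common contract, using the method signatures listed above. */
interface Manipulator {
    void execute();
    String getName();
    String getDescription();
    boolean canExecute();
    ManipulatorUI getUI();
    int getType();
    int getPosition();
    Icon getIcon();
}

/** Hypothetical minimal node model, used only by this sketch. */
class SketchNode {
    boolean settled;
}

/** The setup parameters are an assumption; the text only says that a setup method exists. */
interface NodesManipulator extends Manipulator {
    void setup(SketchNode[] nodes, SketchNode clickedNode);
}

/** Example: the Settle action, shown only when the right-clicked node is not settled. */
class SettleSketch implements NodesManipulator {
    private SketchNode[] nodes;
    private SketchNode clickedNode;

    public void setup(SketchNode[] nodes, SketchNode clickedNode) {
        this.nodes = nodes;
        this.clickedNode = clickedNode;
    }

    public void execute() {
        for (SketchNode node : nodes) {
            node.settled = true; // nodes that are already settled stay settled
        }
    }

    public String getName() {
        return nodes.length > 1 ? "Settle nodes" : "Settle node";
    }

    public String getDescription() {
        return "Blocks the position of the selected nodes";
    }

    public boolean canExecute() {
        return !clickedNode.settled;
    }

    public ManipulatorUI getUI() {
        return null; // no UI is needed for this action
    }

    public int getType() {
        return 300; // arbitrary group number for this sketch
    }

    public int getPosition() {
        return 0;
    }

    public Icon getIcon() {
        return null; // the icon is optional
    }
}
```

Because getName and canExecute are called after setup, they can adapt to the current selection, as the name of the action does here.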

User Interface (UI) design

This section describes the different options considered for the UI and the chosen solutions.

Choosing the way each type of manipulation Service is shown in the UI

Nodes and edges manipulation

In this case, interacting with nodes and edges, the solution is typical and simple. Because the nodes/edges are shown as rows in 2 tables that allow multiple row selection, even of non-contiguous rows, the different actions and operations available for the selected rows appear in a popup on right click.

This way we can separate and sort the actions, enable or disable them, and show different names, tooltips or even icons for them depending on the selection and the right-clicked row.

No other options were considered for this aspect.
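Separating and sorting the actions in the popup could work as in the following sketch, which orders entries by type and then by position and puts a separator between different types. MenuEntry, PopupOrderSketch and the "---" marker are hypothetical; a real popup would add menu items and separators instead of strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical summary of one action: its name, type (group) and position in the group. */
class MenuEntry {
    final String name;
    final int type;
    final int position;

    MenuEntry(String name, int type, int position) {
        this.name = name;
        this.type = type;
        this.position = position;
    }
}

/** Hypothetical helper; not part of the Gephi API. */
class PopupOrderSketch {
    static final String SEPARATOR = "---";

    /** Lower types come first, lower positions come first within a type. */
    static List<String> layout(List<MenuEntry> entries) {
        List<MenuEntry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt((MenuEntry e) -> e.type)
                .thenComparingInt(e -> e.position));
        List<String> menu = new ArrayList<>();
        Integer previousType = null;
        for (MenuEntry entry : sorted) {
            if (previousType != null && previousType != entry.type) {
                menu.add(SEPARATOR); // a new group starts here
            }
            menu.add(entry.name);
            previousType = entry.type;
        }
        return menu;
    }
}
```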

Node-edge-ui.png

Attribute columns

Finding an adequate solution for the UI of actions that manipulate a column was not as simple as for nodes/edges.

Several options were considered and discussed in the Gephi IRC channel before choosing the solution:

  • Enable column selection in the table and interact with columns on right click - This was discarded because it could be confusing and uncomfortable for the user, who would need to switch between row selection and column selection modes.
  • Show 1 list of the columns and 1 list of the manipulators with only 1 execution button – Similar to the chosen option, but it was discarded because it would not use the available space of the UI and it would not be possible to see all the existing actions at a glance.
  • Add a different button for each action that, when clicked, shows a list of the columns available for that manipulator. When a column is clicked, the action is executed on that column – This was the chosen solution. It also allows showing a different list of columns for each button depending on the manipulator conditions. Buttons of this type are called “drop down buttons”, and the implementation used is the JCommandButton class of the open source library Flamingo.

Attribute-column-ui.png

General actions in data laboratory

Since these actions don't need any data to be set up before executing, the simple solution is to put a button for each action and again allow specifying a name, tooltip, icon, UI (more necessary in this type of manipulator)...

General-actions-ui.png

Merge strategies

This special type of manipulator, which defines strategies for merging attribute columns, appears in the column merge dialog. The user can select the columns to merge and the order in which to merge them. With this information, the strategy can indicate whether it can be executed under those conditions.

Merge-strategies.png

Other information

The model-view-controller and facade patterns will be followed to separate business code from GUI code.

Different controllers will be created, each grouping one distinguishable type of feature. They will need different feature-handling classes and possibly some model classes to be manipulated by those handling classes.

Architecture structure first idea

DataLaboratory.png