Core evolution - Partition Refactoring
From Gephi:Wiki
The PartitionAPI's role is to identify attribute columns that are partitions and to transform their values into visual signs, mainly color. The modules were written very quickly: the code is not well designed, and the API could and should be simpler. Version 1.0 has reached its end, and a stabilized second version should be designed.
Specifications
Current and new specifications:
- Transform the attribute values of a partition into visual signs, mainly color
- Get the list of all parts of a partition, sorted and with percentage for each
- Transformers UI sets how each partition is transformed (colored for instance)
- Show a Pie chart of the current Partition
- Group/Ungroup the current graph. Create a meta-node for each part value.
- Get partitions also from dynamic attribute columns (implemented in 0.7 beta)
- Refresh partitions automatically
- Save/Edit Color palettes (optional)
Current design
The code is currently split into four modules: PartitionAPI, PartitionPlugin, PartitionPluginUI and DesktopPartition
- PartitionAPI: The API and its implementation
- PartitionPlugin: The transformers
- PartitionPluginUI: The transformers UI
- DesktopPartition: The TopComponent and swing controls
The current design suffers mainly on its API side. The modules were first designed to serve the UI, and the fact that they could be useful to other parts of the application was only considered later. Nevertheless, it is an essential point: Filters, Data Laboratory and other modules would like to be able to access and get information about partition columns.
The Transformer SPI allows new Transformers to be added, so the design is quite modular in that respect.
Finding partition
Within the attribute columns, the API should be able to tell whether a particular column is a partition, because not all columns are worth using as partitions. For example, the nodes' ID column has a unique value for each node, and a PAGERANK column holds real numbers.
To do that, we need to count the number of different values in a column and compare it to the total. Currently, a column is a partition column if differentValues < 9 / 10 * nonNullValues, that is, if fewer than 90% of its non-null values are distinct.
That has a drawback, however: it requires getting the current graph, iterating over all values, putting them in a set, and only then making the decision.
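The test described above can be sketched as follows. This is only an illustration of the rule, not the existing PartitionAPI code: the class name PartitionDetector and the method isPartition are hypothetical, and the column is represented as a plain list of values rather than an AttributeColumn.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of PartitionAPI.
final class PartitionDetector {

    /**
     * Returns true when the column looks like a partition, i.e. when
     * differentValues < 9/10 * nonNullValues. Null values are ignored.
     */
    static boolean isPartition(List<?> columnValues) {
        Set<Object> distinct = new HashSet<>();
        int nonNull = 0;
        // One full pass over the values: this is the cost the text refers to.
        for (Object value : columnValues) {
            if (value != null) {
                nonNull++;
                distinct.add(value);
            }
        }
        // Cross-multiplied form of the rule. Writing "9 / 10 * nonNull"
        // with Java ints would evaluate 9 / 10 to 0.
        return 10L * distinct.size() < 9L * nonNull;
    }
}
```

With this rule, a column holding the values a, a, b, b, a is accepted (2 distinct values out of 5), while an ID column with one distinct value per node is rejected.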
Storing partition
To be able to list the elements (nodes, edges) that belong to a particular part, the elements are simply stored in the part. That means that for every partition we potentially hold an array of all nodes or edges, which is quite inefficient. Here one can see the lack of a reverse index.
API features
To build the Partition filters, we needed PartitionAPI to simply calculate and return a Partition. A buildPartition(AttributeColumn column, Graph graph) method was therefore added to the PartitionController. It is quite limited and doesn't reuse already calculated partitions.
New design
The PartitionAPI should serve other parts of the application well when they need information about partitions. One of the issues is keeping this information up to date when the graph changes, columns are manipulated, or the visible graph view changes. With the addition of Dynamic filtering and its Timeline component, we now see how important it is to refresh the Partition transformation when a filter parameter is modified. A partition is a view on data, and as a view it should be coordinated with the graph currently visible on screen. This is not done at all in the current design.
Some features the API should have:
- Get all the different values for a particular column, sorted by number of elements
- Get the number of elements in a particular Part
- Get the part for a particular value
- Get the percentage a particular part represents
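As a rough sketch of what these queries could look like, the class below counts the values of one column and answers the questions listed above. The names PartitionSummary, valuesBySize, count and percentage are assumptions made for illustration only; they are not existing PartitionAPI or AttributesAPI methods, and the column is again modelled as a plain list of values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical read-only summary of one column; not an existing Gephi class.
final class PartitionSummary {
    // value -> number of elements holding that value (first-seen order)
    private final Map<Object, Integer> counts = new LinkedHashMap<>();
    private int total; // number of non-null values

    PartitionSummary(List<?> columnValues) {
        for (Object value : columnValues) {
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
                total++;
            }
        }
    }

    /** All different values, largest part first. */
    List<Object> valuesBySize() {
        List<Object> values = new ArrayList<>(counts.keySet());
        // List.sort is stable, so parts of equal size keep first-seen order.
        values.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return values;
    }

    /** Number of elements in the part for this value, 0 if unknown. */
    int count(Object value) {
        return counts.getOrDefault(value, 0);
    }

    /** Share of the non-null elements held by this part, in percent. */
    double percentage(Object value) {
        return total == 0 ? 0.0 : 100.0 * count(value) / total;
    }
}
```

Only counts are kept here, not the elements themselves, which is one way to avoid holding an array of all nodes or edges per partition.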
One can see how closely this is coupled with the AttributesAPI itself. For example, getting the set of different values for a particular column can be useful in many modules and would simplify some of them, notably those which require min and max information. An open question is therefore whether the AttributesAPI itself should simply be improved with these features. It is where the data belongs, after all.
Letting the AttributesAPI manage basic partition features would simplify the design of the PartitionAPI. It would avoid a flow of events from AttributesAPI to PartitionAPI whose only purpose is to let PartitionAPI track changes in partitions, and it would spare PartitionAPI from maintaining its own data structure.
Refresh
Refreshing the partition means that, in the UI, the user sees the list of parts and their percentages change when:
- The graph changes (nodes are added to or removed from the main graph view)
- The currently visible view changes (filters have produced a new graph view)
The creation of a refresh thread is recommended, with a refresh timer.
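One possible shape for such a refresh thread is sketched below, using only java.util.concurrent. The class PartitionRefresher and its methods are hypothetical. The idea is that graph and view listeners only mark the partition as dirty, and the timer recomputes at most once per period, so bursts of events are coalesced.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical refresh helper; not an existing Gephi class.
final class PartitionRefresher {
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final Runnable recompute;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "partition-refresh");
                t.setDaemon(true); // do not keep the application alive
                return t;
            });

    PartitionRefresher(Runnable recompute) {
        this.recompute = recompute;
    }

    /** Starts the refresh timer. */
    void start(long periodMillis) {
        // Note: if recompute throws, the executor cancels further runs.
        timer.scheduleWithFixedDelay(this::refreshIfDirty,
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Called by graph or view listeners; cheap and thread-safe. */
    void markDirty() {
        dirty.set(true);
    }

    /** Recomputes only if something changed since the last refresh. */
    boolean refreshIfDirty() {
        if (dirty.compareAndSet(true, false)) {
            recompute.run();
            return true;
        }
        return false;
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

Since the recompute callback runs on the timer thread, any Swing update it triggers would still need to be handed over to the event dispatch thread, for example with SwingUtilities.invokeLater.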
Status
Not started; under discussion.

