Import CSV Data

Introduction

This manual describes the steps needed to import CSV files into Gephi using the Data Laboratory. The import expects each row of the file to be a node or an edge.

Note that the import can be done at any time; the workspace does not need to be empty. You should know some general aspects of the import process:

  • Only the columns you select are imported, except for mandatory columns (the source and target of edges), which cannot be deselected.
  • If a column title already exists in the workspace, you can import into it, but its data type cannot be changed; imported data is parsed to fit the existing column type.
  • Each cell is parsed to the chosen or existing column data type when possible; a cell that cannot be parsed is set to null.
  • You can choose which table to import the rows into (nodes or edges). The behaviour and requirements differ somewhat between the two, so check which options are selected before running the import.

We will also see how to adapt the import process to our needs using the options that the import wizard offers.
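
The parsing rule described in the list above (parse each cell to the column type, fall back to null when that is not possible) can be illustrated with a minimal Python sketch. This is not Gephi's code: the `PARSERS` table, the type names and the `parse_cell` function are hypothetical names chosen only for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical parsers for a few column types (assumption: the real type
# list offered by the wizard is larger than this).
PARSERS: Dict[str, Callable[[str], object]] = {
    "string": str,
    "integer": int,
    "double": float,
}


def parse_cell(raw: str, column_type: str) -> Optional[object]:
    """Parse raw text to column_type, or return None when it does not fit."""
    try:
        return PARSERS[column_type](raw.strip())
    except ValueError:
        # The cell cannot be represented in this column type: use a null value.
        return None


print(parse_cell("42", "integer"))   # fits the column type
print(parse_cell("4.5", "integer"))  # does not fit, so the cell becomes None
```

Under this rule, importing text into an existing integer column never changes the column type; cells that do not fit are simply left empty.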

Launching the import CSV wizard and first step

We start by clicking the 'Import CSV' button in the Data Laboratory, as seen in the picture:

Launch-import-csv-wizard.png

A wizard then opens, where you have to choose some generic, table-independent options.

Import-csv-step1.png

In this step you mainly choose options that describe the format of the CSV file, while a preview table shows the result.

  • First, specify a CSV file that is not empty
  • Then choose the value separator of the file from some common options, as well as the encoding/charset of the file
  • Finally, choose the table into which the rows and columns of the file will be imported. If you choose the edges table, you must provide at least a 'Source' and a 'Target' column (case insensitive)
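
To check a file before opening the wizard, the same three points (non-empty file, separator, mandatory edge columns) can be sketched in Python with the standard `csv` module. The function names `preview` and `missing_edge_columns` and the sample data are hypothetical; the sketch only mirrors the checks described above and is not how Gephi implements them.

```python
import csv
import io


def preview(stream, separator=","):
    """Return (header, rows) for a CSV stream; reject an empty file."""
    rows = list(csv.reader(stream, delimiter=separator))
    if not rows:
        raise ValueError("the CSV file is empty")
    return rows[0], rows[1:]


def missing_edge_columns(header):
    """Return the mandatory edge columns absent from header (case-insensitive)."""
    present = {name.strip().lower() for name in header}
    return [col for col in ("source", "target") if col not in present]


# Hypothetical semicolon-separated edges file held in memory.
sample = "SOURCE;target;weight\na;b;1.5\nb;c;2\n"
header, rows = preview(io.StringIO(sample), separator=";")
print(header, rows)
print(missing_edge_columns(header))  # empty list: both mandatory columns exist
```

For a file on disk, the stream would come from `open(path, newline="", encoding="utf-8")`, with the encoding set to match the charset you intend to select in the wizard.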

Last step - Choosing column details and table-specific options

In this second and last step, some options and behaviour are common to both cases, whether you are importing to the nodes table or the edges table:

  • You can mark which columns to use and select their type when they don't exist yet
  • If no node/edge Id column is provided, or it is empty for some row, an automatic id will be assigned

Importing to nodes table

When importing to the nodes table, you can also indicate whether to update a node's data instead of creating a new node when a node with that id already exists.
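
The node rules (automatic id when none is given, update or create when the id already exists) can be sketched as follows. This is an illustrative model, not Gephi's implementation: the dictionary-based table, the `auto-N` id format and the `update_existing` flag are assumptions, as is the choice to give the newly created node an automatic id when updating is turned off.

```python
from itertools import count


def import_nodes(table, rows, update_existing=True):
    """Import rows into table (dict: node id -> attribute dict), in place."""
    auto_ids = (f"auto-{n}" for n in count(1))  # hypothetical id format

    def fresh_id():
        # Skip generated ids that are already taken in the table.
        return next(i for i in auto_ids if i not in table)

    for row in rows:
        attrs = dict(row)
        node_id = (attrs.pop("id", None) or "").strip()
        if not node_id:
            # No id in this row: assign an automatic one.
            node_id = fresh_id()
        elif node_id in table and not update_existing:
            # Assumption: the new node is created under an automatic id.
            node_id = fresh_id()
        table.setdefault(node_id, {}).update(attrs)
    return table


nodes = {"n1": {"label": "old"}}
import_nodes(nodes, [{"id": "n1", "label": "new"}, {"id": "", "label": "x"}])
print(nodes)
```

With `update_existing=True` the row for `n1` overwrites the existing label; with `update_existing=False` the existing node is left untouched and the row becomes a separate node.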

Import-csv-step2-nodes.png

Importing to edges table

When importing to the edges table, the behaviour is a bit different:

  • Source and Target node ids are mandatory for all rows
  • Edge type is optional and defaults to 'Directed'
  • Edge data can't be updated
  • An edge is ignored if it already exists, even if it has a different id
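
The edge rules above can be modelled with a short Python sketch. It is not Gephi's implementation and rests on several labeled assumptions: an edge is identified by its source, target and type rather than by its id; an undirected edge a-b is treated as the same edge as b-a; and rows without a source or target are skipped. The names `edge_key` and `import_edges` are hypothetical.

```python
def edge_key(source, target, edge_type):
    """Identity of an edge in this sketch (assumption, see lead-in)."""
    if edge_type.lower() == "undirected":
        # Assumption: direction does not matter for undirected edges.
        source, target = sorted((source, target))
    return (source, target, edge_type.lower())


def import_edges(existing, rows):
    """Add rows to existing (a set of edge keys); return the skipped rows."""
    skipped = []
    for row in rows:
        source = (row.get("source") or "").strip()
        target = (row.get("target") or "").strip()
        # Edge type is optional and defaults to 'Directed'.
        edge_type = (row.get("type") or "").strip() or "Directed"
        if not source or not target:
            skipped.append(row)  # mandatory ids missing
            continue
        key = edge_key(source, target, edge_type)
        if key in existing:
            skipped.append(row)  # edge already exists; its id is irrelevant
            continue
        existing.add(key)
    return skipped


edges = {("a", "b", "directed")}
skipped = import_edges(edges, [
    {"id": "e9", "source": "a", "target": "b"},  # duplicate with another id
    {"source": "b", "target": "a"},              # new directed edge
    {"source": "", "target": "c"},               # missing source
])
print(len(skipped), sorted(edges))
```

Because existing edges are never updated, a row that matches an existing edge has no effect on the workspace, whatever attribute values it carries.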

Import-csv-step2-edges.png