In part 2 of this series, we’ll explain how to set up Neo4j and Talend, so that you can get started loading Neo4j demo data.

Our first step is to install our favourite graph database, Neo4j. Neo4j offers a free Community edition, and although it lacks enterprise features such as load balancing, it is still a perfect fit for our demo.

Go to the download page and get Neo4j installed!

Talend also has a free version of its integration software, named Talend Open Studio.

Go to the download page and install this package.

Talend Job

When you start Talend, you must choose a project. Create a new project, and a new workspace is generated for it. Next, we’ll create a new job that will host our data generation flow: right-click the ‘Job Designs’ item in the Repository view and select ‘Create job’.

Neo4j Connection

The first component in our empty flow is a Neo4j connection, added with the tNeo4jConnection component. This component can be configured either with a local (embedded database) connection using a file path, or with a remote connection using a URL. We chose a local connection with Neo4j version 2.2.x, which currently results in a version 2.2.5 database.

The next step is to close the database connection again. For this task we add the tNeo4jClose component and link it to the connection with the ‘On Subjob Ok’ trigger. We could also set the “shutdown after job” flag on the last Neo4j component in the flow, but I prefer an explicit action for this.

If you want to generate a large database in Neo4j, a remote connection will be too chatty (and therefore slow), whereas an embedded connection is much faster. The resulting embedded database can easily be attached to a Neo4j (Community) server afterwards.

Clear database (optional step)

I like this flow to be repeatable, so an additional task is to always clear the database first. Let’s add a tNeo4jRow component and write a “delete all” statement. Make sure to flag this component to use the existing connection.

Because the Neo4j components in Talend lag slightly behind the Neo4j releases, we’re on version 2.2.5 when using the embedded Neo4j database. This means we have to delete all nodes by executing a “match-delete” statement that names the relationships explicitly. From version 2.3 onwards we can write a “MATCH (n) DETACH DELETE n” statement, which automatically deletes the nodes and all their relationships, without the need to specify the latter.
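The post does not spell out the “match-delete” statement, so here is a minimal sketch of what it could look like. It is written in Java because Talend jobs are generated as Java code. The class and method names are hypothetical, and the 2.2.x statement shown is one common way to delete nodes together with their relationships, not necessarily the exact statement used in this flow.

```java
// Hypothetical helper: picks a "delete all" Cypher statement for a Neo4j version.
final class ClearDatabaseQuery {

    // Available from Neo4j 2.3: relationships are removed implicitly.
    static final String DETACH_DELETE = "MATCH (n) DETACH DELETE n";

    // Assumed form of the "match-delete" statement for 2.2.x and earlier:
    // relationships must be matched and deleted explicitly.
    static final String MATCH_DELETE =
            "MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r";

    // Accepts versions such as "2.2.5" or "2.3"; only major and minor are read.
    static String clearAllStatement(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        boolean supportsDetach = major > 2 || (major == 2 && minor >= 3);
        return supportsDetach ? DETACH_DELETE : MATCH_DELETE;
    }

    public static void main(String[] args) {
        // The embedded database in this post is 2.2.5.
        System.out.println(clearAllStatement("2.2.5"));
        System.out.println(clearAllStatement("2.3.0"));
    }
}
```

For the embedded 2.2.5 database used here, the first statement printed is the one you would paste into the query field of the tNeo4jRow component.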
