Every now and then we want to demonstrate a technology or solution with a Proof of Concept. This series discusses how we create Neo4j demo data using Talend.

Proof of concept

Most of the time, proofs of concept are limited in time and/or budget, so they require a fast setup. An important part of such a setup is having relevant data. Sometimes we have direct access to several (on-premises) data sources, but integrating them tends to be time-consuming. Furthermore, having real data is not always a requirement.

What’s more interesting is gaining knowledge of the domain, so we understand the concepts and terminology around a case: which entities are we talking about, and how are they related to each other? Once we have this (high-level) insight we can create a dummy dataset ourselves and get rid of the integration bottlenecks.

Neo4j

Neo4j is a leading open source graph database. It stores data as nodes, relationships and properties. This is comparable to a relational database (RDBMS), which stores data in rows and columns linked by foreign keys. A big difference is that relationships in a graph database are first-class citizens, and querying the graph by following those relationships (traversing) is very powerful.
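
To make the node/relationship/property idea a bit more concrete, here is a minimal in-memory sketch in Python. It is not Neo4j's API or storage format (in Neo4j you would query with Cypher); the `Node`, `Relationship`, `relate` and `traverse` names, as well as the `Person`/`KNOWS` sample data, are made up purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with a label and free-form properties (hypothetical model)."""
    label: str
    properties: dict = field(default_factory=dict)
    # Outgoing relationships live on the node, so following them needs no join.
    outgoing: list = field(default_factory=list)


@dataclass
class Relationship:
    """A typed, directed relationship that carries its own properties."""
    rel_type: str
    target: Node
    properties: dict = field(default_factory=dict)


def relate(source, rel_type, target, **properties):
    """Create a relationship from source to target and attach it to source."""
    rel = Relationship(rel_type, target, properties)
    source.outgoing.append(rel)
    return rel


def traverse(start, rel_type):
    """Yield every node reachable from start by following rel_type edges."""
    seen = {id(start)}
    stack = [start]
    while stack:
        node = stack.pop()
        for rel in node.outgoing:
            if rel.rel_type == rel_type and id(rel.target) not in seen:
                seen.add(id(rel.target))
                stack.append(rel.target)
                yield rel.target


# Sample data (invented for this sketch).
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
carol = Node("Person", {"name": "Carol"})
relate(alice, "KNOWS", bob, since=2010)
relate(bob, "KNOWS", carol, since=2012)

names = sorted(n.properties["name"] for n in traverse(alice, "KNOWS"))
print(names)  # ['Bob', 'Carol']
```

Note that the relationship itself holds data (`since`), and that reaching Carol from Alice is just a matter of following relationships, where a relational model would typically need a self-join per hop.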

Visit their website

Talend

A developer's first reflex can be to write custom code to generate dummy data, but this might be neither the fastest nor the most flexible way to handle the task. Recently I had to generate some demo data in a Neo4j database and used the free ETL tool Talend Open Studio for Big Data Integration. Talend offers a very visual (drag and drop) interface with loads of components providing data source inputs (e.g. XML files, REST services, SQL databases,…), transformations and outputs. Luckily for us, a few Neo4j components are already available straight out of the box.

Visit their website

Series

Throughout this blog post series we’re going to cover the following topics:

  • Part 1 – Introduction (this post)
  • Part 2 – Setup Neo4j and Talend
  • Part 3 – Adding Indexes
  • Part 4 – Time Series
  • Part 5 – Userprofiles
  • Part 6 – .Net Transformations
  • Part 7 – Random relations