MongoDB is an open-source, document-oriented, schema-less database system. It does not organize data according to the rules of the classical relational data model. Unlike relational databases, where data is stored in rows and columns, MongoDB is built on an architecture of collections and documents. A collection holds different documents and plays a role comparable to a table in a relational database. Data is stored in the form of JSON-style documents. MongoDB supports dynamic queries on documents through a document-based query language that serves the same purpose as SQL.
This blog post explains, with an illustration, how MongoDB can be integrated with IBM DataStage.
For the past two decades, we have used relational databases as the data store because they were the only option available. With the introduction of NoSQL, we now have more options to choose from, depending on the requirement. MongoDB is predominantly used in the insurance and travel industries.
We can extract any semi-structured data and load it into MongoDB through any of the integration tools. Extracting data from MongoDB is also easier and faster than extracting it from relational databases.
MongoDB integration with IBM DataStage
Since the IBM DataStage tool does not have a dedicated external stage for MongoDB, we use the Java Integration stage to load data into or extract data from MongoDB.
Since MongoDB is a schema-free database, both structured and semi-structured data extracted through DataStage can be loaded into it.
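To make "schema-free" concrete, here is a small in-memory sketch. It is not the MongoDB API: a plain Java list stands in for a collection, and the field names and values (gender, nationality, P001, P002) are made up for illustration. It only shows that two documents in the same collection need not share the same set of fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only; this does not talk to MongoDB.
class SchemaFreeSketch {

    // Builds a JSON-style document from alternating key/value arguments.
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        // A list stands in for a collection (hypothetical sample data).
        List<Map<String, Object>> collection = new ArrayList<>();
        collection.add(doc("_id", "P001", "gender", "F"));
        // The second document carries an extra field; no schema change is needed.
        collection.add(doc("_id", "P002", "gender", "M", "nationality", "IN"));

        for (Map<String, Object> d : collection) {
            System.out.println(d.get("_id") + " has fields " + d.keySet());
        }
    }
}
```

In a relational table, the extra field would have required adding a column for every row; here it simply appears on the one document that has it.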
- Make sure you have Java installed on your machine.
- Install the Eclipse tool.
- The Java code requires the MongoDB jar below to be imported into the project in order to use MongoDB functions:
- mongo-java-driver-2.11.3.jar, or a higher version if available (download it from the internet)
- The Java code also requires the jar file below to be imported into the project in order to extract data from or load data into DataStage:
- jar (It is available on the DataStage server. Location: /opt/IBM/InformationServer/Server/DSEngine/java/lib)
Illustration of a DataStage job
1. Create a job in DataStage to parse the sample XML described below.
The XML contains information about one person and one or more person name objects linked to that person.
Figure 1: Person XML linked to Person Name & Person Data
2. In this job, the link lnk_PersonInfo_in carries the person information.
3. The link lnk_PersonNameInfo_in carries the person name information.
4. In this job, we can use the Java Integration stage directly to insert the data from the person information link into MongoDB.
5. Develop Java code to load the person data into MongoDB.
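As a rough sketch of what such code could look like (not the exact code used in this job), the class below builds the document for one row arriving on lnk_PersonInfo_in. It uses only the standard library so that it runs on its own. The field names (gender, birthDate), the sample values, and the database and collection names (party, person) are assumptions for illustration. The commented lines show how the map could be handed to the mongo-java-driver 2.x classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: field names and sample values are hypothetical.
class PersonDocumentSketch {

    // Maps one input row to a JSON-style document keyed by _id.
    static Map<String, Object> toPersonDocument(String personId, String gender, String birthDate) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", personId);      // MongoDB treats _id as the document's primary key
        doc.put("gender", gender);
        doc.put("birthDate", birthDate);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = toPersonDocument("P001", "F", "1980-02-14");

        // With mongo-java-driver 2.x on the classpath, the insert could look like this
        // (host, port, database and collection names are assumptions):
        //   MongoClient client = new MongoClient("localhost", 27017);
        //   DBCollection persons = client.getDB("party").getCollection("person");
        //   persons.insert(new BasicDBObject(doc));

        System.out.println(doc);
    }
}
```

Inside the DataStage job, the three arguments would come from the columns of each record read from the input link rather than from hard-coded values.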
The result in MongoDB after inserting the person information:
6. Create another job to load the PersonName information into MongoDB through the Java Integration stage.
Figure 2: Loading Person Name info into MongoDB
7. Develop Java code to update the PersonName information for the respective _id values.
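One possible shape for this update, sketched under assumptions: each person name is appended to an array field on the matching person document using MongoDB's $push update operator. The array name (names), the name fields (firstName, lastName), and the sample values are hypothetical. The sketch builds the two documents an update needs, the selector and the change, with plain Java maps so it runs without the driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: the "names" array and its fields are hypothetical.
class PersonNameUpdateSketch {

    // Selector document: matches the person by its _id.
    static Map<String, Object> queryFor(String personId) {
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("_id", personId);
        return query;
    }

    // Update document: { $push: { names: { firstName: ..., lastName: ... } } }
    static Map<String, Object> pushName(String firstName, String lastName) {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("firstName", firstName);
        name.put("lastName", lastName);

        Map<String, Object> push = new LinkedHashMap<>();
        push.put("names", name);

        Map<String, Object> update = new LinkedHashMap<>();
        update.put("$push", push);
        return update;
    }

    public static void main(String[] args) {
        Map<String, Object> query = queryFor("P001");
        Map<String, Object> update = pushName("Jane", "Doe");

        // With mongo-java-driver 2.x, the equivalent DBObjects would be passed to
        // DBCollection.update(query, update) on the person collection.
        System.out.println("query:  " + query);
        System.out.println("update: " + update);
    }
}
```

Because $push appends to the array, running the update once per record on lnk_PersonNameInfo_in would attach one or more names to the same person, which matches the one-to-many relationship in the XML.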
8. To integrate the Java code in DataStage:
- Export the Java code as a jar file (LoadParty.jar).
- Place LoadParty.jar and mongo-java-driver-2.11.3.jar on the DataStage server at any location.
9. Configure the jar files in the Java Transformation stage:
- Java Transformation stage used to load Person Data information
- Java Transformation stage used to load Person Name Data information
10. Final result in MongoDB:
Currently there is no external stage for MongoDB in DataStage. Extracting data from and loading data into MongoDB through DataStage would become simpler if such a stage were introduced in the future.