Connecting MongoDB using IBM DataStage

MongoDB is an open-source, document-oriented, schema-less database system. It does not organize data according to the rules of the classical relational data model. Unlike relational databases, where data is stored in rows and columns, MongoDB is built on an architecture of collections and documents. One collection holds many documents, which may differ in structure. Data is stored in the form of JSON-style documents. MongoDB supports dynamic queries on documents through a document-based query language, which plays a role similar to that of SQL in relational databases.
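To make the collection-and-document model concrete, here is a minimal sketch in plain Java. It is an in-memory stand-in built on `java.util` maps, not the MongoDB driver API; the class name `DocumentModelSketch`, the helper methods, and the sample field values are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a MongoDB collection; this is NOT the MongoDB driver API.
class DocumentModelSketch {

    // Build a JSON-style document from alternating field names and values.
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // Return every document whose field equals the given value.
    static List<Map<String, Object>> find(List<Map<String, Object>> collection,
                                          String field, Object value) {
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> d : collection) {
            if (value.equals(d.get(field))) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Two documents in one collection with different fields: no fixed schema.
    static List<Map<String, Object>> sampleCollection() {
        List<Map<String, Object>> people = new ArrayList<>();
        people.add(doc("_id", 1, "gender", "F", "city", "Chennai"));
        people.add(doc("_id", 2, "gender", "M",
                "names", Arrays.asList(doc("first", "John", "last", "Doe"))));
        return people;
    }

    public static void main(String[] args) {
        // Query by field value, with no table definition involved.
        System.out.println(find(sampleCollection(), "gender", "M"));
    }
}
```

The second document carries a nested `names` list that the first one lacks, which is the kind of structural difference a relational table would not allow without a schema change.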

This blog post explains, with an illustration, how MongoDB can be integrated with IBM DataStage.

Why MongoDB?

For the past two decades, relational databases have been the default data store because they were practically the only option available. With the introduction of NoSQL, there are more options to choose from based on the requirement. MongoDB is predominantly used in the insurance and travel industries.

We can extract any semi-structured data and load it into MongoDB through any of the integration tools. Extracting data from MongoDB can also be easier and faster than from a relational database, particularly when the data is already document-shaped.

MongoDB integration with IBM DataStage

Since IBM DataStage does not provide a dedicated external stage for MongoDB, we use the Java Integration stage to load data into or extract data from MongoDB.

Since MongoDB is a schema-free database, structured or semi-structured data extracted through DataStage can be loaded into it.

Prerequisites

  1. Make sure you have Java installed on your machine.
  2. Install the Eclipse IDE.
  3. The Java code needs the MongoDB jar below on its build path in order to use MongoDB functions:
    • mongo-java-driver-2.11.3.jar or a higher version if available (download it from the internet)
  4. The Java code also needs the jar file below on its build path in order to extract data from or load data into DataStage:
    • jar (It is available on the DataStage server. Location: /opt/IBM/InformationServer/Server/DSEngine/java/lib)

Illustration of a DataStage job

  1. Create a job in DataStage to parse the sample XML.

The XML contains one person's information and one or more person name objects linked to that person.

Figure 1: Person XML linked to Person Name & Person Data
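The split shown in Figure 1 can be sketched outside DataStage with the JDK's built-in DOM parser. The XML below is a hypothetical stand-in for the sample in the original post, so every element name (`Person`, `PersonId`, `Gender`, `PersonName`, `FirstName`, `LastName`) and every value is an assumption. The two methods mimic the rows that would travel on a person link and a person name link.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PersonXmlSketch {

    // Hypothetical sample: one person with two linked person names.
    static final String SAMPLE_XML =
            "<Person>"
          + "<PersonId>101</PersonId>"
          + "<Gender>F</Gender>"
          + "<PersonName><FirstName>Asha</FirstName><LastName>Rao</LastName></PersonName>"
          + "<PersonName><FirstName>Asha</FirstName><LastName>Kumar</LastName></PersonName>"
          + "</Person>";

    // Parse the XML text and return its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse person XML", e);
        }
    }

    // Text of the first descendant element with the given tag name.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // One pipe-delimited row of person information per Person element.
    static String personInfoRow(Element person) {
        return childText(person, "PersonId") + "|" + childText(person, "Gender");
    }

    // One row per PersonName element, each carrying the parent person id.
    static List<String> personNameRows(Element person) {
        String id = childText(person, "PersonId");
        List<String> rows = new ArrayList<>();
        NodeList names = person.getElementsByTagName("PersonName");
        for (int i = 0; i < names.getLength(); i++) {
            Element name = (Element) names.item(i);
            rows.add(id + "|" + childText(name, "FirstName") + "|" + childText(name, "LastName"));
        }
        return rows;
    }

    public static void main(String[] args) {
        Element person = parse(SAMPLE_XML);
        System.out.println(personInfoRow(person));
        System.out.println(personNameRows(person));
    }
}
```

Carrying the person id on every name row is what later allows the name rows to be matched back to the right document.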

2. In this job, the link lnk_PersonInfo_in carries the person information

3. The link lnk_PersonNameInfo_in carries the person name information

4. In this job, we can use the Java Integration stage directly to insert the data from the person information link into MongoDB

5. Develop Java code to load person data into MongoDB.

[Image: Java code to load person data into MongoDB]
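The load logic can be sketched without the driver as follows. This is not the code from the original post: the column names (`personId`, `gender`, `birthDate`), the class `LoadPersonSketch`, and the map-based collection are all hypothetical. With the 2.x Java driver, the document would typically be a `BasicDBObject` and the insert a call to `DBCollection.insert`; here both are replaced by in-memory equivalents so the sketch runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Driver-free sketch of the person load step; the map stands in for a collection.
class LoadPersonSketch {

    // Turn one person row (hypothetical columns) into a JSON-style document.
    static Map<String, Object> toPersonDocument(String personId, String gender, String birthDate) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", personId);      // reuse the source key as the document _id
        doc.put("gender", gender);
        doc.put("birthDate", birthDate);
        return doc;
    }

    // In-memory stand-in for an insert; like MongoDB, it rejects a duplicate _id.
    static void insert(Map<String, Map<String, Object>> collection, Map<String, Object> doc) {
        String id = (String) doc.get("_id");
        if (collection.containsKey(id)) {
            throw new IllegalStateException("duplicate _id: " + id);
        }
        collection.put(id, doc);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> persons = new LinkedHashMap<>();
        insert(persons, toPersonDocument("101", "F", "1990-01-01"));
        System.out.println(persons);
    }
}
```

Using the source person id as `_id` is a design choice assumed here; it lets a later job address the same document directly by key.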

The result in MongoDB after inserting person information:

[Image: result in MongoDB]

6. Create another job to load PersonName information into MongoDB through the Java Integration stage

Figure 2: Loading Person Name info into MongoDB

7. Below is the Java code to update PersonName information for the respective _id values

[Image: Java code to update PersonName information for the respective _id values]
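As a rough, driver-free illustration of updating by `_id`, the sketch below appends each incoming name to a `names` array inside the matching person document. It assumes that names are embedded in the person document as an array, which the original post does not confirm; the class `UpdatePersonNameSketch` and the field names are hypothetical. Against a real MongoDB collection, the comparable operation would be an update that matches on `_id` and uses the `$push` operator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Driver-free sketch of the person name update; the map stands in for a collection.
class UpdatePersonNameSketch {

    // Append one name to the "names" array of the document with the given _id.
    // Returns false when no such document exists, so nothing is updated.
    @SuppressWarnings("unchecked")
    static boolean pushName(Map<String, Map<String, Object>> collection,
                            String personId, String firstName, String lastName) {
        Map<String, Object> person = collection.get(personId);
        if (person == null) {
            return false;
        }
        // Create the array on first use, then reuse it for later names.
        List<Map<String, Object>> names = (List<Map<String, Object>>)
                person.computeIfAbsent("names", k -> new ArrayList<Map<String, Object>>());
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("firstName", firstName);
        name.put("lastName", lastName);
        names.add(name);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> persons = new LinkedHashMap<>();
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("_id", "101");
        persons.put("101", person);

        pushName(persons, "101", "Asha", "Rao");
        pushName(persons, "101", "Asha", "Kumar");
        System.out.println(persons.get("101"));
    }
}
```

Returning `false` for an unknown `_id` makes orphaned name rows visible to the caller instead of silently creating new documents.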

8. To integrate the Java code in DataStage:

  • Export the Java code as a jar file (LoadParty.jar).
  • Place LoadParty.jar and mongo-java-driver-2.11.3.jar at any location on the DataStage server.

9. Configure the jar files in the Java Integration stage

  • Java Integration stage used to load Person Data information
  • Java Integration stage used to load Person Name Data information

10. Final result in MongoDB:

[Image: final result in MongoDB]

Conclusion

Currently, there is no external stage for MongoDB in DataStage. Extracting from and loading into MongoDB through DataStage would become simpler if such a stage were introduced in the future.

Muthulakshmi P

Associate Consultant

Muthulakshmi P, an Associate Consultant, brings her expertise to drive strategic solutions, enhance efficiency, and provide valuable insights.