Things to Consider When Building a Knowledge Graph

Strategic and tactical things to consider when building a Minimum Viable Knowledge Graph

By Upasana Pandey and Maria Singson, PhD | February 28, 2022

In today’s environment, an organization does not have to be particularly large to generate a ton of internal data. Interestingly enough, it is in large organizations that Data and IT teams have grown accustomed to structured, taxonomical data architecture and multi-system storage. With multiple systems, data is highly prone to inconsistencies and duplication, especially as it is stored across different applications. Ironically, it therefore becomes harder to get to a trusted 360° view of customers and the business, and doubly challenging to make significant improvements to customer satisfaction, since wrong messaging becomes more likely as data volume increases.


Still, gut reaction tells the traditional data engineer to create a single centralized master hub for storing all core customer data. Yet, with so many interconnections and nested relationships to account for and scale, there needs to be a paradigm shift in how such connections are linked and mined. This is why the Knowledge Graph’s time is now.

Knowledge Graphs integrate entities from multiple sources with their properties, relationships, and concepts in a network-like structure, i.e., every linkage is meaningful and contextual. This gives business leaders, analysts, and data scientists a more holistic view of their business, spanning the different levels of suppliers, products, productivity, and customers simultaneously. Tertiary data, meaning data that may not be directly linked to customers or the business, can also be accommodated on the graph. This means traceability of insights, full accountability on the “why” of things, and the beginning of next-gen scenario simulations into the future. In other words, Knowledge Graphs facilitate enterprise intelligence.


But the construction and maintenance of decently intelligent Knowledge Graphs need to be demystified. Consider the challenges and resourcing needs for the following:

1. Understanding the multiple source systems where the data currently reside

Taking inventory of not just the data but also their multiple source systems is the most important step in data integration towards knowledge graphing. On the one hand, there is the pure mapping out of the underlying structure of each source system and the types of data residing in it, but this is only half the foundation. Business requirements and metrics must be gathered and vetted at the same time, as they determine how the data will be used in the first place. The combination of business requirements and source system mapping then sets the scope for the Knowledge Graph construction. The sooner this is done, the better, so that the right contextual schema can be written. These are not just tasks to be checked off on a project plan in a vacuum; they are a highly collaborative and communicative effort across teams. Naturally, when business needs and the use of data are discussed, Analytics (yes, Analytics) must be discussed. Analytics is the catalyst to insights within the data, while the understanding of the existing data systems ensures that the resulting business solutions are implementable. This is why, at Mastech InfoTrellis, we have Analytics Advisors who have multiple years of experience building analytics solutions at scale for Fortune 1000 organizations. They ensure that pitfalls are avoided when designing databases and architecture whose main internal consumers are analytical use cases.

2. Designing the graph ontology – the real source of competitive advantage

What makes one Knowledge Graph smarter than another is how its data is contextualized. This is where Ontologies come in. For knowledge graphing, in particular, ontologies are sets of vertices and edges that map data attributes to their relevant schema. With vertices representing real-world entities, and edges representing relationships between those entities, ontologies inject comprehensive context into a graph, making it easier to surface interactions that would otherwise stay hidden. Even more impressive are the self-learning properties that these ontologies could have (depending on how well versed in AI/ML the analytical resource building the graph is). If done properly, the Knowledge Graph’s Entity Relationship schema can run unattended and be self-patching for a long time, evolving as it draws more and more linkages across data domains.
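As a rough illustration of the vertices-and-edges idea, the sketch below models a tiny ontology in plain Python. The class names (`Ontology`, `EdgeType`), the entity types (`Customer`, `Product`, `Supplier`), and the relationship names are all hypothetical and chosen only for this example; a real project would more likely express this in its graph platform's own schema language.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class EdgeType:
    """A permitted relationship between two vertex types (hypothetical model)."""
    name: str
    source: str
    target: str


@dataclass
class Ontology:
    """Vertex types with their attributes, plus the edge types allowed between them."""
    vertex_types: Dict[str, Set[str]] = field(default_factory=dict)
    edge_types: List[EdgeType] = field(default_factory=list)

    def add_vertex_type(self, name: str, attributes: List[str]) -> None:
        self.vertex_types[name] = set(attributes)

    def add_edge_type(self, name: str, source: str, target: str) -> None:
        # Edges may only connect vertex types that the ontology already knows about.
        for vertex_type in (source, target):
            if vertex_type not in self.vertex_types:
                raise ValueError(f"unknown vertex type: {vertex_type}")
        self.edge_types.append(EdgeType(name, source, target))

    def allows(self, source: str, name: str, target: str) -> bool:
        """Return True if this relationship is part of the ontology."""
        return EdgeType(name, source, target) in self.edge_types


# Illustrative entity and relationship names only.
ontology = Ontology()
ontology.add_vertex_type("Customer", ["customer_id", "email"])
ontology.add_vertex_type("Product", ["sku", "name"])
ontology.add_vertex_type("Supplier", ["supplier_id"])
ontology.add_edge_type("PURCHASED", "Customer", "Product")
ontology.add_edge_type("SUPPLIES", "Supplier", "Product")
```

Because edges carry a direction and a name, `ontology.allows("Customer", "PURCHASED", "Product")` holds while the reversed relationship does not, which is the kind of context a flat table of customer and product IDs cannot express.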


Some practitioners wait until the business requirements and scope related to the Knowledge Graph are ready before they think of how they are going to inject context into the graph. However, to gain the most competitive advantage, ontologies should be designed at the same time. Especially when plenty of domain expertise is required, ontologies can actually make business requirements more succinct, thereby streamlining the data requests for a given business problem. They guide data collection and engineering right down to choosing the correct tech stack and graph software. This, in essence, is why we have Ontology Design and Knowledge Graph Readiness Assessment as key offerings in our Data Science Kiosk. To immediately reap the benefits and ROI of knowledge graphing, an organization must be mindful of its analytical talent and its analytics engine (which is really data architecture), not just its data management.

3. Data Profiling

Once data floods into the Knowledge Graph, data profiling needs to happen to quickly evaluate the quality and content of the data and to ensure its integrity, accuracy, and completeness. Oftentimes this is a validation/quality assurance exercise that verifies predefined rules and standards are preserved and discovers anomalies, if any. Data profiling is especially important when the data is gathered from multiple source systems, and we want to make sure that quality and consistency are not being compromised during the transfer.
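The kind of check described above can be sketched in a few lines of Python. This is a minimal, generic example and not how any particular profiling tool works: the function name `profile`, the field names, and the sample records are assumptions made for illustration. It measures completeness per required field and flags keys that appear more than once across source systems.

```python
from collections import Counter


def is_filled(value):
    """Treat None and empty strings as missing values."""
    return value not in (None, "")


def profile(records, key, required):
    """Summarize row count, per-field completeness, and duplicated key values."""
    total = len(records)
    completeness = {
        field_name: (
            sum(1 for r in records if is_filled(r.get(field_name))) / total
            if total else 0.0
        )
        for field_name in required
    }
    key_counts = Counter(r.get(key) for r in records if is_filled(r.get(key)))
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    return {
        "rows": total,
        "completeness": completeness,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical customer records pulled from two source systems.
records = [
    {"customer_id": "C1", "email": "a@example.com", "source": "crm"},
    {"customer_id": "C2", "email": "", "source": "billing"},
    {"customer_id": "C1", "email": "a@example.com", "source": "billing"},
    {"customer_id": "C3", "email": None, "source": "crm"},
]

report = profile(records, key="customer_id", required=["customer_id", "email"])
print(report)
```

Here the same customer key arrives from both the CRM and billing systems, which is exactly the cross-system duplication that profiling is meant to surface before the data reaches the graph.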


To perform this process effectively, we have developed a data profiling bot known as “Rover,” which is extremely useful in examining as well as collecting statistics or informative summaries about the database/file it is analyzing. Rover plays a crucial role in any data-intensive project to assess the data quality and improve the accuracy of data in corporate databases.

For Knowledge Graph construction, in particular, Rover not only quickly validates what’s in the database but also helps test out the pre-designed ontologies to make sure that they can be flooded with ample data once the graph is built. He makes the stakeholders aware of whether their data would be enough to create a Minimum Viable Graph (MVG) to support their analyses and AI initiatives. Last but not least, Rover also exemplifies the kind of automation that can be built on top of MVGs; this is why he lives virtually in the Data Science Kiosk, which is powered by our AI Accelerators (Ontology Design, Smart Ingestion, Entity.ai, Smart Data Prep Assistants, Feature Miners, and Smart Storytellers).
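To make the "enough data for an MVG" question concrete, here is a small sketch that checks how many records can fully populate each entity type in a pre-designed ontology. It is an illustration under stated assumptions, not a description of Rover: the function `ontology_coverage`, the 80% readiness threshold, and the sample data are all hypothetical.

```python
def ontology_coverage(required_attributes, records_by_type, threshold=0.8):
    """For each entity type, compute the share of records with every required attribute filled.

    The threshold is an arbitrary illustrative cut-off, not an established standard.
    """
    result = {}
    for vertex_type, attributes in required_attributes.items():
        records = records_by_type.get(vertex_type, [])
        usable = sum(
            1 for r in records
            if all(r.get(a) not in (None, "") for a in attributes)
        )
        share = usable / len(records) if records else 0.0
        result[vertex_type] = {
            "usable": usable,
            "total": len(records),
            "share": share,
            "ready": share >= threshold,
        }
    return result


# Hypothetical ontology requirements and source records.
required_attributes = {
    "Customer": ["customer_id", "email"],
    "Product": ["sku"],
}
records_by_type = {
    "Customer": [
        {"customer_id": "C1", "email": "a@example.com"},
        {"customer_id": "C2", "email": ""},
    ],
    "Product": [{"sku": "P1"}],
}

coverage = ontology_coverage(required_attributes, records_by_type)
for vertex_type, stats in coverage.items():
    print(vertex_type, stats)
```

A report like this gives stakeholders an early signal of which parts of the ontology can be populated today and which need more data collection before the first graph build.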

4. Integrating your Knowledge Graph “insights” back to business as usual

This last point is an important one. For Knowledge Graph insights to be useful (and worth it), they should be easy to fold back into one’s current environment. Let’s face it: knowledge graphing is not an all-or-nothing undertaking, but an evolution. That’s why the concept of an MVG is important, and also why the first set of ontologies designed is pivotal. If the first ontologies and MVG do not address the right use cases, it could be a long waiting game for impatient stakeholders. If extracting insights or a partially built graph actually impedes BAU processes, or (worse) if integration with BAU systems is expensive or impossible, buy-in may not come easily. What good are such insights if they cannot be acted on? These are the kinds of considerations our Knowledge Graph Readiness Assessment would expose.

Conclusion

The biggest advantage of building Knowledge Graphs is that they provide a unified view of customer and enterprise data on a global schema that captures the interrelationships between the data items represented across multiple databases. They help surface important insights about customers and have applications across multiple business use cases. When done right, Knowledge Graphs are the portal to true Customer360 and 5G-ready hyper-personalization. When done wrong, inefficiently, or without the proper resources, Knowledge Graphs can become a grand science experiment with not much ROI to show for the effort. Is your organization Knowledge Graph-Ready?
