The evolution of customer Master Data Management
Thousands of years ago, philosophers asked, "What is a person?" It seems simple, but the answer can be extremely complex in an ever-changing business environment. Today's businesses and information technology organizations are left with the unenviable task of answering this and other similar questions. Success leads to rapid, agile processes that drive better outcomes. Failure leads to compromised processes that undermine those outcomes.
It has always been easy to talk conceptually about what a person is. What is a business? What is a product? However, with the advent of high-scale, structured (and even semi-structured) data environments and architectures, these conceptual definitions need to become black-and-white rules that can be implemented with zero room for interpretation.
This concept of the persistent, immediate identification of a person, a business, an entity, or a relationship is the heart of Master Data Management, and Master Data Management, by extension, is the core of any data architecture.
Why Modern Businesses Need Master Data Management (MDM)
Businesses have traditionally had a limited view of the people and other business entities they encounter. Collection and management of critical domain master data typically revolved around a small set of well-understood B2C and B2B entities – people, businesses, locations, products, etc. – generated through a finite set of entry points into the master data environment. As enterprises sought to manage this critical data more holistically, they created processes to manage it at an enterprise level: master data management. Leveraging master data became increasingly complex because companies collect, process, and store this data across multiple processes and systems. While companies may have a good base level of master data (e.g., customer name, address, contact info), the exact attributes collected, the associated data quality, and the degree of data duplication vary from system to system.
Most early efforts implementing Master Data Management processes revolved around extracting known data from disparate systems, standardizing and improving the data quality, merging and curating the resulting entities, and distributing the managed data back to the enterprise for more efficient and accurate execution of customer business processes.
Software vendors took note, which led to the creation of Master Data Management software platforms (i.e., MDM Platforms) and more specialized data domain solutions (e.g., Customer Data Platforms). These platforms have become popular since the early 2000s and typically provide an integrated set of MDM functions with a heavy focus on ingesting, standardizing, cleansing, matching, curating, and storing a defined set of data entities.
Depending on the specific platform, they may also provide integration and distribution functions. Still, these typically only work within a limited range of operational requirements, and larger enterprises usually need to pair them with more sophisticated data engineering and integration solutions. Additionally, these platforms offer varying flexibility in which entities (and attributes) are supported and which associated entity relationships can be ingested, discovered, and managed. While they can offer quick Master Data Management "solutions-in-a-box," they fall short of end-to-end enterprise solutions.
As companies began implementing their initial MDM Platform solutions and Master Data Management processes, business requirements grew more complex. Simple management of known, defined master data became merely the entry point for Master Data Management. Businesses started pushing IT departments to manage increasingly complex entities and relationships. It was no longer enough just to have a clean, unified view of known customer data; it became imperative to provide contextual collapse services (i.e., business unit-specific and situational entity collapse and composition logic). In addition to increasingly complex entity types, requirements surfaced around managing complex relationships (e.g., households and social connections). Operational requirements also increased dramatically: batch data flow requirements evolved into near real-time event pub/sub demands, and eventually, large-scale real-time interface demands became the norm.
Our thesis: MDM software platforms are a tool in the broader arsenal – they are not the answer. Master Data Management is a process that includes multiple tools, platforms, rulesets, and conventions to bring order to the chaos.
Composable Architecture meets MDM
One of the most significant evolutions in today's data architecture and strategy is the shift toward Composable Architecture – a fluid ecosystem of independent systems and components that communicate with each other through APIs. Gartner breaks Composable Architecture down into three parts:
- Composable thinking: Continuous development of new business capabilities
- Composable business architecture: Constantly re-assessing people, processes, and capabilities
- Composable technology: Modular, dynamic component assembly and re-assembly
The key is flexibility. Under the new focus on Composability, there are no "set-in-stone" decisions. Processes, tools, and connectivity points are modular and flexible, which, by definition, makes them perishable. As a result, the supporting MDM capabilities and software that provide the necessary linkage within and around the superstructure must also be flexible, modular, and able to learn from new connectivity points.
Today's next-generation Master Data Management solutions are rapidly evolving to meet these new requirements. Vendor MDM Platforms are incorporating more sophisticated storage strategies, typically encompassing multiple data store technologies (e.g., relational, graph, NoSQL, indexed search) to accommodate entity/relationship flexibility, and are evolving into container-based platforms to support increased operational requirements.
Additionally, enterprises are incorporating vendor MDM Platforms into larger, integrated master data solutions involving multiple components dedicated to various aspects of the master data value chain. In today's world, enterprises must think of master data management as a dynamic, flexible process involving multiple components, not as an MDM hub based on a single software product implementation. Enterprise master data processes encompass multiple functions such as data ingestion, standardization, quality remediation, identity resolution, storage, stewardship, publication, and distribution. In addition, solving for multiple MDM domains spread across multiple geographies often leads to multiple aggregated hubs and sophisticated data distribution solutions that isolate data and bring it closer to the edge where decisioning occurs.
Early wins in the Composable Architecture/MDM marriage
One notable early win combines MDM Platforms with multiple downstream custom master data hubs. Combining these components within enterprise data governance and integration architectures can provide tremendous benefits. Data can be fed into these downstream hubs via existing MDM solutions and secondary direct data sources (e.g., marketing prospect data, web interaction history, or social media data). Of particular importance to modern master data solutions are the data integration and data governance service layers. One advantage of creating a custom master data hub is that it allows an enterprise to tie master data functionality closely into broader enterprise IT strategies and architecture standards. Controlling the data integration interfaces allows a company to abstract vendor MDM Platform interfaces and rapidly deliver specific services for the downstream consumption of master data. It also allows for rapidly creating interfaces for newer data entity types and complex party profile objects.
Ultimately, the goal is to deliver an Enterprise Data Fabric that participates in an event-driven sense-and-respond backbone, providing efficient, accurate, and timely data that enables critical business processes and decisions. This approach allows an enterprise to use vendor platforms to manage well-known entity types (e.g., customer) in combination with built-in data quality features (e.g., address standardization) and complex data steward functionality. The secondary custom master data hub allows the enterprise to respond more rapidly to changing business requirements by combining traditional known customer data with other types of party data, creating more complex, flexible entity types.
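As a rough illustration of this pattern, the sketch below shows how curated profile changes might flow from an MDM platform onto an event backbone and into a downstream custom hub. The in-memory bus and all names (MasterDataEvent, EventBackbone, customer_hub_handler) are hypothetical stand-ins for an enterprise messaging layer, not a specific product API.

```python
# Minimal sketch of event-driven distribution: an in-memory bus stands in
# for the enterprise messaging backbone, and all names here are
# illustrative rather than a specific product API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MasterDataEvent:
    entity_type: str   # e.g., "customer"
    entity_id: str     # enterprise-wide persistent ID
    payload: dict      # curated attributes published by the MDM platform


class EventBackbone:
    """Stand-in for the enterprise pub/sub backbone."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[MasterDataEvent], None]]] = {}

    def subscribe(self, entity_type: str, handler: Callable[[MasterDataEvent], None]) -> None:
        self._subscribers.setdefault(entity_type, []).append(handler)

    def publish(self, event: MasterDataEvent) -> None:
        for handler in self._subscribers.get(event.entity_type, []):
            handler(event)


# A downstream custom master data hub subscribes to curated customer changes
# and could merge them with secondary sources (e.g., web interaction history).
def customer_hub_handler(event: MasterDataEvent) -> None:
    print(f"Hub received {event.entity_type} {event.entity_id}: {event.payload}")


backbone = EventBackbone()
backbone.subscribe("customer", customer_hub_handler)
backbone.publish(MasterDataEvent("customer", "C-1001", {"name": "John Smith", "city": "Pittsburgh"}))
```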
Entity resolution and complex party profiles
One primary requirement of any modern Master Data Management party domain solution is to take an inbound query with limited information and map it to the existing stored master data. This function is used for both the ingestion and distribution of data. Traditional MDM platforms typically perform this function using a structured matching approach. Although quite effective at matching two known data records (e.g., Customer, Product), or even matching an inbound query with specific search attributes to known record types, it can be less effective when trying to link secondary unknown entities with limited attributes to known data types or when providing real-time, non-obvious relationship information (e.g., does this person have a social connection to an existing known customer?).
There are two essential requirements for supporting this type of complex query mapping. The first is entity resolution, which, in its simplest form, is the process of identifying an entity. Operationally, entity resolution matches multiple profiles for the same entity (e.g., a customer) within and across systems and eliminates duplication.
Conventional techniques for entity resolution involve applying deterministic and probabilistic matching rules on multiple records of an entity within a business line to create a master record for that entity. However, these methods have their shortcomings. For instance, they may be restricted to matching based on exact ID numbers, such as Social Security Numbers, or their accuracy may be lower than desired, even when fuzzy matches are used for fields such as names and addresses. Moreover, these techniques require considerable configuration efforts and frequently encounter performance and scaling issues.
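As a concrete, deliberately simplified example of these conventional rules, the sketch below combines a deterministic rule on an exact identifier with a probabilistic fuzzy score on name and address. The weights and threshold are illustrative, not tuned values, and a real implementation would use purpose-built matching libraries rather than Python's difflib.

```python
# Simplified example of conventional matching rules: a deterministic rule
# on an exact identifier plus a probabilistic (fuzzy) rule on name and
# address. Weights and threshold are illustrative, not tuned values.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    # Deterministic rule: identical national ID (e.g., SSN) is an automatic match.
    if rec_a.get("ssn") and rec_a.get("ssn") == rec_b.get("ssn"):
        return True
    # Probabilistic rule: weighted fuzzy similarity on name and address.
    score = (0.6 * similarity(rec_a["name"], rec_b["name"])
             + 0.4 * similarity(rec_a["address"], rec_b["address"]))
    return score >= threshold


print(is_match(
    {"name": "Jon Smith", "address": "12 Main St, Springfield", "ssn": None},
    {"name": "John Smith", "address": "12 Main Street, Springfield", "ssn": None},
))  # True: no shared ID, but the fuzzy score clears the threshold
```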
Advanced entity resolution is a strategic asset that surpasses traditional approaches by using a golden profile for entity resolution rather than just a golden record. It generates a 360-degree profile in real time that provides an entity-centric view of each object and its relationships.
The entity resolution process is automated and involves several steps, such as standardizing, normalizing, validating, enhancing, and enriching data. Input records from various sources are processed to form feature sets that define the entity and generate additional or inferred features. These features are used to identify direct and indirect relationships of the entity and establish a contextual understanding.
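A minimal sketch of that standardization and feature-generation step might look like the following; the normalization rules and derived features shown here are illustrative only, and production pipelines would call dedicated address, phone, and email standardization services.

```python
# Minimal sketch of standardization and feature generation, assuming simple
# normalization rules; a production pipeline would use dedicated
# address/phone/email standardization services.
import re


def standardize(record: dict) -> dict:
    """Normalize raw attributes and derive simple features for matching."""
    name = re.sub(r"\s+", " ", record.get("name", "")).strip().lower()
    phone = re.sub(r"\D", "", record.get("phone", ""))   # digits only
    email = record.get("email", "").strip().lower()
    return {
        "name": name,
        "phone": phone,
        "email": email,
        # Derived/inferred features used later for blocking and matching.
        "name_initials": "".join(part[0] for part in name.split() if part),
        "email_domain": email.split("@")[-1] if "@" in email else "",
    }


print(standardize({"name": "  John   SMITH ", "phone": "(412) 555-0101", "email": "JSmith@Example.com"}))
```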
A blocking-matching-clustering framework is then used to identify candidate profiles for matching from a pool of resolved entities and to match and link the entity to the closest cluster. The matching process uses pre-built advanced algorithms and open-source reference libraries for high-precision feature matching. Clustering generates groups of similar and linked entities, thus identifying duplicates/matches across sparse and heterogeneous user profiles.
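The following sketch illustrates the blocking-matching-clustering idea under simple assumptions: a cheap blocking key built from surname and postal code, a fuzzy name match within each block, and union-find clustering to group linked records. The key choice and threshold are hypothetical.

```python
# Sketch of blocking-matching-clustering: block on a cheap key, match within
# blocks with a similarity function, and cluster linked records via union-find.
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(rec: dict) -> str:
    # Cheap key: first three letters of the surname + postal code prefix.
    surname = rec["name"].split()[-1][:3].lower()
    return surname + "|" + rec.get("postal", "")[:3]


def similar(a: dict, b: dict, threshold: float = 0.85) -> bool:
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio() >= threshold


def cluster(records: list[dict]) -> list[set[int]]:
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    for members in blocks.values():          # match only within a block
        for i in members:
            for j in members:
                if i < j and similar(records[i], records[j]):
                    union(i, j)

    groups = defaultdict(set)
    for idx in range(len(records)):
        groups[find(idx)].add(idx)
    return list(groups.values())


recs = [
    {"name": "John Smith", "postal": "15213"},
    {"name": "Jon Smith", "postal": "15213"},
    {"name": "Jane Doe", "postal": "15090"},
]
print(cluster(recs))   # -> [{0, 1}, {2}]
```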
Machine learning algorithms leverage probabilistic methods to create the entity profile and resolve identities progressively. The process successfully handles typographic, informational, and temporal variations, improving its accuracy and completeness with time by self-correcting and reducing false positives and negatives.
This incremental framework also allows additional third-party data sources to enrich the profiles further with information about beneficial owners, hierarchies, and risk profiles. The enriched contextual awareness and complex relationship information added to primary demographic data can improve the accuracy and completeness of identity matching.
The incremental resolution framework is flexible and continuously procures the latest updates to the entity's profile from its data sources, which are added to existing resolved profiles. The process assigns an enterprise-wide persistent ID to each entity profile and links all identifiers across different lines of business (LOBs) through a "digital keyring" model.
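A digital keyring can be pictured as a two-way mapping between one persistent enterprise ID and the identifiers each line of business uses for the same party. The small sketch below assumes an in-memory implementation with illustrative ID formats and LOB names.

```python
# Sketch of a "digital keyring": one persistent enterprise ID per resolved
# profile, linked to each line of business's own identifier. The ID format
# and LOB names are illustrative.
import uuid
from collections import defaultdict
from typing import Dict, Optional, Tuple


class DigitalKeyring:
    def __init__(self) -> None:
        self._keys: Dict[str, Dict[str, str]] = defaultdict(dict)   # enterprise_id -> {lob: lob_id}
        self._reverse: Dict[Tuple[str, str], str] = {}              # (lob, lob_id) -> enterprise_id

    def new_profile(self) -> str:
        return f"ENT-{uuid.uuid4()}"   # enterprise-wide persistent ID

    def link(self, enterprise_id: str, lob: str, lob_id: str) -> None:
        self._keys[enterprise_id][lob] = lob_id
        self._reverse[(lob, lob_id)] = enterprise_id

    def resolve(self, lob: str, lob_id: str) -> Optional[str]:
        return self._reverse.get((lob, lob_id))


keyring = DigitalKeyring()
eid = keyring.new_profile()
keyring.link(eid, "banking", "CUST-77821")
keyring.link(eid, "insurance", "POL-00934")
print(keyring.resolve("insurance", "POL-00934") == eid)   # True: both IDs map to one profile
```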
The second requirement involves creating, managing, and storing multi-dimensional profile entities that provide the base information for progressively building and managing an entity's data and all associated known relationships. This type of profile provides the flexibility to respond to business requirements and offer complex, contextual entity-type service interfaces. For example, enterprises want to know not just the person involved in an encounter but the person within context, i.e., responding with customer profiles that differentiate between John Smith on the company's website and John Smith using its mobile app. The more master data solutions can customize entity interfaces to provide core, extended, and contextual data about a party in real time, the more effective enterprise responses and actions will be.
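One way to picture such a multi-dimensional profile is a structure that carries core, extended, and contextual attribute sets and composes a response per interaction context. The sketch below is a hypothetical illustration of that contextual composition, not a reference schema; the channel names and attributes are assumptions.

```python
# Hypothetical multi-dimensional party profile with core, extended, and
# contextual attribute sets; the channel contexts ("web", "mobile_app")
# illustrate situational composition logic, not a reference schema.
from dataclasses import dataclass, field


@dataclass
class PartyProfile:
    enterprise_id: str
    core: dict = field(default_factory=dict)        # e.g., name, primary address
    extended: dict = field(default_factory=dict)    # e.g., household links, preferences
    contextual: dict = field(default_factory=dict)  # per-channel attribute overlays

    def view(self, context: str = "") -> dict:
        """Compose a profile response for a given interaction context."""
        composed = {"enterprise_id": self.enterprise_id, **self.core, **self.extended}
        if context in self.contextual:
            composed.update(self.contextual[context])
        return composed


profile = PartyProfile(
    enterprise_id="ENT-1001",
    core={"name": "John Smith"},
    extended={"household_id": "HH-42"},
    contextual={
        "web": {"session_segment": "returning-visitor"},
        "mobile_app": {"push_opt_in": True},
    },
)
print(profile.view("web"))          # John Smith as seen on the website
print(profile.view("mobile_app"))   # John Smith as seen in the mobile app
```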
Integration and operational requirements
Changing master data business requirements have also led to changing operational, or non-functional, technical requirements. Integration with enterprise data engineering, integration, and governance infrastructure is a growing requirement, along with support for batch, near real-time, event publishing, and real-time interfaces. In large enterprises, these can come with very stringent SLAs and demanding response-time and throughput requirements. While MDM software products can typically satisfy some of these requirements, MDM solutions must often be decomposed and implemented within larger enterprise architectures.
To satisfy these requirements, attention must be focused on multiple levels. Architecturally, asking system components to satisfy multiple differentiated workload types can lead to bottlenecks and unsatisfactory results. In more extensive master data solutions, separating component functions (e.g., ingestion of complex legacy data, data quality remediation, data stewardship user interface functionality, complex entity resolution/definition, real-time OLTP, and query processing) into aligned component groupings can lead to better results. This was one of the initial motivations for solutions that separate data distribution functions into downstream data hubs, leaving ingestion/cleansing/curation functions to MDM platforms.
Care must also be given to the technical implementation of individual components within the broader master data solution. Appropriate service-style architectures should be utilized for components that handle integration, OLTP, and query processing duties, with preference given to container-based microservice implementations. In addition, the data storage implementation of any MDM platform and data hubs will always be one of the most critical components within the larger solution. The challenge with modern master data solutions is that the types of data stores that best support complex/flexible data entities (i.e., graph-based databases) can be difficult to implement in a manner that supports a large enterprise's rigorous SLAs and non-functional requirements. Current best practice calls for deploying multi-store subsystems that support both complex, flexible data models (typically a graph data store) and higher operational requirements (some combination of NoSQL, indexed search, and relational data stores). Care should be given to all layers (e.g., physical, network, software) to achieve high-demand requirements.
Also of note, data store subsystem implementations should be abstracted through an integration layer with published service interfaces to allow flexibility in the underlying implementation (i.e., consumers of MDM platform and Data Hub data and services should only use the published integration interfaces and not directly access the underlying data stores).
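A minimal sketch of such an abstraction, assuming in-memory stand-ins for the underlying stores, is shown below: consumers depend only on the published MasterDataService interface, while the implementation is free to route profile reads and relationship traversals to whichever store type fits each workload. The class and method names are hypothetical.

```python
# Sketch of a published integration interface that hides the multi-store
# implementation. Real stores (graph, NoSQL, index search) are replaced by
# in-memory stand-ins; names are hypothetical, not a specific vendor API.
from abc import ABC, abstractmethod
from typing import Dict, List


class MasterDataService(ABC):
    """Published service interface; consumers never touch the data stores."""

    @abstractmethod
    def get_profile(self, enterprise_id: str) -> Dict:
        ...

    @abstractmethod
    def related_parties(self, enterprise_id: str) -> List[str]:
        ...


class MultiStoreMasterDataService(MasterDataService):
    """Routes each call to the store type best suited to that workload."""

    def __init__(self) -> None:
        self._profiles: Dict[str, Dict] = {}     # stand-in for a NoSQL profile store
        self._edges: Dict[str, List[str]] = {}   # stand-in for a graph relationship store

    def load(self, enterprise_id: str, attrs: Dict, related: List[str]) -> None:
        self._profiles[enterprise_id] = attrs
        self._edges[enterprise_id] = related

    def get_profile(self, enterprise_id: str) -> Dict:
        return self._profiles.get(enterprise_id, {})   # low-latency key lookup path

    def related_parties(self, enterprise_id: str) -> List[str]:
        return self._edges.get(enterprise_id, [])      # relationship traversal path


svc = MultiStoreMasterDataService()
svc.load("ENT-1001", {"name": "John Smith"}, ["ENT-2002"])
print(svc.get_profile("ENT-1001"), svc.related_parties("ENT-1001"))
```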
Conclusion
Enterprise MDM process requirements have evolved to require sophisticated multi-component solutions that work within broader enterprise architecture standards, especially in Composable Architecture. Essential MDM functions such as data ingestion, standardization, quality, stewardship, distribution, and entity resolution can be provided by separate or selectively integrated components. Composing MDM solutions into a flexible, efficient architecture can be challenging, but the payoff is a well-engineered data fabric that enables business decisioning. Understanding the broader picture and how MDM solutions can be built up over time will avoid duplication and enable better business results.
Michael Ashwell
VP and GM Data Management, Mastech InfoTrellis
Michael is a seasoned professional with over 35 years of experience in enterprise architecture, solution development, cloud offerings, global sales, and consulting. He spent 30+ years at IBM, where he held various roles, including leading the Data and Analytics Lab Services Cloud COE, and developed several key offerings.