Informatica MDM

This blog touches upon the basics of Informatica MDM Fuzzy Matching.

Informatica MDM – SDP approach

A master data management (MDM) system is installed so that an organization's core data is secure, is accessible to multiple systems as and when required, and exists as a single source of truth rather than as multiple copies floating around the system. Achieving a 360-degree view of an entity requires a solid Suspect Duplicate Process.

The concept of Suspect Duplicate Processing represents the broad category of activities related to identifying entities that are likely duplicates of each other. Suspect duplicate processing is the process of searching for, matching, creating associations between, and, when appropriate, merging data for existing duplicate party records in the system.

To achieve this functionality, Informatica MDM has come up with its own Suspect Duplicate Processing (SDP) approach. An organization, based on its use case, can opt for either of the following two approaches:

  1. Deterministic Matching Approach
  2. Fuzzy Matching Approach

Deterministic Matching Approach

Deterministic Matching applies a series of rules, much like nested if statements, to run logical tests on the data sets. This is how we determine relationships, hierarchies, and households within a dataset. Deterministic matching seeks a clear “Yes” or “No” result on each and every attribute, based on which we decide whether:

  • The two records are duplicates,
  • The pair should be resolved by a data steward, or
  • The two records are unique entities.

It leaves no room for ambiguity and gives reliable results when the data is ideal. However, most of the data in organizations is far from ideal. These are the cases where the Fuzzy Matching Approach of Informatica comes in handy.
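As a rough illustration of the nested-if style described above, here is a minimal Python sketch. The attribute names (national_id, last_name, birth_date) and the outcome labels are hypothetical and chosen only for the example; they are not Informatica MDM configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    # Hypothetical attributes, used only for this illustration.
    national_id: str
    last_name: str
    birth_date: str  # ISO 8601 date string


def deterministic_match(a: Party, b: Party) -> str:
    """Classify a pair using exact yes/no tests only."""
    if a.national_id == b.national_id:
        if a.last_name == b.last_name and a.birth_date == b.birth_date:
            return "duplicate"
        # Same ID but conflicting details: let a data steward decide.
        return "needs_steward_review"
    return "unique"


clean = Party("A-100", "Smith", "1980-04-12")
name_typo = Party("A-100", "Smyth", "1980-04-12")
id_typo = Party("A-10O", "Smith", "1980-04-12")

print(deterministic_match(clean, clean))
print(deterministic_match(clean, name_typo))
print(deterministic_match(clean, id_typo))
```

Note how a single mistyped character in the ID is enough to classify the same person as a unique entity, which is the kind of imperfect data that exact tests cannot handle.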

Fuzzy Matching Approach

A fuzzy matching approach improves the quality of results when we are dealing with less-than-perfect data. Fuzzy Matching measures the statistical likelihood that two records are the same. By rating the “matchiness” of two records, the fuzzy method finds non-obvious correlations in the data and reports how close the two records are to each other.

Informatica MDM fuzzy matching offers the above in an easy-to-configure, flexible, repeatable, and probabilistic manner. It gives us the flexibility to define which attributes must be matched deterministically (such as Country IDs) and which should be matched using fuzzy logic (such as Names).
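The idea of mixing exact and fuzzy attributes can be sketched in a few lines of Python. This is only an illustration: it uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure, which is not the algorithm Informatica MDM uses, and the field names country_id and name are assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity score for two names (stand-in measure)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def hybrid_score(rec_a: dict, rec_b: dict) -> float:
    """Country must match exactly; the name is compared fuzzily."""
    if rec_a["country_id"] != rec_b["country_id"]:
        return 0.0  # deterministic attribute failed, so no match at all
    return name_similarity(rec_a["name"], rec_b["name"])


a = {"country_id": "US", "name": "John Smith"}
b = {"country_id": "US", "name": "Jon Smith"}
c = {"country_id": "CA", "name": "John Smith"}

print(hybrid_score(a, b))  # high score: same country, near-identical name
print(hybrid_score(a, c))  # 0.0: identical name but different country
```

The exact attribute acts as a gate, while the fuzzy attribute contributes a graded score that can then be compared against a threshold.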

The fuzzy matching in Informatica works on different aspects of the data. The algorithm can be configured depending on whether we are matching an individual or a household, a contact person or an organization, and so on. This helps us handle different scenarios in the data. Also, based on our understanding of the data, we can choose the strictness of the algorithm, not only in terms of matching but in terms of searching as well.
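To make the difference between search strictness and match strictness concrete, here is a small two-stage sketch in Python. It is an assumption-laden illustration rather than Informatica's mechanism: the search stage groups records by a simple name prefix (a hypothetical search_key), and the match stage scores each candidate pair with difflib.SequenceMatcher as a stand-in.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def search_key(name: str, width: int) -> str:
    """Coarse key that decides which records are compared at all."""
    return name.lower().replace(" ", "")[:width]


def candidate_pairs(names: list, width: int) -> list:
    """Search stage: pair up only the records that share a search key."""
    buckets = defaultdict(list)
    for name in names:
        buckets[search_key(name, width)].append(name)
    pairs = []
    for group in buckets.values():
        pairs.extend(combinations(group, 2))
    return pairs


def matches(names: list, search_width: int, match_threshold: float) -> list:
    """Match stage: keep candidate pairs that score at or above the threshold."""
    return [
        (a, b)
        for a, b in candidate_pairs(names, search_width)
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= match_threshold
    ]


names = ["John Smith", "Jon Smith", "Joan Smythe", "Mary Jones"]

print(candidate_pairs(names, 2))  # wider search: more pairs to compare
print(candidate_pairs(names, 3))  # narrower search: fewer pairs
print(matches(names, 2, 0.9))     # strict matching on the wider search
```

A longer key narrows the search and saves comparisons but can miss true duplicates, while the threshold controls how similar a compared pair must be before it counts as a match.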

The main strength of Informatica MDM Fuzzy matching is that it is a rule-based matching system: unless a match criterion is met, we do not get a match, which makes it a business user-friendly matching system.

The match criteria fall into two categories:

  • Automatic Merge and
  • Manual Merge.

Automatic Merge is a scenario where the system by itself determines that the two entities in question are duplicates, whereas Manual Merge is a scenario where we need a Data Steward to decide whether the two parties in question are duplicates or not. The rule (Automatic or Manual) that a suspect pair satisfies decides the fate of the pair: either the records merge automatically or a task is created for a Data Steward. If the suspect pair satisfies none of the defined rules, the two records are treated as two unique parties/entities.

The rule-based approach of Fuzzy logic makes it easy for Business Users and Data Stewards to identify which record patterns constitute a duplicate pair. This makes it a hit with Business Users and, by making the MDM implementation successful, with the program sponsors as well.
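The decision flow described above can be sketched as an ordered list of rules in Python. Everything here is illustrative: the MatchRule class, the rule names, the fields tax_id, country_id and name, and the 0.9 and 0.8 thresholds are all hypothetical, and difflib.SequenceMatcher again stands in for the real fuzzy comparison.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


def similar(a: str, b: str) -> float:
    """Stand-in fuzzy score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass(frozen=True)
class MatchRule:
    name: str
    outcome: str  # "auto_merge" or "manual_merge"
    test: Callable[[dict, dict], bool]


# Rules are checked in order; the first satisfied rule decides the outcome.
RULES = [
    MatchRule(
        "same tax id and very close name",
        "auto_merge",
        lambda a, b: a["tax_id"] == b["tax_id"]
        and similar(a["name"], b["name"]) >= 0.9,
    ),
    MatchRule(
        "same country and close name",
        "manual_merge",
        lambda a, b: a["country_id"] == b["country_id"]
        and similar(a["name"], b["name"]) >= 0.8,
    ),
]


def decide(a: dict, b: dict, rules: list = RULES) -> str:
    """Return the outcome of the first satisfied rule, else 'unique'."""
    for rule in rules:
        if rule.test(a, b):
            return rule.outcome
    return "unique"


p1 = {"tax_id": "T1", "country_id": "US", "name": "John Smith"}
p2 = {"tax_id": "T1", "country_id": "US", "name": "Jon Smith"}
p3 = {"tax_id": "T2", "country_id": "US", "name": "Jon Smith"}
p4 = {"tax_id": "T3", "country_id": "CA", "name": "Mary Jones"}

print(decide(p1, p2))  # merges automatically
print(decide(p1, p3))  # creates a task for a Data Steward
print(decide(p1, p4))  # treated as two unique parties
```

Because each rule is a named, readable condition, a Business User or Data Steward can see exactly which pattern caused a pair to merge or to land in a review queue.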

Ripudaman Singh Dhaliwal

Consulting Project Manager

Ripudaman Singh Dhaliwal is a skilled Consulting Project Manager with a track record of successfully delivering complex projects. With a sharp focus on project management, Ripudaman ensures efficient project execution, timely delivery, and client satisfaction.