AI can streamline unstructured data ingestion
Mastech InfoTrellis clients frequently report that their understanding of their business is hampered because much of their vital data resides in ‘dark data’ silos that never pass through or touch the traditional IT infrastructure (system data).
Mountains of dark data sit on people’s laptops, desktops, servers, and other corporate assets. They go unused because they are embedded in cumbersome formats that often require a lot of manual intervention before the information can be loaded into modern digital data stores. These data include PDF files, Excel files, text files, log files, image files, and other files that are routinely generated in everyday business operations. They can often provide valuable context and may be critical to driving insights into the actual business processes themselves.
Mastech InfoTrellis developed the Smart Ingestion solution to tackle this silent, but critical, issue that has long prevented many organizations from operationalizing all their data.
Smart Ingestion uses a proprietary SEED ontology that gives the solution the intelligence required to read documents in various formats – essentially, a “universal” ingestion engine! The engine facilitates the quick ingestion of data from structured, semi-structured, and unstructured files, so that the data can be loaded into a data store for consumption in near-real-time.
Starter ontologies provide the backbone for building domain-specific Knowledge Graphs that capture business language, rules, and nuances. These ontologies are used to accelerate graph construction and provide a starting point for graph intelligence. Over time, the graph will evolve to capture all the facets of a business across an ever-expanding network of relationships, revealing hidden patterns among data elements that can be accessed visually. The deeper the graph becomes, the more relationships among data elements are identified, providing richer insights and new opportunities to connect with your customer.
Smart Ingestion is the cornerstone of the Mastech InfoTrellis approach to streamlining and orchestrating the ingestion process, as part of the larger objective of building a comprehensive Enterprise Knowledge Graph that generates near real-time Enterprise Intelligence.
Unlock the potential of data
Traditional ETL methods and tools use a template to extract data from structured fields found in these documents. The problem with templates is that if the layout of the underlying document changes even a little, the template becomes outdated and can no longer be applied. For real-world data ingestion, such manual techniques do not scale and are often very costly. It can take anywhere from hours to weeks to design templates for dark data ingestion. Because the underlying data is unregulated, its formats are likely to keep changing for reasons outside the organization’s control, and the templates must be reworked each time.
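To make the template problem concrete, here is a minimal Python sketch. It is an illustration only and not part of Smart Ingestion: the `INVOICE_TEMPLATE` dictionary, the `extract_with_template` function, and the sample invoice text are all hypothetical. The template extracts every field from the layout it was designed for, then extracts nothing once two labels are renamed.

```python
import re

# Hypothetical template: each field is tied to one exact label and layout.
INVOICE_TEMPLATE = {
    "invoice_no": re.compile(r"^Invoice No: (\d+)$", re.MULTILINE),
    "total": re.compile(r"^Total: \$([\d.]+)$", re.MULTILINE),
}


def extract_with_template(text, template):
    """Return {field: value}, or None for a field whose pattern does not match."""
    result = {}
    for field_name, pattern in template.items():
        match = pattern.search(text)
        result[field_name] = match.group(1) if match else None
    return result


# The layout the template was designed for.
original_layout = "Invoice No: 1042\nTotal: $250.00\n"
# The same information after the sender renames two labels.
changed_layout = "Invoice #: 1042\nAmount Due: $250.00\n"

original_fields = extract_with_template(original_layout, INVOICE_TEMPLATE)
changed_fields = extract_with_template(changed_layout, INVOICE_TEMPLATE)

print(original_fields)
print(changed_fields)
```

The second document carries the same invoice number and amount, yet every field comes back empty. Someone has to notice the failure and rewrite the template, which is the recurring manual cost described above.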
The smart ingestion process is built into the Enterprise Data Bus at the heart of the Enterprise Intelligence Hub, so that data from a variety of sources can be ingested into the Data Ocean and made available to a host of machine-learning-based curation algorithms (see Data Quality Intelligence services). These algorithms deliver curated data to the data lake (or, equivalently, a data vault) for consumption by downstream analytics applications. The resulting data products are then assembled into Knowledge Graphs using the underlying level-0 ontologies (“starter” ontologies) extracted from transactional data by the smart ingestion process. These starter ontologies also serve as the backbone of an enterprise knowledge graph that is rich with context and insights. They persist in the EDB Ontology Store and are combined with other enterprise data products to generate knowledge graphs for various applications, from IT services to business reporting, analytics, and data science models.
The Mastech InfoTrellis Smart Ingestion solution can significantly reduce the time, cost, and effort otherwise involved in the manual data ingestion process, so clients can realize returns on their Data Science investments sooner.
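The flow described above (ingest, curate, then assemble a graph from an ontology) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Mastech InfoTrellis implementation: the function names, the `Record` type, the "key: value" input format, and the hand-written `starter_ontology` mapping are all hypothetical. In the solution described here, starter ontologies are extracted from the data rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str   # where the data came from, e.g. a file name
    fields: dict  # extracted key/value pairs


def ingest(raw_documents):
    """Stand-in for ingestion: parse 'key: value' lines into records."""
    records = []
    for source, text in raw_documents.items():
        fields = {}
        for line in text.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip().lower()] = value.strip()
        records.append(Record(source, fields))
    return records


def curate(records):
    """Stand-in for curation: drop fields whose value is empty."""
    return [
        Record(r.source, {k: v for k, v in r.fields.items() if v})
        for r in records
    ]


def build_graph(records, ontology):
    """Map curated fields to (subject, predicate, object) triples.

    `ontology` is an assumed field-name -> predicate mapping.
    """
    triples = []
    for r in records:
        for key, value in r.fields.items():
            if key in ontology:
                triples.append((r.source, ontology[key], value))
    return triples


raw = {"po_17.txt": "Customer: Acme\nAmount: 120\nNotes:\n"}
starter_ontology = {"customer": "hasCustomer", "amount": "hasAmount"}

triples = build_graph(curate(ingest(raw)), starter_ontology)
print(triples)
```

Each stage only consumes the output of the previous one, so a real curation or graph-building step could replace its stand-in without touching the others.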
It’s now time to realize the value inherent in all data, not limited to data in IT systems.
Smart Ingestion Service Offerings
- Assessment: Apply Smart Ingestion to a representative sample of data to demonstrate the feasibility, understand the variances, and deliver a detailed production proposal and roadmap.
- Implementation: Configure and deploy Smart Ingestion in production to deliver required outcomes. The magic of smart ingestion is that “it just works!”
- Managed Service: Manage and continuously improve the performance of the Smart Ingestion capability while taking on new (or more complex) business use cases.