As your enterprise moves toward being data-driven, the ability to derive a domain ontology from your company’s data will become ever more important. Before attempting this kind of deep analysis, it is important to understand how much data is required and what state that data must be in.
Data – Where is it?
The first need is to discover where all of the domain data for your enterprise actually exists. Some sources are self-evident, such as your ERP, CRM, SFA and other internal systems. Others, such as social media (Instagram, Facebook, Twitter, Yelp, etc.), also come to mind. However, there are also external sources, such as lexisnexis.com, census.gov and movoto.com, that can be harvested, as well as website data from Google, Bing and others. Some website data, such as transaction logs and HTTP logs, is less well known but can be just as important to have on hand.
The trick is to know where your data resides, retrieve this data and store it in an area where it can be modeled and acted upon.
Have the data – Now what?
Just because you have collected as much data as you can about your organization does not mean it is ready for use. The old adage "garbage in, garbage out" is certainly true here, and its effect is magnified if you are letting the computer make decisions based on ontologies derived from this data.
So, an extensive cleansing and deduplication effort needs to be performed on the raw data to make it reliable and usable for any deep analytics study. This is why the data sources need to be built out in stages, an ideal use for the MIT Enterprise Data Bus.
The process for developing reliable data that can be trusted as a source for deep analytics and decision-centric studies such as ontologies is as follows:
First, find the data. Extensive research must be performed on potential sources of your enterprise data, and that data must be retrieved into a store that can handle massive volumes, the “Data Ocean”. You will need to attach metadata to the files to keep track of what they contain.
Second, once the raw data has been brought into the data ocean, cleansing, deduplication and business rules must be applied to make it reliable and available for use in analytics processes. This ocean of data is then available for data lake creation, big data operations, machine learning programs or artificial intelligence.
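To make these two stages concrete, here is a minimal Python sketch of what they might look like for a single customer file. Everything in it is an assumption made for illustration: the field names (name, email, country), the business rule, the file path and the sample records are all hypothetical, and a real data ocean would run on far larger tooling than in-memory lists.

```python
import hashlib
import json
from datetime import datetime, timezone


def describe_file(path, source, raw_bytes):
    """Build a metadata record for one raw file landed in the data ocean."""
    return {
        "path": path,
        "source": source,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "size_bytes": len(raw_bytes),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def cleanse(record):
    """Normalize whitespace and letter case so equivalent values compare equal."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        "country": record.get("country", "").strip().upper(),
    }


def passes_business_rules(record):
    """Hypothetical rule: a usable record needs a name and a plausible email."""
    return bool(record["name"]) and "@" in record["email"]


def deduplicate(records):
    """Keep the first record seen for each email address."""
    seen = set()
    unique = []
    for record in records:
        if record["email"] not in seen:
            seen.add(record["email"])
            unique.append(record)
    return unique


def prepare(raw_records):
    """Cleanse, apply business rules, then deduplicate, in that order."""
    cleaned = [cleanse(r) for r in raw_records]
    valid = [r for r in cleaned if passes_business_rules(r)]
    return deduplicate(valid)


# Hypothetical raw extract from a CRM system.
raw_customers = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com ", "country": "gb"},
    {"name": "Ada Lovelace", "email": "ada@example.com", "country": "GB"},
    {"name": "", "email": "no-name@example.com", "country": "us"},
    {"name": "Grace Hopper", "email": "grace.example.com", "country": "us"},
]

metadata = describe_file(
    "crm/customers.json", "CRM", json.dumps(raw_customers).encode("utf-8")
)
trusted_customers = prepare(raw_customers)
print(metadata["source"], len(trusted_customers))
```

Of the four raw records, only one survives: the two Ada Lovelace rows collapse into a single record once they are normalized, and the other two fail the business rule. Cleansing runs before deduplication on purpose, because duplicates often only become visible after case and whitespace have been normalized.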
Becoming a data-driven enterprise is a journey, and some initial investment must be made in order to gain long-term benefits. Most companies do not have the infrastructure or governance in place to fully utilize the corpus of their data, but with planning and a disciplined approach to the management and engineering of your data, long-term benefits can be achieved, and data can drive a competitive advantage for the enterprise.
What the heck is an Ontology?
An ontology is a deep, almost existential, understanding of a domain. In business, this equates to having insight into the very complex elements of a particular domain, such as customers, products, manufacturing, supply chain, or any other domain that is key to an enterprise’s success.
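One common way to picture this in code is as a set of subject, predicate, object statements about the things in a domain. The sketch below is only an illustration under that assumption. The order records, the predicate names ("purchased", "belongs_to") and the helper functions are all hypothetical, and a real ontology effort would involve far richer modeling than this.

```python
# Hypothetical cleansed order records for a customer domain.
orders = [
    {"customer": "Ada Lovelace", "product": "Engine Manual", "category": "Books"},
    {"customer": "Ada Lovelace", "product": "Punch Cards", "category": "Supplies"},
    {"customer": "Grace Hopper", "product": "Punch Cards", "category": "Supplies"},
]


def derive_triples(records):
    """Turn flat records into (subject, predicate, object) statements."""
    triples = set()
    for r in records:
        triples.add((r["customer"], "purchased", r["product"]))
        triples.add((r["product"], "belongs_to", r["category"]))
    return triples


def objects_of(triples, subject, predicate):
    """Everything a subject is linked to through a given predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects_of(triples, predicate, obj):
    """Everything linked to an object through a given predicate."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


triples = derive_triples(orders)
print(objects_of(triples, "Ada Lovelace", "purchased"))
print(subjects_of(triples, "purchased", "Punch Cards"))
```

Even in this toy form, the point about data quality shows through: if the same customer appeared under two spellings, the relationships would be split across two subjects and the picture of the domain would be wrong.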
The thing about ontologies is that they need very trustworthy data in order to come up with the best analysis of a domain. Because you are letting the computer develop insights the business will have to act on, you want those insights to have the best possible data to work with.
So, the data is the thing. The better your data, the better your ontologies, and the better your outcome when you act on those ontologies to gain a competitive advantage.