A data warehouse is a central repository of data that businesses can analyze to make informed decisions. The data typically arrives on a regular schedule from in-house applications and databases and is accessed by different people depending on their requirements. Business intelligence systems and analytics solutions query this data to give decision-makers meaningful insights. While online transaction processing (OLTP) systems enable the real-time execution of many transactions across multiple databases, they are not well suited to large-scale analytical processing. Data warehouses are better suited to business intelligence and reporting use cases because they offload analytical processing from transactional databases and process large volumes of data faster through various data storage modes.
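The difference between transactional and analytical access can be sketched in a few lines of Python. This is only an illustration: the in-memory SQLite database stands in for both kinds of system, and the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for both systems here; a real deployment would
# use a separate transactional database and a warehouse engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each touching a single row.
rows = [("east", 120.0), ("west", 80.0), ("east", 45.5), ("north", 200.0)]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)", rows)

# OLTP-style read: fetch one order by its primary key.
single_order = conn.execute(
    "SELECT region, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# Analytical-style read: scan the whole table and aggregate it.
revenue_by_region = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    ).fetchall()
)

print(single_order)
print(revenue_by_region)
```

The keyed lookup touches one row, while the aggregate has to read every row. At warehouse scale, that second pattern is the workload worth moving off the transactional database.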
[Figure: a simple data warehouse architecture]
Businesses deploy data warehouses in one of two ways:
1. On-premises (on-prem) deployment
2. Cloud-based (SaaS) deployment
On-prem data warehouses
In an on-prem data warehouse model, the customer is entirely responsible for purchasing, deploying, and maintaining the hardware and software. As the name suggests, the warehouse resides in the customer's own data center, on their premises. The customer has complete control over the security of the warehouse, from perimeter security that prevents physical damage to encryption of the data stored in the warehouse.
Some of the commonly used on-prem data warehouse products are:
1. IBM Integrated Analytics System
2. Pivotal Greenplum
3. Teradata
Benefits
Some of the common advantages of on-prem systems include:
1. Control – The organization controls the entire data warehouse operation. This includes:
- End-to-end software technology stack, including all customizations and configurations that suit their requirements.
- The type of hardware to buy: commodity servers or purpose-built storage and networking components, power supply, backup options, etc.
- Physical access to the data warehouse for inspection and troubleshooting of any failed component, whether at the hardware or software layer.
2. Performance – The network latency between the components of an on-prem data warehouse is relatively low. This can lead to a marginal improvement in application speed. However, other factors also affect latency, and being on-prem alone does not guarantee good performance.
3. Governance – Since the data warehouse is located on the customer's premises, requirements around data governance and compliance are much easier to meet. Regulatory requirements like GDPR or CCPA are easier to implement because you know precisely where the data is located.
Challenges
Some of the drawbacks of on-prem systems include:
1. High upfront cost – Setting up an on-prem warehouse carries a significant cost, including hardware, software, and the physical facility with all the systems required to keep the data center running. Hardware depreciation and recurring costs such as support personnel add to this.
2. Need for a dedicated support team – A support team must always be available to keep the systems up and running efficiently. This team includes administrators and engineers at different levels: network, system, database, and application.
3. Cannot quickly scale up or down based on business need – Rapidly adjusting resources to accommodate unexpected surges in activity is a significant challenge for on-prem data warehouses, because adding capacity means procuring and installing hardware.
Cloud data warehouse
Businesses are moving towards a Cloud-based data warehousing model to leverage Cloud providers' advantages, leading to a new service model called Data warehouse-as-a-Service (DWaaS).
Some of the facts that corroborate this include:
• Research Nester published a report titled "Data Warehouse as a Service (DWaaS) Market: Global Demand Analysis & Opportunity Outlook 2031", which states that the DWaaS market is estimated to grow at a CAGR of ~22% during the forecast period 2022–2031.
• The DWaaS market size was valued at USD 4.26 Billion in 2021 and is projected to reach USD 29.52 Billion by 2030.
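As a quick sanity check, the growth rate implied by the market-size figures above can be computed from the standard compound annual growth rate formula. The `cagr` helper below is a hypothetical name used only for this sketch, and it assumes the 2021 and 2030 values are endpoints of a single nine-year series.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: USD 4.26 billion in 2021, USD 29.52 billion by 2030.
implied = cagr(4.26, 29.52, 2030 - 2021)
print(f"Implied CAGR 2021-2030: {implied:.1%}")
```

The implied rate comes out at roughly 24% per year. That is in the same range as the ~22% forecast quoted for 2022–2031, though the two figures cover different windows and may come from different sources.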
Some of the popular Cloud data warehouses include:
1. Snowflake – Can operate across multiple Cloud providers
2. Google BigQuery
3. Amazon Redshift
4. Microsoft Azure Synapse Analytics
Benefits
Some of the essential benefits offered by Cloud warehouses include:
1. Pay-as-you-go pricing model – There is no upfront cost involved in setting up a data warehouse, and pricing almost always depends on service usage. Organizations incur running operational costs rather than capital expenses, which can lower the total cost of ownership (TCO).
2. High scalability – Cloud solutions scale with business needs, so handling very large data volumes is rarely a constraint. Resources are added and removed based on load, and organizations can customize this behavior. For example, an organization can set up one rule that increases computational resources by 80% over the weekend to handle a promotional sale on its website, and another rule that releases the added resources once the sale ends.
3. Quick time to market – Since there is no infrastructure to build first, organizations can quickly deploy their applications and gather business insights, reducing their time to market.
4. High availability – Most services provide at least 99.9% data availability. This is coupled with high durability and reliability as the data is stored in multiple data centers across different regions.
5. Security – While security is a common concern when moving to the Cloud, most providers invest heavily in security and have various mechanisms to keep data safe. These include encryption of data stored on disk (encryption at rest) using multiple options, encryption in transit using SSL/TLS, and compliance with security standards like SOC 2.
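The weekend-sale scaling rule described under high scalability can be sketched as a small date-based policy. This is a provider-neutral illustration, not any vendor's API: `ScalingRule`, `nodes_for`, the baseline of 10 nodes, and the sale dates are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical rule: raise capacity by a percentage between two dates."""
    start: date        # first day the rule applies
    end: date          # last day the rule applies (inclusive)
    increase_pct: int  # e.g. 80 means 80% more compute than the baseline


def nodes_for(day: date, baseline_nodes: int, rules: list[ScalingRule]) -> int:
    """Return the node count for a day: baseline plus the largest active boost."""
    boost = max(
        (r.increase_pct for r in rules if r.start <= day <= r.end), default=0
    )
    # Integer ceiling division, so a fractional node is rounded up.
    extra = -(-baseline_nodes * boost // 100)
    return baseline_nodes + extra


# Assumed sale weekend: Saturday 1 June to Sunday 2 June 2024.
sale = ScalingRule(start=date(2024, 6, 1), end=date(2024, 6, 2), increase_pct=80)

during_sale = nodes_for(date(2024, 6, 1), baseline_nodes=10, rules=[sale])
after_sale = nodes_for(date(2024, 6, 3), baseline_nodes=10, rules=[sale])
print(during_sale, after_sale)
```

Because the rule carries its own end date, the extra capacity drops away once the sale is over, which covers the second rule in the example without a separate cleanup step.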
Challenges
Despite all the advantages offered by migrating to the Cloud, organizations face specific challenges, some of which are listed below:
1. Most Cloud services expect the users to be aware of their responsibility in using them. Some critical features like security in the Cloud, cost management, and user access control depend entirely on how the organizations configure and use such features. Simply put, the security of the Cloud is the Cloud provider's responsibility, and security in the Cloud is the customers' responsibility.
2. The usage-based, dynamic pricing model of Cloud services presents a challenge for organizations until they adapt to this flexible cost structure.
3. Deploying hybrid architectures that need high interoperability and customization is difficult. Some architectures rely on highly customized or legacy software. Cloud services often cannot provide such fine-grained customization, so organizations must follow different approaches to deploy these architectures.
4. Changing the Cloud provider, if required, involves contractual obligations and technical challenges.
Key differences
Here are some key differences between on-prem and DWaaS data warehouses based on common criteria.
Criteria | On-prem data warehouse | Cloud data warehouse |
---|---|---|
Cost | Upfront capital expense required. Cost depreciation over time. Regular maintenance of hardware. Need dedicated support personnel. No usage-based monthly charges. | No upfront capital cost, but operating expenses will be incurred. Variable monthly cost depending on usage. No maintenance overhead. Premium support cost is usually required for critical applications. |
Scalability | Highly rigid. Any change to hardware or software requires heavy IT time and effort to execute. | Highly elastic. Resources can be added or removed on the fly, without manual intervention, based on the application load. |
Time to market | Longer time to market, as the infrastructure needs to be built first. | Shorter time to market. Businesses can build their applications quickly and deploy them to get user feedback without worrying about the infrastructure. |
Built-in ecosystem | In an on-prem environment, the organization must build or integrate all applications for security, user management, monitoring, notifications, analytics, etc. | Cloud providers offer services for security, user management, analytics, etc., so the organization need not adopt different software for each use case. |
Security | Security is the sole responsibility of the organization and the in-house IT team that maintains the on-premises data warehouse. | Security is a shared responsibility between Cloud service providers and organizations. The necessary security features are available in the Cloud and should be implemented by the organization. |
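
The cost row above can be made concrete with a simple break-even sketch. The figures and function names below are hypothetical and chosen only to show the arithmetic. The model also assumes flat monthly costs, which real Cloud usage rarely has.

```python
from typing import Optional


def cumulative_on_prem(months: int, upfront: float, monthly_upkeep: float) -> float:
    """Total on-prem spend: capital expense plus recurring upkeep."""
    return upfront + monthly_upkeep * months


def cumulative_cloud(months: int, monthly_usage: float) -> float:
    """Total Cloud spend under a flat pay-as-you-go bill."""
    return monthly_usage * months


def break_even_month(
    upfront: float, monthly_upkeep: float, monthly_usage: float, horizon: int = 120
) -> Optional[int]:
    """First month in which cumulative Cloud spend reaches on-prem spend.

    Returns None if that never happens within the horizon.
    """
    for month in range(1, horizon + 1):
        on_prem = cumulative_on_prem(month, upfront, monthly_upkeep)
        if cumulative_cloud(month, monthly_usage) >= on_prem:
            return month
    return None


# Hypothetical inputs: 500,000 upfront and 10,000/month upkeep on-prem,
# versus a 30,000/month Cloud bill.
month = break_even_month(500_000, 10_000, 30_000)
print(month)
```

With these assumed numbers, Cloud spend catches up with on-prem spend in month 25. If the monthly Cloud bill stays below the on-prem upkeep, the function returns `None`, meaning the Cloud option remains cheaper across the whole horizon.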
Conclusion
Most organizations consider DWaaS an integral step in their architecture landscape, given the savings in cost and effort that Cloud solutions offer. However, on-premises data warehouses still have a place. Some industries have run highly customized or niche legacy use cases on-premises for decades, and Cloud support for these may be limited. So, it is up to organizations to assess their technology landscape and roadmap and identify which model best suits their interests.
Mastech helps customers understand their business requirements and guides them in building the right data warehouse, on-premises or in the Cloud, that they can use to unlock important insights from their data.
Siddharth Jothimani
Director, Data Engineering
Siddharth Jothimani, Director of Data Engineering at Mastech InfoTrellis, is a visionary leader. With a proven track record of driving innovation and excellence in data engineering, Siddharth is pivotal in shaping data-driven solutions for the organization's success.