Renowned for its expertise in CPU and semiconductor manufacturing, the client sought to modernize its on-premises Human Capital Analytics (HCA) environment by migrating it to a Microsoft Azure-powered cloud platform with Databricks and Snowflake. With Onix’s support, the company completed the transition in just 12 months.
About the customer
Based in Santa Clara, California, our client is a leading name in the business of manufacturing computer processors and semiconductors. Established in 1968, this technology company has pioneered innovations in the field of CPUs, GPUs, networking, and AI platforms.
The challenge
This technology company faced a host of challenges with its on-premises HCA environment built on MS SQL Server. Some of these legacy challenges included:
- Incremental data ingestion of XML files (with inconsistent XML tags), causing issues when merging the delta data into the main table.
- Inconsistent behavior of Databricks’ Explode function with combinations of special characters, resulting in incomplete data extraction.
- Failures when ingesting multi-line data into Databricks tables.
- Inconsistent data ingestion with column data types, resulting in issues when merging tables.
- Duplication of column names (due to case sensitivity) during API data ingestion, causing errors when loading the data into the main table.
- Data inconsistencies caused by multiple joins in the transformation scripts.
The solution
Following an in-depth analysis of their existing data warehouse to understand the overall complexity and existing system design, Onix implemented a solution to migrate historical data from SQL Server to the Microsoft Azure cloud platform. Additionally, the Onix team integrated with various source systems using Azure native tools, including APIs and files.
Overall numbers:
- Over 2TB of historical data
- 560 tables and 450 views
- 500 stored procedures
- 54 datasets
- 33 cubes
- 129 SOAP and 65 REST APIs
Some of the key services in Onix’s solution included:
- Azure Data Factory (ADF)
- Azure Data Lake Storage (ADLS)
- Databricks
- Snowflake
Onix also deployed its Eagle, Pelican, and Raven accelerators to build (or convert) data transformation scripts on the Databricks platform. Code conversion using Raven and source system integration involved:
- Configuring the pipeline to integrate with multiple sources, including SOAP & REST APIs, files, and direct sources.
- Converting the current database objects (tables, views, and DDLs) to Databricks and MS Azure.
- Orchestrating and scheduling the converted workloads using Autosys and ADF.
Using the Raven tool, the client also addressed their data ingestion issues by:
- Using an XML Parser available as a Databricks library to efficiently parse data in either flattened or array format.
- Managing inconsistent behavior in Databricks’ Explode function using the PostExplode function.
- Creating Spark DataFrames to reshape multi-line data and ingest it into the target table.
- Using pattern analyzer compilation services to ensure consistency across all column data types in ingestion runs.
- Using pattern analyzer compilation services to handle case-sensitive column names.
- Mitigating issues caused by multiple joins by using the script analyzer to optimize join placement.
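The XML and column-name fixes listed above can be illustrated with a minimal, standard-library Python sketch. This is not Onix’s Raven tooling or the Databricks XML library; the sample payload, the tag names, and the `flatten_records` helper are hypothetical and only show the general idea of collapsing inconsistently cased tags into one column set before a merge.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the same logical fields arrive with inconsistent tag casing.
SAMPLE_XML = """
<employees>
  <employee><EmployeeId>1</EmployeeId><Dept>HR</Dept></employee>
  <employee><employeeid>2</employeeid><DEPT>Finance</DEPT></employee>
</employees>
"""


def flatten_records(xml_text, record_tag="employee"):
    """Return one flat dict per record, keyed by lower-cased tag names."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.iter(record_tag):
        row = {}
        for child in record:
            key = child.tag.lower()  # normalize case so duplicates collapse
            value = (child.text or "").strip()
            # If two tags collapse to the same key, keep the first non-empty value.
            if not row.get(key):
                row[key] = value
        rows.append(row)
    return rows


rows = flatten_records(SAMPLE_XML)
for row in rows:
    print(row)
```

Because every row ends up with the same lower-cased keys, the delta records can be merged into a main table without case-only column duplicates. A production pipeline would apply the same normalization to Spark DataFrame columns rather than to Python dictionaries.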
In partnership with Onix, the company also improved its overall data governance, security, and cross-platform interoperability by migrating its metadata from Apache Hive’s Metastore to Databricks’ Unity Catalog (UC). Additional changes to its codebase and repository included:
- Updating 250 to 300 SQL scripts across 4 repositories to reflect the new namespace structure.
- Enhancing its custom framework to improve performance, consistency, and maintainability.
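Unity Catalog addresses tables through a three-level namespace (`catalog.schema.table`), whereas the Hive Metastore uses two levels (`schema.table`), which is why the SQL scripts needed updating. The sketch below shows one possible way to automate such a rewrite. It is an illustration only, not the approach the project used: the schema names, the `hca_dev` catalog, and the `to_three_level` helper are hypothetical, and a regular expression is a simplification that would not cover every SQL construct.

```python
import re

# Hypothetical mapping from legacy Hive schemas to Unity Catalog catalogs.
SCHEMA_TO_CATALOG = {"hca_raw": "hca_dev", "hca_curated": "hca_dev"}

# Matches a two-level schema.table reference that follows a table keyword.
# The negative lookahead skips names that are already three-level.
_TABLE_REF = re.compile(
    r"\b(FROM|JOIN|INTO|UPDATE|TABLE)\s+(\w+)\.(\w+)(?![\w.])",
    re.IGNORECASE,
)


def to_three_level(sql, schema_to_catalog):
    """Prefix known schema.table references with their catalog name."""

    def repl(match):
        keyword, schema, table = match.group(1), match.group(2), match.group(3)
        catalog = schema_to_catalog.get(schema.lower())
        if catalog is None:
            return match.group(0)  # unknown schema: leave the reference untouched
        return f"{keyword} {catalog}.{schema}.{table}"

    return _TABLE_REF.sub(repl, sql)


legacy = "SELECT * FROM hca_raw.employees e JOIN hca_curated.orgs o ON e.org_id = o.id"
migrated = to_three_level(legacy, SCHEMA_TO_CATALOG)
print(migrated)
```

Column references such as `e.org_id` are left alone because they do not follow a table keyword. For a few hundred scripts, a rewrite like this would normally be paired with a review of the resulting diffs, since edge cases (quoted identifiers, dynamic SQL) need manual handling.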
Outcome
- Successful reloading of historical data on the UC-enabled workspace to maintain continuity.
- On-time user acceptance testing and stabilized production ahead of schedule.
- Documentation of custom edge cases to support UC migrations in the future.
- Monthly cost savings of $10,000 for one business unit by implementing Vacuum jobs.
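The case study does not describe how the Vacuum jobs were built. Assuming they refer to the Delta Lake `VACUUM` command, which removes data files that a table no longer references once they are older than a retention threshold, a scheduled job could be sketched as follows. The table names and both helper functions are hypothetical; 168 hours matches Delta Lake’s default 7-day retention.

```python
def build_vacuum_statements(tables, retain_hours=168):
    """Build one Delta Lake VACUUM statement per table name."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return [f"VACUUM {table} RETAIN {retain_hours} HOURS" for table in tables]


def run_vacuum(tables, execute, retain_hours=168):
    """Send each VACUUM statement to `execute`, any callable that runs SQL."""
    for statement in build_vacuum_statements(tables, retain_hours):
        execute(statement)


# Hypothetical table names used only for illustration.
TABLES = ["hca_dev.hca_raw.employees", "hca_dev.hca_curated.orgs"]

statements = build_vacuum_statements(TABLES)
for statement in statements:
    print(statement)
```

In a Databricks notebook or job, the statements could be executed with `run_vacuum(TABLES, spark.sql)`. Keeping statement generation separate from execution makes it easy to review what a scheduled job will run before it touches production tables.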
Conclusion
With Onix’s technological expertise and industry experience, this technology manufacturer successfully transitioned to the cloud and improved its data migration capabilities. As a result, the company can now leverage cloud-native services on the Microsoft Azure platform.