Why do Organizations need HeatWave?
Many organizations struggle to make their digital business models viable at the current rate of digitization, due to the high cost and complexity of modern data stacks. One of the main challenges in managing existing data and analytics solutions is identifying suitable databases for online transaction processing (OLTP), online analytical processing (OLAP), and machine learning (ML) volumes and transporting extracted data between them also maintaining multiple databases. MySQL HeatWave combines transactional analytic and ML services into a single and fully managed service with the additional power of in-memory query accelerator. This results in a very performant solution for SQL analytics at a fraction of the cost compared to other industry solutions.
What Makes the Difference?
These are some of the architectural features that lead to cost advantage with high performance for HeatWave:
Massive Data Partition
HeatWave architecture supports to have each node within a cluster and each core within a node can process partitioned data in parallel, including parallel scans, joins, group-by, aggregation and top-k processing. This enables high cache hits for analytic operations and provides very good inter-node scalability. The compressed and optimized in memory representation is used for both numeric and string data results in significant performance speed up and reduced memory footprint which translates to reduced cost for customer. The use of columnar in-memory representation facilitates vectorised processing, leading to very good query performance.
Single Data Management
HeatWave is designed as a MySQL pluggable storage engine, which helps the user to manage MySQL and HeatWave using same tools. For the fast data reload operations like error recovery and maintenance, the scale out data management layer helps with data in OCI object storage or Amazon S3. The query optimizer transparently decides whether all operators and functions referenced in the submitted query are supported by HeatWave and if the estimated time to process the query with the HeatWave engine is less. The HeatWave cluster can be scaled up or down to any number of nodes. During the scaling operation, the cluster continues to work, there is no interruption in client connections for read or write and there is minimal effect on query performance in HeatWave.
Real-Time Elasticity
The HeatWave cluster can be adjusted in size by either scaling up or scaling down. Scaling up involves loading data to newly created nodes and updating metadata and queries. Scaling down requires transferring additional data to the retained nodes, updating metadata and queries, and processing with the reduced cluster size
In-memory analytics engine
The Heatwave engine uses a columnar in-memory representation which leads to very good query performance . The data is encoded and compressed prior to being loaded in memory. This compressed and optimized in memory representation is used for both numeric and string data.
Autopilot
Autopilot makes the HeatWave query optimizer increasingly intelligent by using advanced techniques to sample data, collect statistics on data and queries, and build machine learning models to model memory usage, network load and execution time. It focuses on system setup, data load, query execution and failure handling aspects. There are four main areas where the Autopilot focuses on. The System Setup includes Auto provisioning, Auto Shape Predictions and Auto Provisioning predicts by using sampling of table data how many nodes are required for running a workload.
Auto shape prediction allows customers to get best price and performance by analysing the OLTP workload, throughput and buffer pool hit rate for recommending the right compute shape at any given time.
Data Load is enriched with 4 Auto features
• Auto parallel loading optimizes the load time and memory usage by predicting the optimal degree of parallelism for each table being loaded into HeatWave.
• Auto encoding determines the optimal representation of columns being loaded into HeatWave taking queries into consideration. This optimal representation provides the best query performance and minimizes the size of the cluster to minimize the cost.
• Auto data placement predicts the column on which tables should be partitioned in-memory to achieve the best performance for queries. It also predicts the expected gain in query performance with the new column recommendation.
• Auto unload determines unused or rarely used tables in HeatWave, and predicts
Query Execution which includes
• Auto query plan improvement learns various statistics from the execution of queries and improves the execution plan of future queries. This improves the performance of the system as more queries are run.
• Auto query time estimation estimates the execution time of a query prior to executing the query, allowing quick tryout and test on different queries. Auto change is propagated to the HeatWave storage layer and changes are propagated to the HeatWave storage layer. This ensures that changes are being propagated at the right optimal cadence.
• Auto change propagation intelligently determines the optimal time when changes in MySQL Database should be propagated to the HeatWave storage layer. This ensures that changes are being propagated at the right optimal cadence
• Auto scheduling determines which queries in the queue are short running and prioritizes them over long running queries in an intelligent way to reduce overall wait time.
• Auto thread pooling lets the database service process more transactions for a given hardware configuration, delivering higher throughput for OLTP workloads and preventing it from dropping at high levels of transactions and concurrency
Failure Handling
In the form of Auto error recovery which means HeatWave storage layer reloads necessary data if one or more HeatWave nodes is unresponsive.
AutoML
The current approach to use machine learning in MySQL requires the user to perform ETL (Extract, Transform, Load) on the database table. The data must be extracted from the database and the user must learn and use third-party tools and libraries to train a model and then perform inference and explanations. In addition to being onerous and time consuming, this process also has the potential to proliferate data outside of the database, causing data security and governance issues. HeatWave AutoML enables MySQL users to train a model, generate inferences and explanations, without extracting data out of the MySQL database.
Fully Automated : HeatWave AutoML fully automates the creation of tuned models, generating inferences and explanations, thus eliminating the need for the user to be an expert ML developer
SQL interface : Provides the familiar MySQL interface for invoking machine learning capabilities
Security and Efficiency : Data and models never leave the MySQL Database. Clients or any other services never see the data or models stored in the DB service
Explanations : All models created by HeatWave AutoML can be explained. Enterprises have a growing need to explain the predictions of machine learning models to build trust, demonstrate fairness, and comply with regulatory requirements
Performance and Scalability : The performance of HeatWave AutoML is much better at a lower cost than competing services such as Redshift ML. Furthermore, HeatWave AutoML scales with the size of the cluster.
Easy Upgrades : HeatWave AutoML leverages state-of-the-art opensource Python ML packages that enable continual and swift uptake of newer (and improved) versions
CONCLUSION
MySQL HeatWave is a fully managed database service that offers organizations a high-performance and cost-effective solution for transactional analytics, machine learning, and in-memory processing. HeatWave combines the best of MySQL with the latest in in-memory technology to deliver a database that is in orders of magnitude faster than traditional MySQL databases. HeatWave's unique architectural features, such as massive data partition, single data management, real-time elasticity, in-memory analytics engine, and Autopilot, make it stand out as a performant solution for SQL analytics at a fraction of the cost compared to other industry solutions. By combining transactional analytics and machine learning capabilities into a single service, HeatWave simplifies data management and reduces complexity for organizations struggling to make their digital business models viable in today's rapidly digitizing world.
ABOUT THE AUTHOR
Sajikumar, Technical Architect, NeST Digital