
Data Warehouse & Machine Learning: Friendship turned into Relationship

September 2, 2021


Data analytics is now an indispensable part of management decision-making. Over time, business requirements have grown, and the solutions that meet those requirements have evolved with them.

Scenario:

Conventionally, business stakeholders asked questions about past business performance, e.g., how many new customers were added, how many existing customers renewed their subscriptions, how the various territories are performing, and which product leads in which geographical territory and customer category.

A traditional analytics system with standard SQL capabilities served this purpose well. SQL skills are widely available, so there is little learning curve in building such a system. IT teams ingested data from various transactional systems into a data warehouse to implement a central analytics platform.
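To make this concrete, here is a minimal sketch of the kind of SQL these historical questions need. It uses Python's built-in sqlite3 module as a stand-in for a data warehouse, and the sales table, its columns, and its rows are all hypothetical.

```python
import sqlite3

# In-memory database standing in for the warehouse; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, territory TEXT, product TEXT, "
    "is_new_customer INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "North", "Basic", 1, 300.0),
        (2, "North", "Pro", 0, 250.0),
        (3, "South", "Pro", 1, 300.0),
        (4, "South", "Pro", 1, 200.0),
        (5, "South", "Basic", 0, 50.0),
    ],
)

# How many new customers were added in each territory?
new_customers = dict(
    conn.execute(
        "SELECT territory, SUM(is_new_customer) FROM sales "
        "GROUP BY territory ORDER BY territory"
    ).fetchall()
)

# Which product leads (by revenue) in each territory?
leading_product = {}
for territory, product, revenue in conn.execute(
    "SELECT territory, product, SUM(amount) AS revenue FROM sales "
    "GROUP BY territory, product ORDER BY territory, revenue DESC"
):
    # Rows arrive sorted by revenue descending, so the first product per territory wins.
    leading_product.setdefault(territory, product)

print(new_customers)    # {'North': 1, 'South': 2}
print(leading_product)  # {'North': 'Basic', 'South': 'Pro'}
```

Questions like these only look backward, which is why plain SQL on a warehouse was enough for them.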

Addition-1: To make the right decisions about future business operations, stakeholders needed to combine past data with near-future projections, e.g., what are the projected sales in a particular territory, and what are the current inventory levels there? Considering seasonal aspects and other industry factors, how many new customers are likely to be added, and how many existing customers are likely to renew?

These scenarios require machine learning on historical data, so teams need skills in Python, machine learning, big data, and ETL. To implement them, IT teams extract data from the transactional systems or the data warehouse into central storage or servers, ML teams train models on that data, and development teams deploy the models and integrate them with business applications to support ML-enabled operations.

Each step in this pipeline takes time. If the project takes months (say, 4 months), the model already lags when it is ready for use, because it was trained on data that is 4 months old. Refreshing it automatically requires additional effort to automate the pipeline. Involving multiple teams adds complexity, and multiple teams combined with unmanaged data movement between systems raise data security concerns.
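The lag described above is simple date arithmetic. The sketch below assumes a hypothetical extraction date and a 120-day build (roughly 4 months) and shows how old the newest training record is at go-live and a month later if nothing refreshes the model.

```python
from datetime import date, timedelta


def data_age_days(training_cutoff: date, on_day: date) -> int:
    """Age in days of the newest record the model was trained on, as of on_day."""
    return (on_day - training_cutoff).days


# Hypothetical date on which training data was extracted from the source systems.
cutoff = date(2021, 1, 1)

# A roughly 4-month (here: 120-day) build means the model goes live on stale data.
go_live = cutoff + timedelta(days=120)
age_at_go_live = data_age_days(cutoff, go_live)

# Without automated refresh, the lag keeps growing while the model is in use.
age_after_one_month_in_use = data_age_days(cutoff, go_live + timedelta(days=30))

print(age_at_go_live)              # 120
print(age_after_one_month_in_use)  # 150
```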

While businesses grapple with the implementation and cost challenges of ML projects, another requirement arrives.

 

Addition-2: As businesses grow more complex, domain- and industry-specific expectations emerge for improving operations. In education, student engagement improves if the system can recommend specific content that raises students' performance. In healthcare, if the likelihood of a patient becoming infected can be predicted from their medical history, timely care and medication can be provided. In transportation, knowing in advance how many travelers are likely to use a route helps plan fleet capacity and schedules effectively.

This puts even more pressure on an organization's ML teams.

Solution: Native Integration of Data Warehouse and ML Systems

Cloud service providers have been at the forefront of offering fully managed services, including data warehouses and machine learning. To ease ML adoption in day-to-day business operations and to address data security concerns, cloud services have deepened the integration between the two and now allow ML models to be trained and used from within the warehouse.

For example, Amazon Web Services (AWS) offers the data warehouse service Amazon Redshift and the machine learning service Amazon SageMaker. These services are tightly integrated through Amazon Redshift ML, which lets data analysts and database developers create, train, and apply machine learning models using familiar SQL commands in Redshift. This integration addresses the challenges described above.
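As a rough sketch of what "familiar SQL commands" means here, the helper below assembles a minimal Redshift ML CREATE MODEL statement (FROM, TARGET, FUNCTION, IAM_ROLE, SETTINGS with S3_BUCKET), which leaves model selection to SageMaker Autopilot. It only builds the SQL string; you would run it with whatever Redshift client you already use. The model, table, column, function, role ARN, and bucket names are hypothetical placeholders.

```python
def create_model_sql(model_name, training_query, target, function_name,
                     iam_role_arn, s3_bucket):
    """Build a minimal Redshift ML CREATE MODEL statement.

    No model type, problem type, or hyperparameters are given, so the
    choices are left to SageMaker Autopilot.
    """
    return (
        f"CREATE MODEL {model_name}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


# All names below are hypothetical placeholders.
sql = create_model_sql(
    model_name="customer_churn_model",
    training_query=(
        "SELECT territory, tenure_months, monthly_spend, churned "
        "FROM customer_activity WHERE activity_date < '2021-07-01'"
    ),
    target="churned",
    function_name="ml_fn_customer_churn",
    iam_role_arn="arn:aws:iam::123456789012:role/RedshiftMLRole",
    s3_bucket="my-redshift-ml-bucket",
)
print(sql)
```

The string formatting is only for illustration and assumes trusted, hard-coded identifiers; it is not safe for user-supplied input.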

Teams do not need prior ML experience, because models can be trained with standard SQL commands. A machine learning beginner only needs general knowledge of aspects such as preprocessors, algorithms, and hyperparameters to start building. Machine learning experts can take full control of training and hyperparameter tuning. In that mode, the SQL engine does not try to discover the optimal preprocessors, algorithms, and hyperparameters, because the experts make all the choices.
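For the expert path, Redshift ML's CREATE MODEL accepts AUTO OFF together with an explicit model type, objective, and hyperparameters. The sketch below assembles such a statement for XGBoost. The clause order (AUTO OFF, MODEL_TYPE, OBJECTIVE, PREPROCESSORS, HYPERPARAMETERS DEFAULT EXCEPT) follows our reading of the Redshift ML documentation and should be checked against the current CREATE MODEL reference. All table, column, function, role, and bucket names, and the chosen hyperparameter values, are hypothetical.

```python
def create_xgboost_model_sql(model_name, training_query, target, function_name,
                             iam_role_arn, s3_bucket, objective, hyperparameters):
    """Build a CREATE MODEL statement where the expert, not Autopilot, decides.

    hyperparameters maps Redshift ML hyperparameter names to values; anything
    not listed keeps its default (DEFAULT EXCEPT).
    """
    hp = ", ".join(f"{name} '{value}'" for name, value in hyperparameters.items())
    return (
        f"CREATE MODEL {model_name}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "AUTO OFF\n"
        "MODEL_TYPE XGBOOST\n"
        f"OBJECTIVE '{objective}'\n"
        "PREPROCESSORS 'none'\n"
        f"HYPERPARAMETERS DEFAULT EXCEPT ({hp})\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


# Hypothetical names and illustrative hyperparameter values.
sql = create_xgboost_model_sql(
    model_name="customer_churn_xgb",
    training_query="SELECT tenure_months, monthly_spend, churned FROM customer_activity",
    target="churned",
    function_name="ml_fn_customer_churn_xgb",
    iam_role_arn="arn:aws:iam::123456789012:role/RedshiftMLRole",
    s3_bucket="my-redshift-ml-bucket",
    objective="binary:logistic",
    hyperparameters={"NUM_ROUND": "100", "MAX_DEPTH": "6"},
)
print(sql)
```

Because the expert fixes the algorithm, objective, and tuning values, there is no Autopilot search, which is also what keeps this mode predictable in cost.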

 

Data security is improved because Redshift ML manages the data movement between Redshift and SageMaker, so teams do not have to move data between systems themselves.

The model-building timeline shrinks from months to days, because Amazon Redshift ML uses Amazon SageMaker Autopilot to automatically preprocess the data, select an algorithm, and tune hyperparameters. This keeps models more relevant in production, because the data they are trained on is not months old.
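Autopilot training runs in the background after CREATE MODEL is submitted, and Redshift's SHOW MODEL command reports the model's state. The sketch below polls that state through any DB-API style cursor. It assumes SHOW MODEL returns (key, value) rows including a "Model State" row whose value moves from TRAINING to READY or FAILED. That output shape is our assumption and worth confirming against the documentation. The model name and timeouts are hypothetical.

```python
import time


def model_state(cursor, model_name):
    """Read the 'Model State' row from SHOW MODEL output.

    Assumes SHOW MODEL returns (key, value) rows, one of which is 'Model State'.
    """
    cursor.execute(f"SHOW MODEL {model_name}")
    for key, value in cursor.fetchall():
        if str(key).strip() == "Model State":
            return str(value).strip()
    raise LookupError(f"SHOW MODEL returned no 'Model State' row for {model_name}")


def wait_until_ready(cursor, model_name, poll_seconds=60,
                     timeout_seconds=4 * 3600, sleep=time.sleep):
    """Poll until the model is READY; raise if training fails or times out."""
    waited = 0
    while True:
        state = model_state(cursor, model_name)
        if state == "READY":
            return state
        if state == "FAILED":
            raise RuntimeError(f"Training failed for model {model_name}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"{model_name} still {state} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The sleep function is injectable so the polling loop can be exercised without real waiting.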

There is no additional deployment cost, because Amazon Redshift exposes the prediction function as a SQL function in the Redshift cluster and uses existing cluster resources to run predictions. Training cost and time can also be controlled and reduced by limiting how many data points are used for training.
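Both points can be sketched briefly. Scoring is an ordinary SELECT that calls the prediction function named in CREATE MODEL, and, as we understand the Redshift ML documentation, the MAX_CELLS setting caps training data size as rows multiplied by columns, with Redshift ML sampling down when the data exceeds it. The function, table, and column names below are hypothetical, and counting the target column in the total is our assumption.

```python
def prediction_sql(function_name, feature_columns, table, key_column):
    """SELECT that scores every row with the Redshift ML prediction function."""
    args = ", ".join(feature_columns)
    return (
        f"SELECT {key_column}, {function_name}({args}) AS prediction\n"
        f"FROM {table};"
    )


def max_training_rows(max_cells, n_columns):
    """Rows that fit in a MAX_CELLS budget, taking cells = rows x columns."""
    return max_cells // n_columns


# Hypothetical names; the function is whatever FUNCTION was set to in CREATE MODEL.
features = ["territory", "tenure_months", "monthly_spend"]
print(prediction_sql("ml_fn_customer_churn", features, "customer_activity", "customer_id"))

# Three features plus the target column = 4 columns in the training query.
print(max_training_rows(1_000_000, len(features) + 1))  # 250000
```

With a budget of 1,000,000 cells, a narrower training query leaves room for more rows, so choosing features carefully is itself a cost lever.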

Models built with this automated approach are also available in SageMaker. SageMaker deployment features can be used to deploy them on SageMaker instances and expose APIs for integration with business applications. The models can also be deployed on Intel processor-based instances (the C5 series), where the Intel Math Kernel Library can provide a performance boost.
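Once such a model is deployed to a SageMaker endpoint, a business application can call it through the SageMaker Runtime InvokeEndpoint API. The sketch below sends rows as CSV via a boto3 `sagemaker-runtime` client's `invoke_endpoint` method. The endpoint name and feature values are hypothetical, and it assumes the model returns one prediction per line.

```python
def predict_via_endpoint(runtime_client, endpoint_name, rows):
    """Send rows as CSV to a SageMaker endpoint and return one prediction per row.

    runtime_client is expected to be a boto3 'sagemaker-runtime' client.
    Assumes the model responds with one prediction per line.
    """
    payload = "\n".join(",".join(str(value) for value in row) for row in rows)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    body = response["Body"].read().decode("utf-8")
    return [line.strip() for line in body.splitlines() if line.strip()]
```

In practice the client would come from `boto3.client("sagemaker-runtime")`, for example `predict_via_endpoint(client, "churn-endpoint", [["North", 12, 49.5]])`. Passing the client in keeps the function easy to test without AWS access.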


Author:


Sachin Punyani, Business Development Lead, Artificial Intelligence, Machine Learning, Analytics, Internet of Things, Amazon Internet Services Pvt. Ltd.

"The above piece is not an editorial and carries the views and opinions of the author while he was a speaker at Cloud summit event, summarizing his talk."


That the contents of third-party articles/blogs published here on the website, and the interpretation of all information in the article/blogs such as data, maps, numbers, opinions etc. displayed in the article/blogs and views or the opinions expressed within the content are solely of the author's; and do not reflect the opinions and beliefs of NASSCOM or its affiliates in any manner. NASSCOM does not take any liability w.r.t. content in any manner and will not be liable in any manner whatsoever for any kind of liability arising out of any act, error or omission. The contents of third-party article/blogs published, are provided solely as convenience; and the presence of these articles/blogs should not, under any circumstances, be considered as an endorsement of the contents by NASSCOM in any manner; and if you chose to access these articles/blogs , you do so at your own risk.



© Copyright nasscom. All Rights Reserved.