Topics In Demand
Notification
New

No notification found.

Automatic Machine Learning Frameworks of the Next Generation
Automatic Machine Learning Frameworks of the Next Generation

May 19, 2021

122

0

Automated Machine Learning (AutoML) is a process of building a complete Machine Learning pipeline automatically, without (or with minimal) human help.

The main goal of the AutoML framework was to find the best possible ML pipeline under the selected time budget. For its purpose, AutoML frameworks were training many different ML algorithms and tune their hyper-parameters. The improvements in the performance can be obtained by increasing the number of algorithms and checked hyper-parameters settings, which means longer computation time.

The AutoML users

 The goal of AutoML analysis depends on the user. Based on the authors’ observations, there are several types of AutoML users:

  • Software engineers well understand the Machine Learning theory but have a lack of experience in building ML pipelines. They are looking for AutoML with a simple API that can be easily integrated into their system. They need the best ML pipeline but under the constrain for prediction time on a single sample. They don’t want to build a stacked ensemble with several layers of models that need a lot of time to compute prediction for a single sample. They are looking for the ML model that computes prediction in milliseconds to be easily used in the REST API service.
  • Researchers and citizen data scientists are in the second group of AutoML users. They well understand the Machine Learning theory but are not proficient in coding. They are looking for a low-code solution that can be used to check many different algorithm combinations easily. They would like to understand as much as possible about their data. For example, which features are the most important, or are there any new feature combinations with predictive power (golden features).
  • The next group of users is seasoned data scientists. They are using AutoML for initial data understanding and quick prototyping. They would like to solve business needs quickly. They are looking for unique insights about data that can be used to improve company operations - the ML explainability is very important to them. They often need a simple Machine Learning model that can be easily interpreted and explained to other stakeholders (for example single Decision Tree with extracted text rules).

 

The next-generation of AutoML

 The AutoML needs to be multi-purpose to fill most of the users’ needs. That’s why AutoML frameworks often introduce the switchable mode of work.

  • Explain is perfect for initial data understanding. It splits data into 80/20 train and test sets. It trains algorithms like Baseline, Decision Tree, Linear, Random Forest, Xgboost, Neural Network, and Ensemble. It doesn’t perform hyperparameters tuning. It uses default hyperparameters values. In this mode, full explanations are computed for models: decision tree visualization, decision tree rules extraction, feature importance, SHAP dependency plots, SHAP decision plots.
  • Perform mode for training production-ready ML pipelines. It uses 5-fold cross-validation and tune algorithms like Random Forest, Xgboost, LightGBM, CatBoost, Neural Network, and Ensemble. It searches for the ML pipeline with prediction time for a single sample under 500 milliseconds (which can be set as an argument).
  • Compete mode for searching the best performing ML models under selected time budget. It is using an adjusted validation strategy that depends on dataset size. It highly tunes algorithms. Additionally, it applies advanced feature engineering techniques, like golden features, feature selection, k-means features.
  • Optuna mode for tuning ML models without a hard time limit for AutoML training. It is using the Optuna framework for hyperparameters tuning. It provides well-performing models, but the computational time can be large.

 

Sources & References:

          MLJar Github, MLJar

          Auto-Weka Thornton, et al., Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms, 2013.

          Auto-Sklearn Feurer, et al., Efficient and Robust Automated Machine Learning, 2015.

         TPOT S. Olson, et al., Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Proceedings of GECCO 2016

          AutoGluon Erickson, et al., AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data, 2020


That the contents of third-party articles/blogs published here on the website, and the interpretation of all information in the article/blogs such as data, maps, numbers, opinions etc. displayed in the article/blogs and views or the opinions expressed within the content are solely of the author's; and do not reflect the opinions and beliefs of NASSCOM or its affiliates in any manner. NASSCOM does not take any liability w.r.t. content in any manner and will not be liable in any manner whatsoever for any kind of liability arising out of any act, error or omission. The contents of third-party article/blogs published, are provided solely as convenience; and the presence of these articles/blogs should not, under any circumstances, be considered as an endorsement of the contents by NASSCOM in any manner; and if you chose to access these articles/blogs , you do so at your own risk.


images
Nishant Kumar
Technology Enthusiast

© Copyright nasscom. All Rights Reserved.