To DeepLearn or not to DeepLearn

August 23, 2022

Authors: Vignesh T Subrahmaniam, PhD, and Sidharth Kumar, PhD, Intuit AI

The match between Graphics Processing Units (GPUs) and algorithmic auto-differentiation of mathematical functions was certainly made in heaven. It has enabled one of the most celebrated developments in Machine Learning: Deep Learning (DL). DL has opened up exciting possibilities in Computer Vision and Natural Language Processing, where DL algorithms are approaching human cognitive ability in processing visual, audio, and textual data. The success of DL algorithms on a difficult, albeit limited, set of machine learning problems has created a media frenzy, with DL being portrayed as a panacea for all Machine Learning problems.

The extraordinary attention on DL has increased demand for parallelized cloud computing infrastructure in the form of GPUs and TPUs, which can quickly guzzle millions of dollars and massive amounts of electrical energy to train deep learning algorithms. To complicate matters, in many situations a deep learning algorithm may significantly underperform in blinded tests when compared to a computationally parsimonious algorithm such as a boosted tree or a support vector machine. In addition, deep learning frameworks are native to GPU implementations while most other algorithms have predominantly CPU implementations, which biases practitioners toward DL frameworks simply because they are intrinsically easier to scale in practice. A truer measure of their relative usefulness could be ascertained if freely available and robust GPU implementations of the traditional ML algorithms existed as well.

Easily accessible cloud computing infrastructure means Deep Learning is no longer a novel technology available only to a few elite machine learning scientists, but to everyone. Most data scientists and machine learning engineers now expect access to highly parallelized cloud computing infrastructure for training a deep learning algorithm as part of their routine work. While democratizing access to computing infrastructure for deep learning is commendable, reckless use can lead to unwarranted increases in cost with potentially degraded algorithmic accuracy. This, coupled with inflated expectations, may lead to a second AI winter! In this blog, we will briefly explore some of the nuances of why Deep Learning algorithms work for certain applications and not for others.

Machine Learning algorithms work by determining optimizable parameters for a function that maps a set of inputs to a desired output. The process of iteratively optimizing these parameters (or weights) so that an average error metric is minimized on a representative dataset is called learning or training. In deep learning, the optimizable function is constructed as a nested sequence of layers, with the outputs of each layer in the sequence serving as the inputs of the succeeding layer. When this sequence is long (say, three or more layers) it is typically called a deep network. Each layer in the sequence can have thousands or even millions of optimizable parameters.
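
To make this concrete, here is a minimal sketch of the idea (our own illustration, not code from the article) using PyTorch: a small deep network built as a nested sequence of layers, with its parameters optimized iteratively to minimize an average error on a dataset. The data and layer sizes are placeholders.

```python
# A minimal sketch of a "deep" network: a nested sequence of parameterized
# layers, trained by iteratively minimizing an average error metric.
import torch
import torch.nn as nn

# Synthetic stand-in for a "representative dataset": 1000 examples, 16 inputs, 1 target.
X = torch.randn(1000, 16)
y = torch.randn(1000, 1)

# Three stacked layers form a (small) deep network; each layer holds optimizable parameters.
model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

loss_fn = nn.MSELoss()                                   # the average error metric being minimized
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for epoch in range(100):                                 # the iterative "learning" or "training" loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()                                      # algorithmic auto-differentiation
    optimizer.step()
```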

The quality and robustness of any learning algorithm, including deep learning, depends primarily on three considerations: 1) the mathematical soundness of the optimization procedure that determines the learnable parameters, 2) the representativeness of the training dataset used for learning, and 3) the statistical design of the error metric that is minimized by the optimization algorithm. Whenever there is a deficiency in any of these considerations, the machine learning algorithm will have been incorrectly trained, with potentially disastrous results on a blinded test.

Bias-Variance Trade-Off

The question of when to deep learn and when not to deep learn is a manifestation of a fundamental problem in statistical learning: the bias-variance trade-off. Simply put, if an algorithm learns every nuance and variation in a training dataset, it will have very little error when tested on a similar-looking dataset. However, the algorithm will be at risk of having learnt something completely different and making incorrect predictions when presented with a dataset that looks very different for the same problem or use case. DL models, by virtue of having millions and sometimes billions of parameters, can learn every nuance and variation in the training data, almost mimicking a child learning through rote memorization.
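
As a toy illustration of this trade-off (our own example, not the authors'), the sketch below fits a low-capacity and a high-capacity polynomial to the same small, noisy sample. The flexible fit typically memorizes the training noise and does worse on a fresh sample drawn from the same problem.

```python
# Toy bias-variance illustration: a flexible model memorizes noise in the
# training sample and then generalizes poorly to an independent sample.
import numpy as np

rng = np.random.default_rng(0)

def sample(n=30):
    x = np.sort(rng.uniform(0, 1, n))
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)   # true signal + noise

x_train, y_train = sample()
x_test, y_test = sample()                  # "similar looking" but independently drawn

for degree in (3, 15):                     # low- vs high-capacity polynomial fits
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```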

This learning strategy can work provided the datasets on which the DL models are trained have little or no variation across time and applications. Consider natural language processing, where a DL model might, for example, be trained on the text of all Shakespearean plays: the text of the Bard's work and its associated interpretations are not expected to vary with time, so a deep learning model would be appropriate. Contrast this with a deep learning model that has to be trained on second-by-second stock market ticks, where the statistical properties of the underlying data can change in a matter of hours. Any patterns learnt by a deep learning algorithm on such a dataset are likely to be ephemeral and, in fact, detrimental for anyone using predictions from such a model. For such use cases, lower-complexity learning algorithms such as regression trees and support vector machines are more appropriate, since there is much lower risk of the algorithm learning ephemeral patterns from the noise inherent in these situations.
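
For reference, the lower-complexity baselines mentioned above take only a few lines with scikit-learn. This is a generic sketch on synthetic noisy data (our own placeholder, not a stock-market example), and the hyperparameters shown are illustrative defaults rather than recommendations.

```python
# A computationally parsimonious baseline: boosted trees and an SVM on noisy data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))                              # stand-in features
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1.0, 2000)       # weak signal buried in noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in [("boosted tree", GradientBoostingRegressor(max_depth=3)),
                    ("SVM (RBF)", SVR(C=1.0))]:
    model.fit(X_tr, y_tr)
    print(name, "test MSE:", mean_squared_error(y_te, model.predict(X_te)))
```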

Another important consideration is how many layers and parameters are required to build a mathematical approximation to the variable being fit. For example, genetic programming (GP) can arrive at faster and more robust solutions to mathematically complex functions than deep learning, which may require a large number of layers and parameters, and possibly an iteratively modified architecture, to approximate the same function. In addition, because of the way GPs are trained, the final solution can be explained in terms of the constituent independent variables far more simply than a neural network (NN) can. This is particularly useful in fields like finance, medicine, and aviation, where tolerance for error is very low and explainability is paramount. This is not to say that overfitting cannot be controlled in NNs, or that NNs cannot constitute explainable AI; rather, neither property stems organically from their use.
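
As an illustration of this point, the sketch below uses the third-party gplearn package (our choice of library; the article does not name one) to evolve a symbolic formula for a simple target. The evolved expression can be printed and read directly, which is where the explainability comes from; the data and settings are purely illustrative.

```python
# Symbolic regression via genetic programming, assuming gplearn is installed.
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(500, 2))
y = X[:, 0] ** 2 - X[:, 1] + 0.5            # a mathematically simple target function

gp = SymbolicRegressor(population_size=1000, generations=20,
                       function_set=("add", "sub", "mul"), random_state=0)
gp.fit(X, y)
print(gp._program)                           # the evolved formula, readable in terms of the inputs
```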

In conclusion, for use cases where there is inherent randomness in the training data, the statistical nature of the randomness is unknown, a smaller amount of training data is available, and explainability is important, a computationally simpler algorithm would be preferred. On the other hand, for use cases where the statistical properties of the underlying learning datasets are not expected to vary with time, a lot of training data is available, and a lack of explainability does not carry adverse consequences, deep learning models would be appropriate. These simple criteria should help us begin to answer the question of whether "To DeepLearn or not to DeepLearn".

