Introduction
Before diving into data augmentation and its techniques, it helps to place it in context: data augmentation belongs to the domain of deep learning, which is a subfield of machine learning. Both deep learning and machine learning fall under the broad category of artificial intelligence, and deep learning currently provides the most human-like artificial intelligence, playing a key role in its advancement. The prediction accuracy of deep learning models relies heavily on the amount and the diversity of data available while training the model. In a real-world scenario, we may have a dataset of images taken under a limited set of conditions, while our target application must operate in a variety of conditions, such as different orientations, locations, scales, and brightness levels.
This is where data augmentation comes in: it is a technique that gives deep learning models an extensive set of training examples without actually collecting more data. Data augmentation handles these situations by training the neural network on additional, synthetically modified data. It applies slight modifications to the existing dataset so that the modified copies act as new training examples for the deep learning model.
What is Data Augmentation?
When you train a machine learning model, you are really tuning its parameters so that it can map a particular input to some label as output. The optimization goal is to find the sweet spot where the model's loss is low, which happens when the parameters are tuned the right way. Naturally, if the model has a lot of parameters, you have to show it a proportional number of examples to get good performance. The number of parameters you need is, in turn, proportional to the complexity of the task your model has to perform.
Data augmentation is a technique that enables users to significantly increase the diversity of their available data without actually collecting any new data. So the question becomes how to increase the dataset's size and diversity. Consider a convolutional neural network (CNN): one that can robustly classify objects even when they appear in different orientations is said to have the property of invariance. More specifically, a CNN can be invariant to translation, viewpoint, size, brightness, or a combination of these, and augmentation is how we teach it those invariances from limited data.
Data Augmentation Techniques
For data augmentation, we specify a factor by which the size of the dataset is increased, called the data augmentation factor. The following are the basic data augmentation techniques that are commonly used:
- Cropping
- Padding
- Flipping
- Rotating
- Combining
- Gaussian Noise
- Cropping: We can randomly crop the image with the main object fully or partially visible, meaning we randomly take a section out of the original image as a sample. This method is known as random cropping. The cropped section can either be resized back to the original image size or left at its cropped size, as sketched below.
Figure 1. Random cropping; the cropped sections were resized to the original image size.
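As a minimal sketch, random cropping can be done with Pillow; here img is assumed to be a PIL image, and the crop is resized back to the original size:

import random
from PIL import Image

def random_crop(img, crop_w, crop_h):
    # Pick a random top-left corner so the crop stays inside the image
    w, h = img.size
    left = random.randint(0, w - crop_w)
    top = random.randint(0, h - crop_h)
    crop = img.crop((left, top, left + crop_w, top + crop_h))
    # Resize the cropped section back to the original image size
    return crop.resize((w, h))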
- Padding: This is somewhat similar to cropping, but the image size remains the same: we pad the image so that the main object is fully or partially covered, effectively moving it along the X direction, the Y direction, or both. This can also be viewed as a translation of the image (see the sketch below). This method of augmentation is very useful because, in practice, objects can be located almost anywhere in the image, and translation forces your convolutional neural network to look everywhere.
Figure 2. Padding technique
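A minimal sketch of this translation, assuming img is an (H, W, C) NumPy array and hypothetical shift amounts:

import numpy as np

def translate(img, shift_x, shift_y):
    h, w = img.shape[:2]
    # Zero-pad on all sides, then crop the top-left region so the
    # content appears shifted right by shift_x and down by shift_y
    padded = np.pad(img, ((shift_y, shift_y), (shift_x, shift_x), (0, 0)),
                    mode='constant')
    return padded[:h, :w]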
- Flipping: We use this technique to flip the image in different directions: a vertical flip, a horizontal flip, or both combined (see the sketch below).
Figure 3. Flipping technique
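With NumPy arrays, all three flips are one-liners; a small sketch assuming img is an image array:

import numpy as np

flipped_h = np.fliplr(img)                # horizontal flip (left-right)
flipped_v = np.flipud(img)                # vertical flip (up-down)
flipped_both = np.flipud(np.fliplr(img))  # horizontal and vertical flip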
- Rotating: We can randomly rotate the image by some number of degrees, clockwise or counterclockwise. One key thing to note about this operation is that the image dimensions may not be preserved: rotating by angles that are not multiples of 90 degrees changes the final image size (see the sketch below).
Figure 4. Rotating technique
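A minimal sketch using Pillow, assuming img is a PIL image; note how expand=True lets the output dimensions change:

import random
from PIL import Image

angle = random.uniform(-30, 30)  # random rotation angle in degrees
# expand=True grows the canvas so the corners are not cut off,
# which is why the output dimensions can differ from the input
rotated = img.rotate(angle, expand=True)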
- Combining: In combining, we join two different images horizontally or vertically to form a new one, as sketched below.
Figure 5. Combining Technique
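Assuming img_a and img_b are NumPy image arrays of compatible shapes, a sketch of both joins:

import numpy as np

combined_h = np.hstack([img_a, img_b])  # side by side (heights must match)
combined_v = np.vstack([img_a, img_b])  # one above the other (widths must match)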
- Gaussian Noise: Overfitting usually happens when your neural network tries to learn high-frequency features (fine-grained patterns that vary rapidly across pixels) that may not be useful. Gaussian noise, which has zero mean, has components at all frequencies, effectively distorting the high-frequency features. This also distorts lower-frequency components (usually your intended data), but your neural network can learn to look past that. Adding just the right amount of noise can enhance the learning capability.
A toned-down version of this is salt-and-pepper noise, which presents itself as random black and white pixels spread through the image. This is similar to the effect produced by adding Gaussian noise to an image, but may have a lower information distortion level. A sketch of both kinds of noise follows the figure below.
Figure 6. Gaussian Noise
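A minimal sketch of both kinds of noise, assuming img is a uint8 NumPy array; the function names are illustrative:

import numpy as np

def add_gaussian_noise(img, sigma=10.0):
    # Zero-mean Gaussian noise added to a uint8 image array
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img, amount=0.02):
    out = img.copy()
    mask = np.random.random(img.shape[:2])
    out[mask < amount / 2] = 0         # "pepper": random black pixels
    out[mask > 1 - amount / 2] = 255   # "salt": random white pixels
    return out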
Data Augmentation in Deep Learning
Deep learning models have made incredible progress on discriminative tasks, fueled by advances in deep network architectures, powerful computation, and access to big data. Since a large dataset is crucial to the performance of a deep learning model, data augmentation lets us improve performance with the data we already have. Techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks. However, most approaches used in training neural networks rely only on these basic types of augmentation. While neural network architectures have been investigated in depth, less focus has been put on discovering stronger types of data augmentation and augmentation policies that capture data invariances.
Deep convolutional neural networks (CNNs) have performed remarkably well on many computer vision tasks. However, these networks rely heavily on big data to avoid overfitting. Overfitting refers to the phenomenon where a network learns a function with very high variance that perfectly models the training data but generalizes poorly. Unfortunately, many application domains do not have access to big data. Data augmentation is a data-space solution to this problem of limited data: it encompasses a suite of techniques that enhance the size and quality of training datasets so that better deep learning models can be built from them.
Data Augmentation in PyTorch and MxNet
PyTorch and MxNet are two popular deep learning frameworks that ship with built-in packages for applying data augmentation techniques to a dataset.
Transforms in PyTorch: The Transforms library is the augmentation part of the torchvision package, which consists of popular datasets, model architectures, and common image transformations for computer vision tasks. Transforms contains a variety of image transformations that can be chained together using the Compose method. Additionally, there is the torchvision.transforms.functional module, which provides functional transforms that give fine-grained control over the transformations; it can be really useful if you are building a more complex augmentation pipeline. Transforms does not offer features beyond the common transformations, and it is used mostly with PyTorch, as it is considered that framework's built-in augmentation library. Note that Transforms works only with PIL images, which is why you should either read the image in PIL format or add the necessary conversion to your augmentation pipeline.
from torchvision import transforms as tr
from torchvision.transforms import Compose

pipeline = Compose(
    [tr.RandomRotation(degrees=90),
     tr.RandomRotation(degrees=270)])
# Compose pipelines are called with the image as a positional argument
augmented_image = pipeline(img)
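For the fine-grained control mentioned above, the functional module can be called directly; a small sketch assuming img is a PIL image:

import torchvision.transforms.functional as F

rotated = F.rotate(img, angle=45)  # deterministic rotation
flipped = F.hflip(rotated)         # deterministic horizontal flip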
Sometimes you might want to write a custom data loader for training, as in the partial sketch below.
from torchvision import transforms
from torchvision.transforms import Compose as C

def aug(p=0.5):
    # RandomHorizontalFlip takes the flip probability itself; Compose does not accept p
    return C([transforms.RandomHorizontalFlip(p=p)])

class Dataloader(object):
    def __init__(self, train, csv, transform=None):
        …
    def __getitem__(self, index):
        …
        # Apply the augmentation pipeline directly to the PIL image
        img = aug()(img)
        return img, target
    def __len__(self):
        return len(self.image_list)

trainset = Dataloader(train=True, csv='/path/to/file/',
                      transform=aug)
Transforms in MxNet: MxNet also has a built-in augmentation library called Transforms (mxnet.gluon.data.vision.transforms). General usage is as follows:
from mxnet.gluon.data.vision import transforms

color_aug = transforms.RandomColorJitter(
    brightness=0.5,
    contrast=0.5,
    saturation=0.5,
    hue=0.5)
# Gluon transforms are callable; example_image is an MXNet NDArray
augmented_image = color_aug(example_image)
Even though these packages provide support for data augmentation, the real power of data augmentation comes out when you use dedicated libraries. They have a wider set of transformation methods, allow you to create custom augmentations, and let you stack one transformation on another. That is why using dedicated data augmentation libraries can be more effective than using built-in ones.
Data Augmentation Libraries
So, as we said before, to realize the full potential of data augmentation in deep learning, we have to use dedicated libraries rather than depending only on the built-in ones.
- scikit-image: It is an open-source Python package that works with NumPy arrays. It is a fairly simple and straightforward library even for those who are new to Python’s ecosystem.
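A short sketch of typical scikit-image augmentation calls (the input file name is hypothetical):

from skimage import io, transform, util

img = io.imread('image.jpg')                     # hypothetical input file
rotated = transform.rotate(img, angle=30)        # returns a float image in [0, 1]
noisy = util.random_noise(img, mode='gaussian')  # adds Gaussian noise
flipped = img[:, ::-1]                           # horizontal flip via NumPy slicing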
- OpenCV-Python: OpenCV stands for Open Source Computer Vision Library. Although it is written in optimized C/C++, it has interfaces for Python and Java along with C++. OpenCV-Python is the Python API for OpenCV: you can think of it as a Python wrapper around the C++ implementation. OpenCV-Python is not only fast (since the backend consists of code written in C/C++) but also easy to code and deploy (thanks to the Python wrapper in the foreground). This makes it a great choice for computationally intensive computer vision programs.
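A short sketch of common OpenCV augmentations (the file name is hypothetical):

import cv2

img = cv2.imread('image.jpg')  # hypothetical input file
flipped = cv2.flip(img, 1)     # flip code 1 = horizontal flip
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)  # 30 degrees, scale 1.0
rotated = cv2.warpAffine(img, M, (w, h))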
- imgaug: imgaug is a library for image augmentation in machine learning experiments. It supports a wide range of augmentation techniques, allows you to easily combine them and execute them in random order or on multiple CPU cores, has a simple yet powerful stochastic interface, and can augment not only images but also keypoints/landmarks, bounding boxes, heatmaps, and segmentation maps.
The imgaug library provides a very useful feature called the Augmentation pipeline. Such a pipeline is a sequence of steps that can be applied in a fixed or random order. This also gives the flexibility to apply certain transformations to a few images and other transformations to other images.
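For instance, a minimal pipeline in the style of imgaug's documented examples (assuming images is a batch of uint8 NumPy arrays) might look like this:

import imgaug.augmenters as iaa

seq = iaa.Sequential([
    iaa.Fliplr(0.5),             # horizontally flip 50% of the images
    iaa.Crop(percent=(0, 0.1)),  # random crops of up to 10%
    iaa.GaussianBlur(sigma=(0, 1.0)),
], random_order=True)            # apply the steps in random order

augmented_images = seq(images=images)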
- Keras ImageDataGenerator: The Keras library has a built-in class created just for the purpose of adding transformations to images. This class is called ImageDataGenerator, and it generates batches of tensor image data with real-time data augmentation. Its commonly used parameters are listed below, followed by a short sketch.
- rotation_range is a value in degrees (0-180), a range within which to randomly rotate pictures
- shear_range is for randomly applying shearing transformations
- zoom_range is for randomly zooming inside pictures
- horizontal_flip is for randomly flipping half of the images horizontally –relevant when there are no assumptions of horizontal asymmetry (e.g. real-world pictures).
- fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
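Putting those parameters together, a minimal sketch (the directory path, target size, and ranges are hypothetical choices):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

# Yields batches of augmented image tensors from a (hypothetical) directory
generator = datagen.flow_from_directory('data/train',
                                        target_size=(150, 150),
                                        batch_size=32)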
Conclusion
Data augmentation gives a real thrust to deep learning computer vision tasks. Its ability to generate more data without actually collecting new data is of immense help in domains where big data is inaccessible, such as the medical field. It also helps us avoid overfitting, which occurs when the model memorizes the full dataset instead of learning the main concepts underlying the problem; an overfitted model does not know how to generalize and is therefore less useful. As we have seen, image augmentation is simple to implement. It should be pointed out, however, that you cannot simply apply every possible type of augmentation: for better results, you need to choose the right kinds of augmentation for your problem.