Federated Learning: A privacy-first approach to ML

What is Federated Learning?

Federated Learning is a form of collaborative learning introduced by Google in 2016 to address privacy concerns around sharing data. In a traditional machine learning scenario, the training data, which is spread across multiple data sources or devices, must first be gathered at a centralized location before a model can be trained. Moving data from many sources to one location raises concerns about data corruption, privacy, trust, and more. Now imagine training an ML model where the data never leaves its source. Instead, a model is trained locally at each data source or device, and only what it has learned (the model parameters or updates) is sent to a central server and aggregated into a collaboratively trained global model, which typically performs better than any of the individual local models. This is exactly what Federated Learning does.
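The idea of one global model trained on data that stays in K separate places is often written down as follows. This is the Federated Averaging (FedAvg) formulation from the research literature; the article does not name a specific algorithm, so treat it as one common framing rather than the only one. Client k holds a local dataset D_k with n_k examples, ℓ is a per-example loss, w are the global model parameters, and w_{t+1}^k are client k's parameters after local training in round t. The global objective weights each client's loss by its share of the data, and the server approximates progress on it by averaging the locally trained parameters with the same weights:

```latex
\min_{w} \; F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w),
\qquad
F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{D}_k} \ell(w; x_i, y_i),
\qquad
n = \sum_{k=1}^{K} n_k

w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n} \, w_{t+1}^{k}
```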

Federated Learning stands in stark contrast to traditional ML approaches in how the model is trained: it aggregates model updates produced by training many local models, each on a dataset that stays on its own device. Training involves a cohort of edge devices (Federated Learning clients), such as a user's own desktop or phone, that participate in training, and a central server (the Federated Learning server) responsible for aggregation. The steps are as follows:

  1. The Data Science team on the Federated Learning server side chooses a model to be trained for a given task.
  2. The server sends a copy of the current global model to the Federated Learning clients for training.
  3. Each client trains its personal copy of the model locally on its own dataset, typically for a few epochs rather than all the way to convergence.
  4. Once local training is complete, the clients send their model updates (not their data) to the server, which aggregates the individual updates into a new global model. Steps 2 to 4 are repeated until the global model converges or a preset number of rounds is reached.
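The four steps can be sketched in plain Python. This is a minimal, hypothetical example, assuming a toy linear model y = w * x + b, full-batch gradient descent for local training, and Federated Averaging (a weighted mean of client parameters) as the aggregation rule; none of these choices come from the article, and every name below is illustrative.

```python
from typing import List, Tuple

# Hypothetical toy model: the parameter pair (w, b) of y = w * x + b.
Params = Tuple[float, float]
Dataset = List[Tuple[float, float]]  # (x, y) pairs that stay on one client


def local_train(params: Params, data: Dataset, epochs: int = 5, lr: float = 0.05) -> Params:
    """Step 3: refine the received model on local data with full-batch gradient descent on MSE."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def aggregate(updates: List[Tuple[Params, int]]) -> Params:
    """Step 4: average client parameters, weighted by each client's number of examples."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return w, b


def federated_training(clients: List[Dataset], rounds: int = 200) -> Params:
    global_params: Params = (0.0, 0.0)  # Step 1: the server chooses and initializes the model
    for _ in range(rounds):
        # Steps 2 and 3: each client gets the current global model and trains it locally;
        # the raw (x, y) pairs never leave the client.
        updates = [(local_train(global_params, data), len(data)) for data in clients]
        # Step 4: only parameters reach the server, which aggregates them into a new global model.
        global_params = aggregate(updates)
    return global_params


# Three hypothetical clients whose private data all happen to follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(-1.0, -1.0)],
]
w, b = federated_training(clients)
print(f"global model: w={w:.3f}, b={b:.3f}")
```

With these three toy clients the global parameters should approach w = 2 and b = 1, even though no client ever shares its (x, y) pairs. Weighting by dataset size keeps a client with a single example from pulling the model as hard as a client with many; the number of local epochs, the learning rate and the number of rounds are arbitrary choices for this toy.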

(Figure source: Wikimedia)

Ok. But why do we need to aggregate the learnings? Aren’t the local datasets enough for effective training on their own? The answer is “it depends”. Ideally, each local dataset would be large. In most scenarios, however, individual sources hold too little data, and that data may be sparse, leading to poorly converged models. Suppose you are training a language model that predicts the next word as you type. If the model were trained only on your own data, its predictions would be limited to your own vocabulary, with no scope for suggesting better alternatives. To address this, it makes sense to inject some of what other local models have learned from a variety of writing styles and diverse vocabularies, while still retaining your personal style of writing and choice of words. That is where aggregating models trained on different datasets comes into the picture: it provides generalization, resulting in a model that is better than the sum of its parts. Further, the global model can be personalized by re-applying updates from the local model, so that its outputs reflect your own usage.
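The next-word example can be made concrete with a deliberately tiny, hypothetical bigram model that counts which word followed which. It is an illustration only, assuming that each device shares its count table and that personalization is done by blending global and local probabilities (a simple interpolation swapped in here for "re-applying local updates"). It is not how Gboard or any production keyboard works, and it ignores the fact that raw count tables can themselves leak what a user typed.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List

# Hypothetical toy "language model": for each word, counts of the words that followed it.
BigramCounts = DefaultDict[str, Counter]


def train_local(sentences: List[str]) -> BigramCounts:
    """Runs on the device: count (previous word -> next word) pairs in the user's own text."""
    counts: BigramCounts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def aggregate(models: List[BigramCounts]) -> BigramCounts:
    """Runs on the server: merge the count tables it receives (it never sees the sentences)."""
    merged: BigramCounts = defaultdict(Counter)
    for model in models:
        for prev, nexts in model.items():
            merged[prev].update(nexts)
    return merged


def next_word_probs(counts: BigramCounts, prev: str) -> Dict[str, float]:
    nexts = counts.get(prev, Counter())
    total = sum(nexts.values())
    return {word: c / total for word, c in nexts.items()} if total else {}


def blend(global_probs: Dict[str, float], local_probs: Dict[str, float], mix: float) -> Dict[str, float]:
    """Personalize: `mix` is the weight given to the user's own habits (0 = global model only)."""
    words = set(global_probs) | set(local_probs)
    return {w: (1 - mix) * global_probs.get(w, 0.0) + mix * local_probs.get(w, 0.0) for w in words}


def top_k(probs: Dict[str, float], k: int = 3) -> List[str]:
    # Highest probability first; ties are broken alphabetically so the output is deterministic.
    return sorted(probs, key=lambda w: (-probs[w], w))[:k]


you = train_local(["see you soon", "see you soon", "see you later"])
friend_a = train_local(["see you tomorrow", "see you tomorrow"])
friend_b = train_local(["see you tonight", "see you soon"])
global_model = aggregate([you, friend_a, friend_b])

local_only = top_k(next_word_probs(you, "you"))            # ['soon', 'later']
global_only = top_k(next_word_probs(global_model, "you"))  # ['soon', 'tomorrow', 'later']
personalized = top_k(blend(next_word_probs(global_model, "you"), next_word_probs(you, "you"), mix=0.5))
print(local_only, global_only, personalized)               # personalized: ['soon', 'later', 'tomorrow']
```

Trained on your own messages alone, the model can only ever suggest "soon" or "later". The aggregated model adds "tomorrow" from other users, and blending it with your local probabilities (mix=0.5 here, an arbitrary choice) still ranks your habitual "later" above it.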

In a nutshell, Federated Learning enables training ML models without data leaving its source, eliminating the need to centralize data and building privacy into the design.

Why is Federated Learning Necessary?

  1. Data Inaccessibility - In many cases it is difficult to centralize the data being generated, because doing so is technically infeasible or not economically viable; the data may then be discarded after a short interval without any value being derived from it.
  2. Data Privacy - Access to data may also be limited by regulations such as GDPR and CCPA and by other legal compliance requirements, which restrict both the kind and the amount of data that can be stored. Recent data breaches also point to the inherent danger of centralizing data in a single location, given the volume of data such a store holds.

Federated Learning is central to addressing these challenges because it eliminates the need to store data at a central location. Users can indirectly collaborate with their peers to train models that safeguard privacy. Organizations can also collaborate with peers and vendors to build more robust AI models without sharing their data, potentially solving fundamental yet challenging problems in the Healthcare and BFSI (Banking, Financial Services and Insurance) sectors.

Other Privacy Preserving techniques, such as Differential Privacy, Homomorphic Encryption and Secure Multi-Party Computation, can be combined with Federated Learning to provide stricter privacy guarantees. Together they make it possible to comply with the ever-changing landscape of data sharing and privacy guidelines, and they build trust in the robustness of privacy-first approaches.
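To give a flavor of how Secure Multi-Party Computation can complement Federated Learning, here is a hypothetical sketch of pairwise additive masking, the core idea behind secure aggregation protocols: each client hides its update behind random masks that cancel out only when all masked updates are summed, so the server learns the total but not any individual update. It assumes integer (quantized) updates, a fixed set of clients that never drop out, and a single seeded random generator standing in for the secret keys each pair of clients would really agree on. Production protocols also handle dropouts and use proper cryptography; this sketch is not secure.

```python
import random
from typing import Dict, List

MODULUS = 2 ** 32  # masked values are integers modulo 2^32, so masks wrap around


def pairwise_masks(client_ids: List[str], dim: int, seed: int = 0) -> Dict[str, List[int]]:
    """For every pair (i, j), i adds a shared random vector and j subtracts it, so all masks sum to zero.

    Assumption: one seeded RNG stands in for the per-pair shared secrets that real protocols
    derive via key agreement. It is NOT secure and is only for illustration.
    """
    rng = random.Random(seed)
    masks = {cid: [0] * dim for cid in client_ids}
    for a in range(len(client_ids)):
        for b in range(a + 1, len(client_ids)):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            i, j = client_ids[a], client_ids[b]
            masks[i] = [(m + s) % MODULUS for m, s in zip(masks[i], shared)]
            masks[j] = [(m - s) % MODULUS for m, s in zip(masks[j], shared)]
    return masks


def mask_update(update: List[int], mask: List[int]) -> List[int]:
    """Runs on the client: the value the server receives looks random on its own."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def server_sum(masked_updates: List[List[int]]) -> List[int]:
    """Runs on the server: the pairwise masks cancel, leaving only the sum of the updates."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


# Hypothetical model updates, already quantized to non-negative integers.
updates = {"A": [3, 1, 4], "B": [1, 5, 9], "C": [2, 6, 5]}
masks = pairwise_masks(list(updates), dim=3)
masked = {cid: mask_update(u, masks[cid]) for cid, u in updates.items()}
print(server_sum(list(masked.values())))  # [6, 12, 18]
```

The server can then divide the recovered sum by the number of clients to obtain the average update it needs for aggregation, without ever having seen client A's, B's or C's update in the clear.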

Federated Learning in the Tech Industry

  • Google uses Federated Learning in its mobile keyboard, Gboard, to train language models that predict the next word you are going to type and to improve query suggestions based on what you type. Google has also experimented with Federated Learning as part of its effort to replace third-party cookies in its Chrome browser, which would make it much harder for advertisers to track individual users’ activity across the web to serve targeted ads.
  • Siri, Apple’s virtual assistant on iOS, wakes up when you say “Hey Siri,” but not when the same phrase comes from your friends or family. Apple uses Federated Learning to enable this personalization based on your voice patterns.
  • NVIDIA Clara uses Federated Learning in the healthcare sector, where healthcare organizations can collaborate to build better diagnostic models without sharing patient data. This matters given the critical nature of the task and the need to preserve patient privacy in conformance with HIPAA regulations.
  • In the BFSI sector, WeBank uses FATE (Federated AI Technology Enabler), its home-grown Federated Learning framework, to collaborate with other banks and financial institutions on better models for Credit Risk Management and Anti-Money Laundering, enabling richer learning without sharing data that could give away their competitive advantages.

Authors:

Ankita Sinha, Software Engineer, Intuit
Arkadeep Banerjee, Data Scientist, Intuit
Goutham Kallepalli, Software Engineer, Intuit

© Copyright nasscom. All Rights Reserved.