Top 5 Reasons to Move Enterprise Data Science Off the Laptop and to the Cloud

QuboleTechnologies

March 2, 2020

We live in a world that is inundated with data. Data science and machine learning (ML) techniques have come to the rescue in helping enterprises analyze and make sense of these large volumes of data. Enterprises have hired data scientists — people who apply scientific methods to data to build mathematical software models — to generate insights or predictions that enable data-driven business decisions. Typically, data scientists are experts in statistical analysis and mathematical modeling who are proficient in programming languages such as R or Python.

Barring a few large enterprises, most data science is still being carried out on laptops, leading to a very inefficient process that is prone to errors and delays. In this blog, we will explore the top 5 reasons why we think ‘laptop data science’ within enterprises is dead in the age of cloud computing.

1. Enterprise Data Science Is a Team Sport

The algorithms and machine learning models form one piece of the advanced analytics and machine learning puzzle for enterprises. Data scientists, data engineers, ML engineers, data analysts, and citizen data scientists need to collaborate to deliver machine learning–based insights for business decisions.

When data scientists build models on their laptops, they download datasets created by data engineers onto a laptop or an on-premises server to build and train machine learning models. Because laptops and on-premises servers have limited compute and memory, data scientists have to sample the data to create smaller datasets. While these smaller sample sets help a data science project get off the ground, they create problems later in the data science life cycle, because a small sample may not represent the full dataset.

There are also concerns about stale data. With local copies of the data, there is a risk that data scientists build predictions on an outdated or inaccurate snapshot of the real world. Using larger, more representative sample sets from a centralized source would alleviate this concern.
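To make the sampling concern concrete, here is a minimal sketch in Python. The language choice, the function names, and the 0.1% event rate are illustrative assumptions, not figures from a real dataset. It assumes rows are drawn independently and that a rare event (say, a fraudulent transaction) occurs at a fixed rate, and it computes how likely a sample of a given size is to contain no such event at all.

```python
def expected_rare_events(event_rate, sample_size):
    """Expected number of rare events in a sample of independent rows."""
    return event_rate * sample_size


def prob_sample_misses_event(event_rate, sample_size):
    """Probability that a sample contains zero rare events.

    Assumes every row is drawn independently with the same event rate.
    """
    return (1.0 - event_rate) ** sample_size


if __name__ == "__main__":
    EVENT_RATE = 0.001  # hypothetical: 1 in 1,000 rows is a rare event
    for sample_size in (1_000, 10_000, 100_000):
        expected = expected_rare_events(EVENT_RATE, sample_size)
        p_miss = prob_sample_misses_event(EVENT_RATE, sample_size)
        print(f"{sample_size:>7} rows: ~{expected:.0f} events expected, "
              f"P(no events) = {p_miss:.4f}")
```

Under these assumptions, a 1,000-row sample has roughly a 37% chance of containing no rare event at all, which is one reason a model trained on a laptop-sized sample can behave differently from one trained on the full dataset.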

2. Big Data Beats Smart Algorithms

The recent surge of interest in artificial intelligence and machine learning is driven by the ability to quickly process and iterate (train and tune the ML model) over large volumes of structured, unstructured, and semi-structured data. In almost all cases machine learning benefits from being trained on larger, more representative sample sets.

Enterprises can unlock really powerful use cases by combining semi-structured interaction data (website interaction logs, event data) and unstructured data (email text, online review text) with structured transaction data (Enterprise Resource Planning, Customer Relationship Management, Order Management Systems, etc.). The key to unlocking business value from machine learning is the availability of large data sets that combine transactional and interaction data. With the increasing scale of data, these data are often processed on the cloud or in large on-premises clusters. Adding a laptop to this mix creates a bottleneck in the entire flow and leads to delays.

3. Focus on Data Science Rather Than Managing Infrastructure

Today, data scientists can draw on many open source machine learning tools and frameworks, such as R, scikit-learn, Spark ML, TensorFlow, MXNet, and CNTK. However, managing the infrastructure, configuration, and environments for these frameworks is cumbersome on a laptop or an on-premises server. This overhead takes time away from core data science activities.

Much of this infrastructure management overhead goes away in the software-as-a-service model of the cloud. The usage-based pricing model of the cloud works well for machine learning workloads, which are bursty in nature. The cloud also makes it easier to experiment with different ML frameworks, and cloud vendors offer model hosting and deployment options. In addition, cloud service providers such as Amazon Web Services, Microsoft Azure, and Google Cloud offer intelligent capabilities as services, thereby lowering the barriers to integrating these capabilities into new products or applications.
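The point about bursty workloads and usage-based pricing can be illustrated with a little arithmetic. The sketch below is in Python, and every rate and hour count in it is a hypothetical placeholder rather than a real vendor price. It compares paying for an always-on machine with paying only for the hours a training job actually runs, and computes the break-even number of busy hours.

```python
def always_on_cost(fixed_hourly_rate, hours_in_period):
    """Cost of a machine that is paid for whether or not it is busy."""
    return fixed_hourly_rate * hours_in_period


def usage_based_cost(on_demand_hourly_rate, busy_hours):
    """Cost when you pay only for the hours the workload actually runs."""
    return on_demand_hourly_rate * busy_hours


def break_even_busy_hours(fixed_hourly_rate, on_demand_hourly_rate, hours_in_period):
    """Busy hours at which both pricing models cost the same."""
    return fixed_hourly_rate * hours_in_period / on_demand_hourly_rate


if __name__ == "__main__":
    # All figures below are hypothetical and for illustration only.
    FIXED_RATE = 1.0        # cost per hour of an always-on server
    ON_DEMAND_RATE = 3.0    # cost per hour of on-demand capacity
    HOURS_IN_MONTH = 720    # 30 days * 24 hours
    BUSY_HOURS = 60         # hours per month spent training models

    print("always on:  ", always_on_cost(FIXED_RATE, HOURS_IN_MONTH))
    print("usage based:", usage_based_cost(ON_DEMAND_RATE, BUSY_HOURS))
    print("break-even busy hours:",
          break_even_busy_hours(FIXED_RATE, ON_DEMAND_RATE, HOURS_IN_MONTH))
```

With these placeholder numbers, usage-based pricing stays cheaper as long as the workload is busy for fewer than 240 hours a month, even though each on-demand hour is assumed to cost three times as much.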

4. Data Accuracy and Model Auditability

The predictions from a machine learning model are only as accurate and representative as the data used to train it. Every modern application of AI/ML is made possible by the availability of high-quality data. For instance, apps that provide turn-by-turn directions have been around for decades, but they are now much better than they were in the past thanks to the larger volume of data.

It is no surprise then that a significant part of AI/ML operations revolves around data logistics, which is the collecting, labeling, categorizing, and managing of data sets that reflect the real world we are trying to model with machine learning. For an enterprise with several data users, this problem gets further complicated when multiple local copies of the data set exist among the various data users.

The concerns around security and privacy are increasingly taking center stage, and enterprise data processes need to be in compliance with data privacy and security regulations. A centralized repository for all data sets not only simplifies management and governance of data but also ensures data consistency and model auditability.

5. Delayed Time to Value

All of the above reasons contribute to a delayed time to value with laptop data science. In a typical workflow, data scientists working on their laptops first need to sample the data and either download datasets manually or connect to a database via an ODBC driver. Second, they need to install and maintain all of the required software tools and packages, such as RStudio, Jupyter, Conda distributions, and machine learning libraries, as well as language runtimes such as R, Python, and Java.

When the model is ready to be deployed to production, they would hand it off to an ML engineer. The ML engineer needs to either convert the code to a production language such as Java/Scala/C++ or at least optimize the code and integrate with the rest of the application. Code optimization would consist of: (1) rewriting any data query into an ETL job, (2) profiling the code to find any bottlenecks, and (3) adding logging, fault-tolerance, and other production-level capabilities.

Each of these steps is a potential bottleneck. For instance, inconsistencies in software or package versions between development and production environments can result in delays. Code built in a Windows or Mac environment can also break when deployed to Linux, for example because of differences in file paths or in natively compiled dependencies.
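One way to catch version inconsistencies early is to compare the packages installed in an environment against the versions the production environment expects. The following is a minimal sketch in Python using the standard library's importlib.metadata (Python 3.8 or later). The function name and the pinned versions in the example are hypothetical and would normally come from a shared requirements or lock file.

```python
from importlib.metadata import PackageNotFoundError, version


def find_version_mismatches(pinned_versions):
    """Compare installed package versions against expected ones.

    Returns a dict mapping package name to (expected, installed), where
    installed is None if the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned_versions.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    production_pins = {"numpy": "1.18.1", "pandas": "1.0.1"}
    for name, (expected, installed) in find_version_mismatches(production_pins).items():
        print(f"{name}: production expects {expected}, found {installed}")
```

Running a check like this on both the laptop and the production host before handing a model over makes mismatches visible before they turn into deployment delays.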

Conclusion

All of the above issues with running data science on laptops result in a loss of business value. Data science involves resource-intensive tasks in data preparation, model building, and model validation. Data scientists typically iterate hundreds of times over features, algorithms, and model specifications before they find the right model for the business problem they are trying to address. These iterations can take a significant amount of time. Adding bottlenecks around infrastructure and environment management, deployment, and collaboration further delays time to value for enterprises.

Data scientists who rely on laptops or local servers are trading the ease of scaling and productionizing ML models for the ease of getting started.

While working on a laptop or a local server gets the data science team up and running faster, cloud platforms provide greater long-term advantages, such as virtually unlimited compute and storage, easier collaboration, and faster time to ML in production.

The fastest and most cost-effective way to get started with data science and machine learning on the cloud is to use a cloud-native data science and machine learning platform such as Qubole. Sign up to test drive Qubole for free today.
