
In the past, when data had to be updated, operators entered it manually into a data table. This led to user entry errors and time lag. Since this was largely done in batches, most often as a daily job, there was substantial lead time between the moment an event occurred and the moment it was reported. Decision makers had to live with this lag and often made decisions on stale data.

Fast forward to the present, and real-time updates and insights are now commonplace requirements. Data pipelines were essentially built with the intent of moving data from one layer (transactional or event sources) to data warehouses or lakes, where insights were derived.

The question is: with these demands for real-time insights and other quality requirements, are we still being efficient by using traditional architectures and the popular ETL approaches? Let’s find out!

Current state of Data Pipeline Architectures and Challenges

Data pipelines are important to any Product Digitization program. In the latter half of this decade we have witnessed immense focus on digital architecture and the technologies being adopted; the strong growth trajectory of microservices and containerization adoption establishes this fact. However, we see these tech advancements being applied largely to the traditional “OLTP” side, i.e. core services and business logic.

However, the story is a bit different when one inspects the patterns involved in data pipelines, the “OLAP” side of things. Here we observe limited adoption of the tech evolution seen in the core services space. Most data pipelines are still built using either traditional ETL or ELT architectures, the popular industry de-facto approaches. Though these do solve the larger problem at hand, i.e. deriving actionable insights, they also come with certain limitations. Let’s explore some of these challenges:

Siloed Teams: The ETL process requires expertise in data extraction and migration. This often means the technical team is layered or structured around the technical nuances of the process. E.g.: an ETL engineer is often oblivious to the insights being derived and to how they are consumed by end users.

Limited Manifestation: The implementation team ends up trying to fit every desired use case into the set structure or pattern. Though this is not always a problem or a wrong thing to do, there are times when it can be quite inefficient. E.g.: how does one extract from an unstructured source and deal with modelling the intermediate persistence schema?

Latency: The time taken to extract, transform and load the data often introduces lag. This lag can be attributed to the fact that data is processed in batches, or to the intermediate load steps needed to persist interim results. In some business scenarios this is not acceptable. E.g.: data streams emanating from an IoT service are stored and batch processed at a later scheduled time, introducing a lag from data generation to updated insights on dashboards (illustrated in the sketch below).
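
To make the lag concrete, here is a minimal, hypothetical Python sketch contrasting the two paths for the same IoT readings: the batch path only surfaces a reading at the next scheduled run, while the streaming path handles it as soon as it arrives. Names such as `batch_job` and `stream_handler` are illustrative, not from any particular product.

```python
from datetime import datetime

# Hypothetical IoT readings collected during the day (timestamp, temperature)
readings = [
    (datetime(2021, 6, 1, 9, 15), 71.2),
    (datetime(2021, 6, 1, 14, 40), 75.8),
]

def batch_job(buffered, run_at):
    """Daily ETL-style job: insight only becomes available at the scheduled run."""
    for ts, value in buffered:
        lag = run_at - ts
        print(f"[batch]  reading from {ts} processed with {lag} lag")

def stream_handler(ts, value):
    """Streaming-style consumer: insight is updated as the event arrives."""
    print(f"[stream] reading from {ts} processed with near-zero lag")

# Streaming path: handle each reading immediately
for ts, value in readings:
    stream_handler(ts, value)

# Batch path: the same readings wait for the midnight run
batch_job(readings, run_at=datetime(2021, 6, 2, 0, 0))
```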

Future state of Data Pipeline Architecture and Key considerations

As we see advancements in general software architecture, such as microservices, service mesh, and so on, there is a need for similar modernization on the data side. One key emerging approach is to distribute the data pipeline across domains instead of building one centralized pipeline; each domain contributes its own data products, and together these form a Data Mesh. Data Mesh aims to address these challenges by adopting a different approach:

  • Teams or pods that are aligned on functional feature delivery
  • Treat data as a product (discoverable, self-contained and secure)
  • Polyglot storage and communication, facilitated via the Mesh

An initial read on Data Mesh can be found here.

Data Mesh can be implemented in various ways. One effective pattern is to use an event-driven approach, with event storming to shape the Data Products. A domain can comprise one or more Data Products. This also means that data can be redundant and persisted in more than one store, which is referred to as polyglot storage. Finally, these data products are consumed via Mesh APIs designed along the lines of each domain’s requirements.
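
As a rough illustration of this pattern, the Python sketch below models a single made-up domain (“shipments”) whose data product is built by consuming domain events, materialising them into its own store, and serving a narrow mesh-facing query API. The event schema, the in-memory store and `ShipmentsDataProduct` are all assumptions for illustration; a real implementation would sit on a broker such as Kafka and a polyglot set of stores.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ShipmentEvent:
    """A domain event discovered via event storming (illustrative schema)."""
    shipment_id: str
    status: str          # e.g. "DISPATCHED", "DELIVERED"
    region: str

@dataclass
class ShipmentsDataProduct:
    """A self-contained data product owned by the shipments domain.

    It keeps its own store (here just a dict) and exposes a small,
    discoverable query interface to the rest of the mesh.
    """
    _store: Dict[str, ShipmentEvent] = field(default_factory=dict)

    def on_event(self, event: ShipmentEvent) -> None:
        # Materialise the latest state per shipment as events stream in.
        self._store[event.shipment_id] = event

    # Mesh-facing API: other domains consume insights, not raw tables.
    def delivered_count_by_region(self) -> Dict[str, int]:
        counts: Dict[str, int] = {}
        for ev in self._store.values():
            if ev.status == "DELIVERED":
                counts[ev.region] = counts.get(ev.region, 0) + 1
        return counts

# Example usage
product = ShipmentsDataProduct()
product.on_event(ShipmentEvent("S1", "DISPATCHED", "EU"))
product.on_event(ShipmentEvent("S1", "DELIVERED", "EU"))
product.on_event(ShipmentEvent("S2", "DELIVERED", "APAC"))
print(product.delivered_count_by_region())   # {'EU': 1, 'APAC': 1}
```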

Other architectural styles include Data Lake, Data Hub and Data Virtualization. A brief comparison of these can be found here.

Some other considerations that one should evaluate:

  • Facilitate easy data access at any time using standard interfaces such as SQL. Technologies like Snowflake, dbt and Materialize enable such real-time joins, which not only enables BI but also helps with the low-level plumbing of the pipeline
  • Design data pipelines to be robust and fault tolerant, e.g. checkpoint intermediate results where required for further analysis (see the sketch after this list)
  • Leverage distributed, loosely-coupled processing units that can scale and use polyglot technologies, e.g. Spark jobs or Python models
  • Use Data Virtualization to mitigate bottlenecks, e.g. to shorten the lead time for data availability
  • Use DataOps effectively to track and evaluate your data pipeline’s performance
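
To ground the fault-tolerance point above, here is a minimal, hypothetical PySpark structured-streaming job: it consumes IoT readings, aggregates them in-flight, and writes results with a checkpoint location so the pipeline can restart from where it left off after a failure. The paths, schema and window sizes are placeholders, not a prescribed design.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iot-pipeline-sketch").getOrCreate()

# Extract: read incoming IoT readings as a stream of JSON files landing in a folder.
readings = (
    spark.readStream
    .schema("device_id STRING, temperature DOUBLE, event_time TIMESTAMP")
    .json("/data/landing/iot")
)

# Transform: a rolling per-device aggregate computed in-flight rather than in a nightly batch.
avg_temp = (
    readings
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "device_id")
    .agg(F.avg("temperature").alias("avg_temperature"))
)

# Load: checkpointing intermediate state makes the job restartable after failure.
query = (
    avg_temp.writeStream
    .outputMode("append")
    .format("parquet")
    .option("path", "/data/curated/iot_avg_temp")
    .option("checkpointLocation", "/data/checkpoints/iot_avg_temp")
    .start()
)

query.awaitTermination()
```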

Conclusion

Finally, I would like to conclude with a disclaimer. This article does not seek to discard the current architectures associated with ETL. In fact, for certain use cases, such as batch jobs, ETL is still a very good option to adopt. The intent here is rather to recognize that requirements vary, and to explore further architectures that could suit a particular need well. In this article, we looked at a few such architectures, like Data Mesh, and the associated areas one needs to consider.




L&T Technology Services
