B̶̶e̶̶y̶̶o̶̶n̶̶d̶̶ Before Production: ‘Left Shifting’ Reliability Engineering Across the SDLC for Robust and Resilient Systems


QualityKiosk

@QualityKiosk

December 19, 2023

Emerging Tech

The Mars Climate Orbiter: In 1999, the Mars Climate Orbiter failed to enter its intended orbit around Mars and was destroyed upon entering the Martian atmosphere. The failure was caused by a unit mismatch in its software: one component reported thruster impulse in pound-force seconds, while the navigation software expected newton-seconds.

The Therac-25 radiation therapy machine: Between 1985 and 1987, the Therac-25 radiation therapy machine was involved in at least six accidents in which patients received massive radiation overdoses, several of them fatal. Software errors, including race conditions, allowed the machine to deliver these doses.

Yes, these failures are scary, but by 2023 we have come a long way in building better software products.

In the ever-evolving landscape of software innovation and development, ensuring the reliability of software ecosystems is paramount. Designing for reliability involves integrating reliability engineering and quality engineering principles throughout the Software Development/Quality Engineering Life Cycle (SDLC/QELC). By “left-shifting” these practices, that is, incorporating them early in the development process, we can proactively address potential design, functional, and non-functional issues, reduce the likelihood of failures, and ultimately deliver more robust and dependable software.

The growing impact of Technical Debt (TD) has become the biggest obstacle to making any changes to existing code bases. According to the CISQ report The Cost of Poor Software Quality in the US: A 2022 Report, TD principal has increased to ~$1.52 trillion because deficiencies are not getting fixed. Let’s explore how we can reduce it.

1. Requirements Phase: Setting the Foundation for Reliability

In the Requirements Phase, the foundation for a reliable software system needs to be laid down. This phase involves more than just listing product/app features and functionalities; it’s about clearly defining the expectations for how the software should perform and handle potential challenges. Let’s break down a few aspects with examples:

Clearly define reliability requirements: Specify the expected level of performance, availability, and fault tolerance. Create a Requirement Traceability Matrix that becomes the single source of truth for all stakeholders.
Example: Imagine developing an online banking application. In this phase, specify that the system must be available 99.99% of the time to ensure customers can access their accounts reliably. Additionally, set a performance requirement, stating that transactions should be processed within one second to provide a seamless user experience. It is imperative to establish and comply with tolerances at various levels, including components, applications, services, and specific journey thresholds, all contributing to accurately formulating Error Budgets.
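To make the link between an availability target and an Error Budget concrete, here is a minimal sketch. The 30-day measurement window and the names AvailabilitySlo, error_budget_minutes, and budget_remaining are assumptions made for illustration; only the 99.99% target comes from the example above.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class AvailabilitySlo:
    """Hypothetical availability objective over a fixed window."""
    target: float      # e.g. 0.9999 for "available 99.99% of the time"
    window_days: int   # assumed measurement window, not given in the article

    def error_budget_minutes(self) -> float:
        # The budget is the fraction of the window that may be unavailable.
        return (1.0 - self.target) * self.window_days * MINUTES_PER_DAY

    def budget_remaining(self, downtime_minutes: float) -> float:
        # A negative result means the objective has been missed.
        return self.error_budget_minutes() - downtime_minutes


slo = AvailabilitySlo(target=0.9999, window_days=30)
budget = slo.error_budget_minutes()       # roughly 4.32 minutes in 30 days
remaining = slo.budget_remaining(3.0)     # after 3 minutes of downtime
```

Under these assumptions, a 99.99% target leaves only about four minutes of downtime per 30 days, which is why tolerances need to be agreed per component and per journey before design begins.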

Collaborate with stakeholders: Understand their expectations regarding the software’s reliability and incorporate these expectations into the project scope.
Example: Work closely with stakeholders, including bank executives, customer service representatives, and end-users. If customer service expects real-time transaction updates, this becomes a reliability expectation. By collaborating, ensure that the software aligns with the diverse needs and expectations of different stakeholders, creating a comprehensive reliability profile.

Identify potential risks and establish mitigation plans: Identify potential risks related to reliability and establish mitigation plans early in the project.
Example: Consider the risk of third-party payment gateway failures in the banking application. During this phase, identify this as a potential reliability risk. To mitigate it, plan for an alternative payment gateway or implement a failover mechanism that switches seamlessly to another provider in case of an outage. This proactive approach minimizes the impact of potential failures.
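The failover idea can be sketched as follows. This is an illustration under stated assumptions, not a real payment integration: GatewayError, charge_with_failover, and the two stub gateways are hypothetical names, and a production version would also need idempotency keys so that a retried charge is never applied twice.

```python
from typing import Callable, Sequence, Tuple


class GatewayError(Exception):
    """Raised by a (hypothetical) gateway callable when the provider is down."""


def charge_with_failover(
    gateways: Sequence[Tuple[str, Callable[[int], str]]],
    amount_cents: int,
) -> Tuple[str, str]:
    """Try each gateway in priority order and return (gateway name, receipt)."""
    failures = []
    for name, charge in gateways:
        try:
            return name, charge(amount_cents)
        except GatewayError as exc:
            # Record the failure and fall through to the next provider.
            failures.append(f"{name}: {exc}")
    raise GatewayError("all gateways failed: " + "; ".join(failures))


# Stub providers standing in for real integrations.
def primary_gateway(amount_cents: int) -> str:
    raise GatewayError("timeout")


def backup_gateway(amount_cents: int) -> str:
    return f"receipt-{amount_cents}"


used, receipt = charge_with_failover(
    [("primary", primary_gateway), ("backup", backup_gateway)], 2500
)
```

Here the primary provider fails, so the charge is served by the backup and the caller learns which gateway was used.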

By weaving these constructs into the Requirements Phase, the development team not only establishes a clear roadmap for reliability but also ensures that stakeholder expectations are aligned, and potential risks are mitigated before the first line of code is written. This proactive approach sets the tone for a development process prioritizing reliability from the beginning.

2. Design Phase: Building a Reliable Architecture

As the design phase unfolds, the focus shifts to constructing a resilient and dependable architecture that forms the backbone of reliable software. This is critical for enterprises: de novo or born-digital organizations need to know how fast and to what extent they can scale, and traditional enterprises need to understand how their legacy ecosystem will interact seamlessly with their new-age front-end apps and API/microservices-based integrations.

Implement Design for Reliability (DfR) Principles: Suppose you are developing a web-based project management application. Implement DfR principles by structuring the architecture to handle concurrent user interactions efficiently. This could involve using microservices to isolate functionalities, reducing the impact of a failure in one component on the entire system.

Conduct Failure Mode and Effects Analysis (FMEA): In developing a healthcare information system, conducting FMEA would involve identifying potential failure points. For instance, if the system is responsible for storing and retrieving patient data, analyze failure modes such as data corruption or loss. Design features, such as regular automated backups and data validation checks, to prevent or mitigate these failures.
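One common way to prioritize the failure modes an FMEA uncovers is the Risk Priority Number (RPN), the product of severity, occurrence, and detection scores, each typically rated from 1 to 10. The article does not prescribe RPN, so treat the sketch below as one possible approach; the failure modes and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (always detected) .. 10 (practically undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: higher values deserve attention first.
        return self.severity * self.occurrence * self.detection


# Illustrative scores only; a real FMEA derives them in a team workshop.
modes = [
    FailureMode("patient record corrupted on write", 9, 3, 4),
    FailureMode("backup job silently skipped", 8, 2, 7),
    FailureMode("slow record retrieval", 4, 5, 2),
]

ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.name}")
```

With these scores, the silently skipped backup ranks highest because it is hard to detect, which argues for designing backup verification and alerting rather than only scheduling the backups.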

Integrate Redundancy and Error-Checking Mechanisms: Imagine developing a financial trading platform. Integrate redundancy by deploying the system across geographically distributed servers. Implement error-checking mechanisms that verify the accuracy of financial transactions, ensuring that trades are executed reliably, and errors are detected and corrected in real-time.
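As one small illustration of an error-checking mechanism, the sketch below attaches a SHA-256 checksum to a trade record so that a replica or downstream service can detect corruption in transit or at rest. The record fields and the function names checksum and verify are assumptions made for the example; a checksum detects accidental corruption only and is not a substitute for authentication or business-level reconciliation.

```python
import hashlib
import json


def checksum(record: dict) -> str:
    """Return a SHA-256 digest of a canonical JSON encoding of the record."""
    # Sorting keys and fixing separators makes the encoding deterministic.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(record: dict, expected: str) -> bool:
    """True if the record still matches the checksum computed at the source."""
    return checksum(record) == expected


# Hypothetical trade record, with amounts kept as strings to avoid float drift.
trade = {"id": "T-1001", "symbol": "ACME", "quantity": 100, "price": "12.50"}
digest = checksum(trade)

corrupted = dict(trade, quantity=1000)   # simulate a corrupted field
ok_original = verify(trade, digest)
ok_corrupted = verify(corrupted, digest)
```

The original record verifies and the altered copy does not, so the receiving side can reject it and request a retransmission.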

Consider Scalability and Performance Requirements: For an e-commerce platform, scalability is a critical factor. Design the architecture to handle varying loads during peak shopping seasons. This might involve implementing load-balancing techniques, optimizing database queries, and utilizing scalable cloud infrastructure to ensure the system remains reliable even when experiencing high user traffic.

The Design Phase needs to be thought through with these scenarios. This ensures that the software architecture is robust and built to withstand potential challenges. By proactively addressing failure points, implementing redundancy, and considering scalability requirements, the development team lays the groundwork for a system that functions reliably and adapts to the evolving demands of users and the environment.

3. Coding Phase: Writing Reliable Code

At the heart of the development process, the Coding Phase is where the blueprint comes to life. Ensuring reliability at this stage involves writing functional code and crafting it with longevity and dependability in mind.

Adhere to Coding Standards and Best Practices: Consider a team developing a mobile banking application. Adhering to coding standards could involve using clear and consistent variable naming conventions and organizing code modularly. This promotes reliability by making the codebase more understandable and maintainable, reducing the likelihood of introducing errors during future updates.

Implement Unit Tests Early in the Development Process: For an e-commerce website, implementing unit tests early could involve creating test cases for critical functions like order processing and payment handling. By identifying and rectifying defects at the code level, the development team ensures that these essential functionalities work reliably, preventing potential issues downstream.
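A minimal sketch of such early unit tests, using Python's standard unittest module. The order_total function, its signature, and the tax rule are hypothetical stand-ins for real order-processing code.

```python
import unittest
from decimal import Decimal


def order_total(items, tax_rate):
    """Sum (unit price, quantity) pairs and apply tax, rounded to cents."""
    if not items:
        raise ValueError("order must contain at least one item")
    subtotal = sum(price * qty for price, qty in items)
    return (subtotal * (1 + tax_rate)).quantize(Decimal("0.01"))


class OrderTotalTests(unittest.TestCase):
    def test_applies_tax_to_subtotal(self):
        items = [(Decimal("10.00"), 2), (Decimal("5.50"), 1)]
        # Subtotal 25.50 with 10% tax gives 28.05.
        self.assertEqual(order_total(items, Decimal("0.10")), Decimal("28.05"))

    def test_rejects_empty_order(self):
        with self.assertRaises(ValueError):
            order_total([], Decimal("0.10"))
```

Saved in a test module, these cases can be run with python -m unittest on every commit, so a regression in a critical calculation is caught at the code level rather than downstream.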

Leverage Static Code Analysis Tools: Imagine developing a cloud-based collaboration tool. Utilize static code analysis tools to scan the codebase for potential reliability issues and security vulnerabilities. These tools can identify issues such as memory leaks or insecure coding practices, allowing developers to address them before the software reaches the testing phase.

Encourage Code Reviews and Pair Programming: In the development of a customer relationship management (CRM) system, regular code reviews and pair programming sessions can be instrumental. Having multiple sets of eyes on the code makes reliability considerations, such as error handling and data validation, more likely to be identified and addressed. This collaborative approach ensures that knowledge about reliability is shared across the development team.

Incorporating these examples into the Coding Phase ensures that the codebase functions as intended and lays the groundwork for a reliable and maintainable software product. By emphasizing coding standards, early testing, static code analysis, and collaborative coding practices, developers contribute to creating a resilient foundation that withstands the challenges of real-world usage.

4. Testing Phase: Rigorous Validation for Reliability

As the software enters the Testing Phase, the focus shifts to comprehensive validation, ensuring that the code meets the real-world challenges it will encounter.

Conduct Comprehensive Functional and Non-functional Testing: Consider a web-based project management tool. Functional testing ensures that features like task assignments and progress tracking work as intended. Non-functional testing includes performance testing to verify that the system performs well under expected loads and security testing to identify vulnerabilities that could compromise reliability.

Develop Automated Test Suites: In developing a healthcare information system, automated test suites can continuously assess the reliability of critical functionalities such as patient record updates and data retrieval. Automation ensures that these tests can be run consistently and repeatedly, providing quick feedback to developers on the reliability of their code throughout the development process.

Simulate Real-world Scenarios: For a logistics and shipping software solution, simulating real-world scenarios might involve testing how the system handles a sudden surge in package tracking requests or network latency. By validating the software’s behavior under various conditions, the development team can identify potential failure points and proactively address them before deployment.

Collaborate with Quality Assurance Teams: Imagine developing a financial analytics platform. Collaboration with quality assurance teams involves integrating reliability metrics into the testing strategy. This might include tracking response times for critical financial calculations or monitoring system behavior during peak usage periods. By incorporating these metrics, the testing process becomes more aligned with the reliability goals of the software.

Integrating these examples into the Testing Phase ensures that the software undergoes a robust and thorough examination. From functional and non-functional testing to automated test suites and simulations of real-world scenarios, the goal is to identify and rectify potential reliability issues before the software reaches the end users. Collaboration with quality assurance teams further strengthens the focus on reliability, making it an integral part of the overall testing strategy.

5. Deployment Phase

As the software transitions to deployment, the focus shifts to seamlessly integrating new features and functionalities into the live environment. Ensuring reliability at this stage is critical for a smooth user experience.

Implement Continuous Integration and Continuous Deployment (CI/CD) Pipelines: Suppose you are deploying updates for an e-commerce platform. Implementing CI/CD pipelines ensures automated testing and deployment of changes, reducing the likelihood of human error and ensuring that only thoroughly tested and reliable code reaches the production environment. This accelerates the deployment process and minimizes the risk of introducing configuration errors that could impact reliability.

Monitor Software in Real-Time During Deployment: In deploying a real-time communication application, monitor the software in real-time as new features are rolled out. Use monitoring tools to track system performance, error rates, and user interactions during the deployment window. This proactive monitoring allows the development team to identify and address issues promptly, minimizing any potential impact on users.

Implement Feature Toggles: Consider a social media platform rolling out a new commenting system. Implementing feature toggles allows the development team to turn the new commenting feature on or off without deploying new code. In case of unexpected reliability issues or negative user feedback, the feature can be easily toggled off, providing a quick and reversible solution without a complete rollback of the entire deployment.
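The toggle mechanism can be sketched in a few lines. This is a simplified in-memory version written for illustration: the FeatureToggles class and the new_comments flag name are hypothetical, and real systems usually load flags from a configuration service so they can be changed without a restart.

```python
class FeatureToggles:
    """Minimal in-memory flag store; unknown flags default to off."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = enabled


def render_comments(toggles):
    # Both code paths ship in the same deployment; the flag picks one at runtime.
    if toggles.is_enabled("new_comments"):
        return "new commenting system"
    return "legacy commenting system"


toggles = FeatureToggles({"new_comments": True})
before = render_comments(toggles)

# Reliability issue reported: switch the feature off without redeploying.
toggles.set("new_comments", False)
after = render_comments(toggles)
```

Defaulting unknown flags to off is a deliberate choice in this sketch: a missing or mistyped flag then falls back to the established behavior rather than exposing unfinished code.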

By incorporating these techniques into the Deployment Phase, the development team ensures the reliable rollout of new features and establishes mechanisms for quick response and recovery in case of unexpected challenges. Automation through CI/CD pipelines reduces deployment time and potential errors, real-time monitoring allows for immediate issue identification, and feature toggles provide a safety net for rapid adjustments, all contributing to a reliable and user-friendly deployment process.

6. Maintenance Phase: Proactive Reliability Management

Once the software is live, the focus shifts to maintaining its reliability over time. Proactive measures and continuous improvements ensure a consistently dependable user experience.

Establish a Robust Monitoring and Logging System: In a financial transaction system, establish a monitoring and logging system to track real-time transaction processing, system response times, and error rates. This allows the development team to identify potential reliability issues as they occur, enabling quick responses to maintain a seamless financial transaction experience for users.
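As a rough sketch of the error-rate tracking described above, the monitor below keeps the outcomes of the most recent transactions and signals when the failure ratio exceeds a threshold. The class name, the window size of 10, and the 20% threshold are assumptions chosen for illustration; a production system would normally rely on a dedicated monitoring stack rather than in-process counters.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the error rate over the last `window_size` transactions."""

    def __init__(self, window_size, threshold):
        # A bounded deque discards the oldest outcome automatically.
        self._outcomes = deque(maxlen=window_size)
        self._threshold = threshold

    def record(self, success):
        self._outcomes.append(success)

    def error_rate(self):
        if not self._outcomes:
            return 0.0
        failures = sum(1 for ok in self._outcomes if not ok)
        return failures / len(self._outcomes)

    def should_alert(self):
        # Alert only when the rate is strictly above the threshold.
        return self.error_rate() > self._threshold


monitor = ErrorRateMonitor(window_size=10, threshold=0.2)
for ok in [True] * 8 + [False] * 2:
    monitor.record(ok)
alert_at_threshold = monitor.should_alert()   # 2 of 10 failed: at, not above, 20%

monitor.record(False)                         # oldest success drops out of the window
alert_above_threshold = monitor.should_alert()
```

The sliding window means the alert reflects recent behavior, so a burst of failed transactions is surfaced quickly and clears again once the system recovers.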

Conduct Regular Maintenance Activities: Consider a healthcare management system. Regular maintenance involves applying security patches to protect patient data, updating dependencies to ensure compatibility with the latest technologies, and optimizing database queries to maintain system performance. Collectively, these activities contribute to the long-term reliability and security of healthcare software.

Encourage a Culture of Continuous Improvement: For logistics and shipping applications, encourage a culture of continuous improvement by collecting and analyzing reliability metrics. Metrics may include tracking the accuracy of package delivery times and system uptime. Use these metrics to inform future development iterations, allowing the team to prioritize enhancements that address reliability concerns and elevate the overall user experience.

By integrating these practices into the Maintenance Phase, the development team ensures that the software remains reliable and evolves to meet changing demands and challenges. Establishing a robust monitoring system allows for real-time issue identification, regular maintenance activities uphold the software’s health, and a culture of continuous improvement ensures that the software adapts to the dynamic landscape of user needs and technological advancements.

In conclusion, by integrating reliability engineering principles across every phase of the SDLC/QELC, software development teams can create more dependable, resilient, and high-performance applications. Left shifting reliability engineering reduces the risk of post-deployment failures and fosters a proactive and collaborative approach to building software that meets and exceeds user expectations for reliability. As technology advances, embracing these principles becomes increasingly critical for delivering software that stands the test of time.

About the Author

Gauraav Thakar

Senior Vice President | New Market Development & Customer Acquisition, QualityKiosk Technologies

Partner & Senior Vice President covering North America region across Banking, Insurance, Capital Markets, Financial Services, Retail, Consumer, Tech & Digital industry practices.
At QualityKiosk, he leads the strategic planning, new market & customer acquisition function and drives strategic initiatives to move the organization into the next orbit of growth. Gauraav specializes in building marketing and sales teams through strategic interventions in areas of people, process, and technology as he believes technology is an enabler to bridge the gap between brands and consumers.
Before consulting, Gauraav worked in Marketing, Sales and Consulting roles across CSC, Position2 Inc. and Exilant Technologies.
Focus topics: Digitalization, Digital Transformation, Breakthrough Growth Strategy (organic and inorganic), Go-to-Market Strategy, Marketing & Sales Strategy.
Market experience: USA, UK, Singapore, India, Sri Lanka, Malaysia, Hong Kong, Vietnam, Indonesia, Philippines, Middle East.


QualityKiosk

QualityKiosk Technologies is one of the world's largest independent Quality Engineering (QE) providers and digital transformation enablers, helping companies build and manage applications for optimal performance and user experience. Founded in 2000, the company specializes in providing quality engineering, QA automation, performance assurance, intelligent automation (IA) and robotic process automation (RPA), customer experience management, site reliability engineering (SRE), digital testing as a service (DTaaS), cloud, and data analytics solutions and services. With operations spread across 25+ countries and a workforce of more than 4000 employees, the organization enables some of the leading banking, e-commerce, automotive, telecom, insurance, OTT, entertainment, pharmaceuticals, and BFSI brands to achieve their business transformation goals.
