Quality Assurance in GenAI Projects - Evolution from API Testing to Model Response Verification | nasscom | The Official Community of Indian IT Industry

Terms of use

Terms of Use

The use of this site and the content contained therein is governed by the Terms of Use. When you use this site you acknowledge that you have read the Terms of Use and that you accept and will be bound by the terms hereof and such terms as may be modified from time to time.

All text, graphics, audio, design and other works on the site are the copyrighted works of nasscom unless otherwise indicated. All rights reserved.
Content on the site is for personal use only and may be downloaded provided the material is kept intact and there is no violation of the copyrights, trademarks, and other proprietary rights. Any alteration of the material or use of the material contained in the site for any other purpose is a violation of the copyright of nasscom and / or its affiliates or associates or of its third-party information providers. This material cannot be copied, reproduced, republished, uploaded, posted, transmitted or distributed in any way for non-personal use without obtaining the prior permission from nasscom.
The nasscom Members login is for the reference of only registered nasscom Member Companies.
nasscom reserves the right to modify the terms of use of any service without any liability. nasscom reserves the right to take all measures necessary to prevent access to any service or termination of service if the terms of use are not complied with or are contravened or there is any violation of copyright, trademark or other proprietary right.
From time to time nasscom may supplement these terms of use with additional terms pertaining to specific content (additional terms). Such additional terms are hereby incorporated by reference into these Terms of Use.

Disclaimer

The Company information provided on the nasscom web site is as per data collected by companies. nasscom is not liable on the authenticity of such data.
nasscom has exercised due diligence in checking the correctness and authenticity of the information contained in the site, but nasscom or any of its affiliates or associates or employees shall not be in any way responsible for any loss or damage that may arise to any person from any inadvertent error in the information contained in this site. The information from or through this site is provided "as is" and all warranties express or implied of any kind, regarding any matter pertaining to any service or channel, including without limitation the implied warranties of merchantability, fitness for a particular purpose, and non-infringement are disclaimed. nasscom and its affiliates and associates shall not be liable, at any time, for any failure of performance, error, omission, interruption, deletion, defect, delay in operation or transmission, computer virus, communications line failure, theft or destruction or unauthorised access to, alteration of, or use of information contained on the site. No representations, warranties or guarantees whatsoever are made as to the accuracy, adequacy, reliability, completeness, suitability or applicability of the information to a particular situation.
nasscom or its affiliates or associates or its employees do not provide any judgments or warranty in respect of the authenticity or correctness of the content of other services or sites to which links are provided. A link to another service or site is not an endorsement of any products or services on such site or the site.
The content provided is for information purposes alone and does not substitute for specific advice whether investment, legal, taxation or otherwise. nasscom disclaims all liability for damages caused by use of content on the site.
All responsibility and liability for any damages caused by downloading of any data is disclaimed.
nasscom reserves the right to modify, suspend / cancel, or discontinue any or all sections, or service at any time without notice.

For any grievances under the Information Technology Act 2000, please get in touch with Grievance Officer, Mr. Anirban Mandal at data-query@nasscom.in.

New

See all

No notification found.

Quality Assurance in GenAI Projects - Evolution from API Testing to Model Response Verification

Sanju Dalla

@Sanju Dalla

May 31, 2024

Digital Transformation

In the rapidly evolving technology era, the emergence of Generative Artificial Intelligence (GenAI) has brought a new area in software development and quality assurance.

Traditionally, for digital transformation projects around microservices architecture, API testing is of paramount importance to provide fast feedback on the functionality and performance of software applications. However, with the emergence of AI into these systems, the focus has shifted.

While API testing remains important, the critical phase in testing GenAI-based projects requires verifying the responses generated by the applications or AI models.

In this whitepaper, we will look at the digital assurance design of GenAI-based projects, highlighting the shift from API testing to model response verification.

Shift Left

Requirement Understanding

Model behavior for use case
- Quality Assurance teams should be well versed with applications being built. All the functionalities that can impact the outcome due to model customizations should be known
Understanding of Knowledge base
- to be used for text based GenAI applications

Architecture and Design

Architecture components and communications between these components should be understood
Functioning of RAG process, Embedings, Vector search or any other important parameters like temperature
Prompt templates - prompt templates for creating a relevant and rich context to fetch right response from Model
Tokenization, Rate limit and token limit of model

Code Quality - Static code analysis and Unit testing

Implementation of code analysis and unit testing for early feedback
Ensure correctness, security, maintenance and performance is focused

Automation Testing and Manual Validations

Manual Model evaluation and validation

Human Evaluation (HIL)
- Implement human in the loop framework where domain experts will review the output of models for various data sets.
Domain/Business specific Assessment Framework
- Domain expert should create an assessment framework to highlight attributes that contributes to the success or failure or accuracy % of model response
Testing type, Test Data and Scenarios
- Test Data - Domain SME should prepare test data for each round of test and check the accuracy using framework
- Prompt and Response testing
  - General Scenarios - Domain SME should create multiple different set of questions or prompts , get it reviewed by POs
  - For text based applications, utilize GenAI itself to document a series of questions and model answers from a chunk of text on a particular subject and it is important that these are carefully checked manually before their use.
  - Prompt Variation
    - empty input or excessively long sentences
    - Same input but a lot of variation in outputs.
    - Multiple choice questions
    - edge cases or challenging examples that may push the model's limits
    - Relevance over time (add new knowledge, new checks and check the relevancy)
  - If application is related to Code generation
    - Incomplete code,Simple code,Complex code (nested structure),Code with comments and documentation,Code with external libraries,Code with deliberate errors,Code with exception handling ,Code with various format,Code with multithreading
  - Adversarial testing - to assess the robustness of AI models against unexpected or malicious inputs.
    - Ensure model is trained for adversarial inputs so it can generate right response
      - Explicit or Implicit input
        
        Malicious or Toxic or Ambiguous
        
        Inconsistent or Inaccurate or Non existing
        
        Biasness - age/race/gender
        
        Negation
      - Intensity Variation
        
        Adjusting the tone, sentiment, or emphasis
        
        Variation of output or information loss with intensity
        
        Ask for concise information and then ask if original information and concise information are having same meaning
Error handling
Compatibility of functionality with various versions of the model
Metrics
- Use Accuracy metrics to assess the accuracy of responses given by a machine learning model
  - Accuracy: The ratio of correctly predicted instances to the total number of instances. Formula: Accuracy= Correct responses/Total responses

Automation Testing

GenAI Application Response verification through automation

Design Automation and Create Python framework supporting following

Capture the inputs and responses marked as accurate or validated by Domain expert
Keep all the validated responses into a file, Read the responses from file
Create a function in the automation framework to calculate similarity between actual output and the expected output
Execute and create Report
Regression should be executed on a regular basis to ensure that the model remains consistent over time
Regression should be categorized in a manner to minimize cost of model usage. Service virtualization can be used for such cases where only API test is important and model accuracy is not being measured.
Monitor drift through regression and ensure there is no deviation
There is no “one size fits all” approach to choosing an evaluation metric so depending upon the use case following can be used

Metrics - Exact Match & Similarity Score

For exact match - Use evaluate library from hugging face to calculate similarity and get following scores
- BLEU Score: Measures the similarity between the generated output and the reference answers.
- ROUGE Score: Evaluates the overlap of n-grams between the generated and reference text.
- METEOR Score: Takes into account precision, recall, and alignment of generated and reference text.
Sentence/Text Similarity
- Compute the dot product between the embeddings of the generated and reference answers, use sentence transformer library from hugging face
- cosine similarity from hugging face
Factual Consistency - Assess whether the generated answer is factually consistent with the reference answer.
- Precision, Recall, F1 Score: Compare the generated facts to the ground truth facts.

API and UI tests Automation

Like we do for digital transformation projects or Microservices based projects, API testing plays an important role in GenAI based applications too which are communicating through microservices in the backend.

APIs should be tested and should be automated to ensure correctness of functional behavior of microservices, to reduce effort of regression and to catch bugs early in the development.
UI should be tested for user experience and functionality and should be automated
Integration tests should be written and automated to ensure the correctness and completeness of communication between microservices to support the end to end functionality of the system

Non functional requirements

Performance Testing

Models have a limit on the number of tokens they can process in a single step and once this limit is reached, models may start to “forget” previous information.

Parameters or error code to be monitored

Rate limit reached for requests - indicates that too many requests are sent in a short period of time and have exceeded the number of requests allowed.The rate limit expects that requests will be evenly distributed over a one-minute period. And you will receive a 429 response if it is not maintained even though the limit isn't met.
Total number of tokens, sum of prompt tokens and completion tokens a model is allowed to respond with
Max_tokens parameter: It determines the max tokens to be output as the model's response to avoid consuming more tokens.
Time to first token render from submission of the user prompt, measured at multiple percentiles.
Requests Per Second (RPS) for the LLM
Tokens rendered per second

Security

Proactive Feature and Architecture review
SCA and SAST using Sonar
DAST using ZAP for top 10 OWASP principles
PenTest to discover vulnerabilities
Prompt injection to test injection of malicious inputs - Unicode based prompts can be introduced to avoid threat injection.

Disclaimer

That the contents of third-party articles/blogs published here on the website, and the interpretation of all information in the article/blogs such as data, maps, numbers, opinions etc. displayed in the article/blogs and views or the opinions expressed within the content are solely of the author's; and do not reflect the opinions and beliefs of NASSCOM or its affiliates in any manner. NASSCOM does not take any liability w.r.t. content in any manner and will not be liable in any manner whatsoever for any kind of liability arising out of any act, error or omission. The contents of third-party article/blogs published, are provided solely as convenience; and the presence of these articles/blogs should not, under any circumstances, be considered as an endorsement of the contents by NASSCOM in any manner; and if you chose to access these articles/blogs , you do so at your own risk.

Sanju Dalla

How to Build a Secure and Scalable ...

Marco luther

Blockchain

30 Jun 2025

Acknowledging Major Strides In Tech...

SumCircle

AI

27 Jun 2025

How Compliance as a Service (CaaS) ...

Anaptyss

BFSI

26 Jun 2025

The Future of Life Science Is Intel...

Niraj Jagwani

AI

25 Jun 2025

Bridging the Gap Between Developers...

Sumeet Jha

DevOps

24 Jun 2025

Can Software Read Your Mind? The Su...

Larisa Albanians

Application

24 Jun 2025

The Green Cloud: Powering the Inter...

Cisco India

451

ESG & Susta..

23 Jun 2025

The Quick Commerce Boom: Can India&...

Kuhu Singh

Digital Transfo..

20 Jun 2025

What is a Legal AI Assistant?

elint AI

AI

19 Jun 2025

Why India Is the World's Next ...

Saurabh Dwivedi

131

Global Trade

18 Jun 2025

Modernizing Construction Ops: Softw...

Niraj Jagwani

Digital Transfo..

18 Jun 2025

AI-Led Risk Management Transformati...

NuSummit

AI

17 Jun 2025

AI-Powered Search for Organizations: Transforming Enterprise Knowledge Discovery

SumCircle

@SumCircle

05 Jun 2025

Digital Transformation AI Inside

As the old adage goes, "Knowledge is power-but only when you know where to find it." AI at scale can help organizations to perform actions that cannot be done before to drive sales like never before, analyze content, predict user behaviour, accurate…

Bridging the Mental Health Divide: A Wake-Up Call for India’s Tech Industry

Mental Health..

@MHFA India

02 Jun 2025

Diversity And Inclusion HealthTech and Life Sciences

Mental health in India is reaching a critical moment. Even though awareness is slowly improving, the support systems we need still haven’t caught up. Public healthcare is overstretched, private options are expensive, and insurance coverage is often…

India’s MSME Sector Emerges as Global Export Engine in Manufacturing and Packaging

Saurabh Dwive..

@saurabhdwivedi2309

31 May 2025

Global Trade Digital Transformation Industry 4.0 Manufacturing

India’s MSME (Micro, Small, and Medium Enterprises) sector is undergoing a paradigm shift, from being a domestic support system to becoming a significant contributor to global supply chains. In particular, its transformation into a high-impact…

Why Global Buyers Are Choosing Indian MSMEs Over Chinese Suppliers

INDUCTUS

@INDUCTUS GCC

31 May 2025

Global Trade Manufacturing GCC

In a significant shift reshaping the global supply chain landscape, international buyers are increasingly opting for Indian Micro, Small, and Medium Enterprises (MSMEs) over their Chinese counterparts. From textiles to engineering goods,…

AI in education: How AI is Enhancing Teaching and Learning in Education

SumCircle

@SumCircle

30 May 2025

Digital Transformation Emerging Tech AI

AI in education has always been part of learning since the 1960s, when AI in education began in Programmed Logic for Automatic Teaching Operations (PLATO) , developed by the University of Illinois in 1960. Later on, it became the first computer…

Bridging the Gap Between Developers and Business Leaders: A New Era of Collaboration

Sumeet Jha

@credex

30 May 2025

IT Services

There’s a familiar story I’ve seen unfold more than once. A product manager walks into a room, brimming with enthusiasm for a new feature that aligns perfectly with market needs. On the other side, developers listen patiently. But then, spend the…

New

Quality Assurance in GenAI Projects - Evolution from API Testing to Model Response Verification

Sanju Dalla