Quality Assurance in GenAI Projects - Evolution from API Testing to Model Response Verification | nasscom | The Official Community of Indian IT Industry

Terms of use

Terms of Use

The use of this site and the content contained therein is governed by the Terms of Use. When you use this site you acknowledge that you have read the Terms of Use and that you accept and will be bound by the terms hereof and such terms as may be modified from time to time.

All text, graphics, audio, design and other works on the site are the copyrighted works of nasscom unless otherwise indicated. All rights reserved.
Content on the site is for personal use only and may be downloaded provided the material is kept intact and there is no violation of the copyrights, trademarks, and other proprietary rights. Any alteration of the material or use of the material contained in the site for any other purpose is a violation of the copyright of nasscom and / or its affiliates or associates or of its third-party information providers. This material cannot be copied, reproduced, republished, uploaded, posted, transmitted or distributed in any way for non-personal use without obtaining the prior permission from nasscom.
The nasscom Members login is for the reference of only registered nasscom Member Companies.
nasscom reserves the right to modify the terms of use of any service without any liability. nasscom reserves the right to take all measures necessary to prevent access to any service or termination of service if the terms of use are not complied with or are contravened or there is any violation of copyright, trademark or other proprietary right.
From time to time nasscom may supplement these terms of use with additional terms pertaining to specific content (additional terms). Such additional terms are hereby incorporated by reference into these Terms of Use.

Disclaimer

The Company information provided on the nasscom web site is as per data collected by companies. nasscom is not liable on the authenticity of such data.
nasscom has exercised due diligence in checking the correctness and authenticity of the information contained in the site, but nasscom or any of its affiliates or associates or employees shall not be in any way responsible for any loss or damage that may arise to any person from any inadvertent error in the information contained in this site. The information from or through this site is provided "as is" and all warranties express or implied of any kind, regarding any matter pertaining to any service or channel, including without limitation the implied warranties of merchantability, fitness for a particular purpose, and non-infringement are disclaimed. nasscom and its affiliates and associates shall not be liable, at any time, for any failure of performance, error, omission, interruption, deletion, defect, delay in operation or transmission, computer virus, communications line failure, theft or destruction or unauthorised access to, alteration of, or use of information contained on the site. No representations, warranties or guarantees whatsoever are made as to the accuracy, adequacy, reliability, completeness, suitability or applicability of the information to a particular situation.
nasscom or its affiliates or associates or its employees do not provide any judgments or warranty in respect of the authenticity or correctness of the content of other services or sites to which links are provided. A link to another service or site is not an endorsement of any products or services on such site or the site.
The content provided is for information purposes alone and does not substitute for specific advice whether investment, legal, taxation or otherwise. nasscom disclaims all liability for damages caused by use of content on the site.
All responsibility and liability for any damages caused by downloading of any data is disclaimed.
nasscom reserves the right to modify, suspend / cancel, or discontinue any or all sections, or service at any time without notice.

For any grievances under the Information Technology Act 2000, please get in touch with Grievance Officer, Mr. Anirban Mandal at data-query@nasscom.in.

New

See all

No notification found.

Quality Assurance in GenAI Projects - Evolution from API Testing to Model Response Verification

Sanju Dalla

@Sanju Dalla

May 31, 2024

Digital Transformation

In the rapidly evolving technology era, the emergence of Generative Artificial Intelligence (GenAI) has brought a new area in software development and quality assurance.

Traditionally, for digital transformation projects around microservices architecture, API testing is of paramount importance to provide fast feedback on the functionality and performance of software applications. However, with the emergence of AI into these systems, the focus has shifted.

While API testing remains important, the critical phase in testing GenAI-based projects requires verifying the responses generated by the applications or AI models.

In this whitepaper, we will look at the digital assurance design of GenAI-based projects, highlighting the shift from API testing to model response verification.

Shift Left

Requirement Understanding

Model behavior for use case
- Quality Assurance teams should be well versed with applications being built. All the functionalities that can impact the outcome due to model customizations should be known
Understanding of Knowledge base
- to be used for text based GenAI applications

Architecture and Design

Architecture components and communications between these components should be understood
Functioning of RAG process, Embedings, Vector search or any other important parameters like temperature
Prompt templates - prompt templates for creating a relevant and rich context to fetch right response from Model
Tokenization, Rate limit and token limit of model

Code Quality - Static code analysis and Unit testing

Implementation of code analysis and unit testing for early feedback
Ensure correctness, security, maintenance and performance is focused

Automation Testing and Manual Validations

Manual Model evaluation and validation

Human Evaluation (HIL)
- Implement human in the loop framework where domain experts will review the output of models for various data sets.
Domain/Business specific Assessment Framework
- Domain expert should create an assessment framework to highlight attributes that contributes to the success or failure or accuracy % of model response
Testing type, Test Data and Scenarios
- Test Data - Domain SME should prepare test data for each round of test and check the accuracy using framework
- Prompt and Response testing
  - General Scenarios - Domain SME should create multiple different set of questions or prompts , get it reviewed by POs
  - For text based applications, utilize GenAI itself to document a series of questions and model answers from a chunk of text on a particular subject and it is important that these are carefully checked manually before their use.
  - Prompt Variation
    - empty input or excessively long sentences
    - Same input but a lot of variation in outputs.
    - Multiple choice questions
    - edge cases or challenging examples that may push the model's limits
    - Relevance over time (add new knowledge, new checks and check the relevancy)
  - If application is related to Code generation
    - Incomplete code,Simple code,Complex code (nested structure),Code with comments and documentation,Code with external libraries,Code with deliberate errors,Code with exception handling ,Code with various format,Code with multithreading
  - Adversarial testing - to assess the robustness of AI models against unexpected or malicious inputs.
    - Ensure model is trained for adversarial inputs so it can generate right response
      - Explicit or Implicit input
        
        Malicious or Toxic or Ambiguous
        
        Inconsistent or Inaccurate or Non existing
        
        Biasness - age/race/gender
        
        Negation
      - Intensity Variation
        
        Adjusting the tone, sentiment, or emphasis
        
        Variation of output or information loss with intensity
        
        Ask for concise information and then ask if original information and concise information are having same meaning
Error handling
Compatibility of functionality with various versions of the model
Metrics
- Use Accuracy metrics to assess the accuracy of responses given by a machine learning model
  - Accuracy: The ratio of correctly predicted instances to the total number of instances. Formula: Accuracy= Correct responses/Total responses

Automation Testing

GenAI Application Response verification through automation

Design Automation and Create Python framework supporting following

Capture the inputs and responses marked as accurate or validated by Domain expert
Keep all the validated responses into a file, Read the responses from file
Create a function in the automation framework to calculate similarity between actual output and the expected output
Execute and create Report
Regression should be executed on a regular basis to ensure that the model remains consistent over time
Regression should be categorized in a manner to minimize cost of model usage. Service virtualization can be used for such cases where only API test is important and model accuracy is not being measured.
Monitor drift through regression and ensure there is no deviation
There is no “one size fits all” approach to choosing an evaluation metric so depending upon the use case following can be used

Metrics - Exact Match & Similarity Score

For exact match - Use evaluate library from hugging face to calculate similarity and get following scores
- BLEU Score: Measures the similarity between the generated output and the reference answers.
- ROUGE Score: Evaluates the overlap of n-grams between the generated and reference text.
- METEOR Score: Takes into account precision, recall, and alignment of generated and reference text.
Sentence/Text Similarity
- Compute the dot product between the embeddings of the generated and reference answers, use sentence transformer library from hugging face
- cosine similarity from hugging face
Factual Consistency - Assess whether the generated answer is factually consistent with the reference answer.
- Precision, Recall, F1 Score: Compare the generated facts to the ground truth facts.

API and UI tests Automation

Like we do for digital transformation projects or Microservices based projects, API testing plays an important role in GenAI based applications too which are communicating through microservices in the backend.

APIs should be tested and should be automated to ensure correctness of functional behavior of microservices, to reduce effort of regression and to catch bugs early in the development.
UI should be tested for user experience and functionality and should be automated
Integration tests should be written and automated to ensure the correctness and completeness of communication between microservices to support the end to end functionality of the system

Non functional requirements

Performance Testing

Models have a limit on the number of tokens they can process in a single step and once this limit is reached, models may start to “forget” previous information.

Parameters or error code to be monitored

Rate limit reached for requests - indicates that too many requests are sent in a short period of time and have exceeded the number of requests allowed.The rate limit expects that requests will be evenly distributed over a one-minute period. And you will receive a 429 response if it is not maintained even though the limit isn't met.
Total number of tokens, sum of prompt tokens and completion tokens a model is allowed to respond with
Max_tokens parameter: It determines the max tokens to be output as the model's response to avoid consuming more tokens.
Time to first token render from submission of the user prompt, measured at multiple percentiles.
Requests Per Second (RPS) for the LLM
Tokens rendered per second

Security

Proactive Feature and Architecture review
SCA and SAST using Sonar
DAST using ZAP for top 10 OWASP principles
PenTest to discover vulnerabilities
Prompt injection to test injection of malicious inputs - Unicode based prompts can be introduced to avoid threat injection.

Disclaimer

That the contents of third-party articles/blogs published here on the website, and the interpretation of all information in the article/blogs such as data, maps, numbers, opinions etc. displayed in the article/blogs and views or the opinions expressed within the content are solely of the author's; and do not reflect the opinions and beliefs of NASSCOM or its affiliates in any manner. NASSCOM does not take any liability w.r.t. content in any manner and will not be liable in any manner whatsoever for any kind of liability arising out of any act, error or omission. The contents of third-party article/blogs published, are provided solely as convenience; and the presence of these articles/blogs should not, under any circumstances, be considered as an endorsement of the contents by NASSCOM in any manner; and if you chose to access these articles/blogs , you do so at your own risk.

Sanju Dalla

CPG R&D: the next big opportuni...

Sigmoid

Analytics

10 Jul 2025

The Latest Buzz in Tech, Culture, a...

Getlatest

Sales & Mar..

10 Jul 2025

The Strategic Impact of Quality Man...

TechM

Application

09 Jul 2025

Trade Finance Digitization - The Ro...

Anaptyss

BFSI

08 Jul 2025

How CRM Analytics is Transforming E...

Chirag Akbari

Application

08 Jul 2025

Infrastructure as Code: Acceleratin...

NuSummit

Cloud Computing

07 Jul 2025

Customer Identity and Access Manage...

NuSummit

Cyber Security ..

07 Jul 2025

The Enterprise Sprint and Marathon ...

Janhvi Juyal

Digital Transfo..

07 Jul 2025

How Gen Z’s Use of Digital Mental H...

Mental Health First ..

Diversity And I..

07 Jul 2025

Unlocking the Potential of Digital ...

L&T Technology S..

Engineering Res..

04 Jul 2025

[Part 2] The Geopolitical Chessboar...

Dhiraj Sharma

Digital Transfo..

03 Jul 2025

Achieving Operational Excellence in...

C5i (Course5 Intelli..

Analytics

03 Jul 2025

On-Demand Webinars: How Are They Beneficial To Businesses

John

@Johnmathew

12 Feb 2025

Digital Transformation

Let us not deny the fact that on-demand webinars are steadily gaining strength in the market as one of the most sought-after strategies, helping businesses generate leads, boost ROI, and drive sales. Well, how is this possible? We know that…

Enterprise Architecture in AI era

T V Krishnan

@vkrist

12 Feb 2025

AI's potential within an organization is vast, but its success relies heavily on the foundational structures of Information Architecture (IA) and the strategic oversight of Enterprise Architecture (EA). By collaborating with AI specialists, IA and…

Scaling Smart Solutions with AI in Health: Unlocking Impact on High-Potential Use Cases

Parchaa

@Parchaa

12 Feb 2025

AI AI Inside HealthTech and Life Sciences

Introduction The World Economic Forum's report, in collaboration with ZS, highlights the transformative potential of AI in addressing systemic healthcare challenges like workforce shortages, rising costs, and inefficiencies. By…

Power BI in Finance and Banking: The Key to Smart Financial Decisions

Pragati Softw..

@Pragati Software

12 Feb 2025

Financial and banking sectors are currently experiencing a digital transformation-riding on the back of data analytics to improve their decision-making, security, and customer experience. The conventional financial organizations with their obsolete…

Mission 2030: Can tech companies drive faster attainment of UN SDGs through CSR efforts?

Kuhu Singh

@Kuhu

11 Feb 2025

Tech for Good Digital Transformation

With the countdown ticking for India to attain its targeted sustainable development goals (SDGs) by 2030, the need to devise strategies for faster attainment is at an all-time high. In particular, two key strategies stand-out for tech companies.…

Strengthening 7 Security: Top Practices for Safeguarding User Data

Harris Anders..

@harrisanderson

11 Feb 2025

Mobile & Web Development Fintech IT Services

In today’s generation, fintech app security has transformed the way we manage money ranging from smooth online payments to real investment tracking. However, this convenience brings significant risks all along. Security in fintech…

New

Quality Assurance in GenAI Projects - Evolution from API Testing to Model Response Verification

Sanju Dalla