
Simulating Real World Audio for Voice Command Engine

March 15, 2021


Overview

Despite the ubiquitous presence of voice assistants in our homes and workplaces, the technological intricacies of how automatic speech recognition works continue to amaze us. 

One of the most crucial factors determining the accuracy of a voice assistant is how well it has been tested in generalized real-world environments, which unfortunately is very difficult to do. Hence, engineers build test infrastructure that simulates these environments. In this blog, we will see how the Audio AI team at Ignitarium tests its deep learning models on real-world simulations.

Dataset Preparation and Collection

A real-world audio signal is very challenging to recreate in a simulated environment. Doing so requires two sets of audio: noisy signals (background noise) and specific audio keywords of interest (KOI).

Noisy signals are collected in different formats from thousands of sources: work locations, industrial shop floors, bus stations, birds chirping, computer-generated noise, and so on.


Data Labeling

Data labeling is an important part of building any machine learning model. A good model requires quality data, which in turn requires meticulous labeling, especially in the case of audio. To ensure the quality of real-world simulated audio, which for simplicity we will call 'long audio', our KOIs need to be properly labeled. The KOIs collected usually contain a lot of unwanted noise, so each audio file has to be listened to and the start and end points of the speech region marked. These points are then saved as JSON files, which are referenced during long audio generation.

                                           Fig. 1: Audio Labeling Tool
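As a minimal sketch of what such a label file might contain (the file name and field names here are our own illustration, not the exact schema used by the tool), each KOI clip gets a small JSON record marking its speech region:

```python
import json

# Hypothetical label record for one KOI recording: the manually marked
# start and end (in seconds) of the speech region within the raw clip.
label = {
    "file": "alexa_0042.wav",
    "keyword": "alexa",
    "speech_start_sec": 0.85,
    "speech_end_sec": 1.62,
}

# One JSON file per KOI clip; these files are referenced later
# during long audio generation.
with open("alexa_0042.json", "w") as f:
    json.dump(label, f, indent=2)
```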


Long Audio Creation

To create a long audio file, a random noise sample is picked and a 10-second clip is cropped from it; if the noise sample is shorter than 10 seconds, it is repeated until it meets the duration. The 10 s noise clip is then scaled to a random loudness between the minimum and maximum loudness values (in dB) specified in a config file.

                                    Fig. 2: Random Noise Sample
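The cropping, looping, and loudness randomization can be sketched in a few lines of NumPy. This is an illustrative implementation under assumed parameter names (clip_sec, min_db, max_db), not the actual pipeline code:

```python
import numpy as np

def make_noise_bed(noise, sr, clip_sec=10.0, min_db=-35.0, max_db=-15.0,
                   rng=np.random.default_rng()):
    """Crop or loop a noise waveform to clip_sec seconds, then scale it
    to a random loudness (in dB relative to full scale) drawn between
    min_db and max_db, mimicking the values read from the config file."""
    target_len = int(clip_sec * sr)
    if len(noise) >= target_len:
        # Crop a random 10 s window from a longer recording.
        start = rng.integers(0, len(noise) - target_len + 1)
        clip = noise[start:start + target_len]
    else:
        # Repeat a short recording until it covers the full 10 s.
        reps = int(np.ceil(target_len / len(noise)))
        clip = np.tile(noise, reps)[:target_len]

    # Scale the RMS level to the randomly chosen loudness target.
    target_db = rng.uniform(min_db, max_db)
    rms = np.sqrt(np.mean(clip ** 2)) + 1e-12
    return clip * (10 ** (target_db / 20) / rms)
```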


Next, a random KOI is selected based on a set of preconditions, which are kept in a configuration file for ease of access. The preconditions include minimum and maximum sample duration and loudness, maximum allowable signal-to-noise ratio (SNR), etc. Based on these conditions, the KOI is embedded at a random position within the noise signal while respecting the maximum allowable SNR and maximum allowable noise dB.

                                  Fig. 3: KOI embedded noise signal
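A minimal sketch of the embedding step, assuming the noise bed and KOI are NumPy arrays at the same sample rate (the SNR convention and function name are our assumptions):

```python
import numpy as np

def embed_koi(noise_bed, koi, snr_db, rng=np.random.default_rng()):
    """Mix a KOI clip into the noise bed at a random offset, scaled so
    that 20*log10(koi_rms / noise_rms) equals snr_db."""
    assert len(koi) <= len(noise_bed), "KOI must fit inside the noise bed"
    offset = rng.integers(0, len(noise_bed) - len(koi) + 1)

    noise_rms = np.sqrt(np.mean(noise_bed ** 2)) + 1e-12
    koi_rms = np.sqrt(np.mean(koi ** 2)) + 1e-12
    gain = noise_rms * 10 ** (snr_db / 20) / koi_rms

    # Add speech on top of the noise rather than replacing it, so the
    # background is not attenuated by the embedding.
    mixed = noise_bed.copy()
    mixed[offset:offset + len(koi)] += gain * koi
    return mixed, offset, offset + len(koi)
```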


The next KOI is embedded at a random distance from the previous one, between the minimum and maximum allowable durations. Special attention is also given to ensure that the noise clip is not attenuated by the speech embedding. These steps are repeated until no further KOI can be inserted into the long audio.


                               Fig. 4: Multiple KOIs embedded in noise
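The repeated placement can be expressed as a simple loop that advances a cursor through the noise bed, inserting KOIs separated by random gaps until no further clip fits. Again a sketch with assumed parameter names, not production code:

```python
import numpy as np

def fill_with_kois(noise_bed, koi_pool, sr, min_gap_sec, max_gap_sec,
                   snr_db_range, rng=np.random.default_rng()):
    """Embed randomly chosen KOIs, separated by random gaps, until no
    further KOI fits. Returns the mix and the (start, end) sample
    indices of every embedded KOI."""
    mixed = noise_bed.copy()
    events = []
    cursor = 0
    while True:
        gap = int(rng.uniform(min_gap_sec, max_gap_sec) * sr)
        koi = koi_pool[rng.integers(len(koi_pool))]
        start = cursor + gap
        end = start + len(koi)
        if end > len(mixed):
            break  # no room left for another KOI
        snr_db = rng.uniform(*snr_db_range)
        noise_rms = np.sqrt(np.mean(mixed[start:end] ** 2)) + 1e-12
        koi_rms = np.sqrt(np.mean(koi ** 2)) + 1e-12
        # Additive mix, so the underlying noise is never attenuated.
        mixed[start:end] += koi * noise_rms * 10 ** (snr_db / 20) / koi_rms
        events.append((start, end))
        cursor = end
    return mixed, events
```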


The above steps are repeated multiple times to generate long audio of any desired length. Along with the generated long audio WAV file, a metadata file and a CSV file are also created. The metadata captures the properties of the contents of the long audio file, while the CSV contains the start and end locations of the KOI(s) in the long audio.


                                        Fig. 5: Final generated long audio
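The bookkeeping around the generated WAV can be sketched with the Python standard library; the file names and fields below are illustrative, not the pipeline's actual format:

```python
import csv
import json

def write_outputs(events, sr, base="long_audio_0001"):
    """Write the CSV of KOI start/end times plus a small metadata JSON
    alongside the generated long audio WAV file."""
    with open(base + ".csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["start_sec", "end_sec"])
        for start, end in events:
            writer.writerow([round(start / sr, 3), round(end / sr, 3)])

    meta = {"sample_rate": sr, "num_kois": len(events)}
    with open(base + "_meta.json", "w") as f:
        json.dump(meta, f, indent=2)
```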

Conclusion

Validating the accuracy of voice command engines against real-world conditions is a complicated task that requires significant time and resources. In this article, we showed how representative audio can be created in a fully configurable and automated manner, giving our Audio ML engineering teams confidence in model accuracy early in their development cycles.


This write-up first appeared as a blog on Ignitarium.com

