Data redaction is the process of hiding and protecting sensitive information, often with advanced analytics techniques such as Natural Language Processing (NLP) and Named Entity Recognition (NER). It is sometimes confused with data anonymization. In data anonymization the information is masked, whereas in data redaction it is completely removed.
At a time when virtualization and the rise of cloud computing have made the storage, access, preservation, and backup of data centralized, ensuring the protection of privacy becomes critical.
Sensitive data must be removed from public view to prevent identity theft and fraud attempts by malicious parties. However, for businesses that hold extensive databases and vast amounts of physical data, manual redaction can be painfully slow and cost-prohibitive.
In such cases, automated data redaction is a suitable technique to overcome the problem. This article looks at data redaction and how it will help you safeguard sensitive customer data.
What Is Data Redaction?
Data redaction is a type of text analysis technique that helps you safeguard sensitive data and keep it from being compromised. You remove selected information from documents to prevent data exposure. This is usually done manually by people in an office. However, when the number of documents is large, say 1 million, handling all of them by hand becomes extremely painful.
In such cases, advanced analytics techniques such as Named Entity Recognition can automate the complete redaction of data from documents.
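As a rough illustration of how entity recognition can drive redaction, the sketch below replaces recognized spans with placeholders. The `toy_recognizer` function and its name list are hypothetical stand-ins for a trained NER model, which would normally supply the character spans and labels.

```python
from typing import Callable, List, NamedTuple


class Entity(NamedTuple):
    start: int   # index of the first character of the entity
    end: int     # index one past the last character
    label: str   # entity type, e.g. "PERSON"


def toy_recognizer(text: str) -> List[Entity]:
    """Stand-in for a real NER model: looks up a fixed list of known names."""
    known = {"Jane Doe": "PERSON", "Sydney": "LOCATION"}  # hypothetical gazetteer
    found = []
    for phrase, label in known.items():
        pos = text.find(phrase)
        while pos != -1:
            found.append(Entity(pos, pos + len(phrase), label))
            pos = text.find(phrase, pos + len(phrase))
    return found


def redact_entities(text: str, recognize: Callable[[str], List[Entity]]) -> str:
    """Replace every recognized entity span with a [LABEL] placeholder."""
    # Work from the end of the text so that earlier offsets stay valid.
    for ent in sorted(recognize(text), key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label}]" + text[ent.end:]
    return text


sample = "Jane Doe was treated in Sydney. Jane Doe was discharged on Monday."
redacted = redact_entities(sample, toy_recognizer)
print(redacted)
```

Because the recognizer is passed in as a function, a real model could be swapped in without changing the redaction step, provided it returns non-overlapping spans.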
Redaction is the common term for blacking out information. However, it is easier said than done, especially when uploading documents online. One famous example is the debacle by the New South Wales Medical Council in 2016.
The staff at the institution blacked out the person’s name before uploading the document. However, the person’s identity remained in the underlying data linked with the search engine results. Removing information that had already gone out was not easy. The medical council team had to contact Google to fix the issue.
Data Redaction for Clinical Trial Documents
A leading pharmaceutical and life sciences client wanted a solution that could reduce the manual hours spent redacting patient information from medical records. Manual redaction of patient data was taking them weeks or even months, which drove up expenses.
The organization needed to protect content such as intellectual property and personally identifiable information in clinical trial documents that are shared with third parties, including health authorities and partners.
Anonymization of clinical summary reports is a regulatory requirement for EMA and Health Canada. Regulatory requirements have been growing over recent years, and other countries’ health authorities are expected to follow suit, leading to an increased demand for redaction and anonymization solutions.
The standard approach was to outsource the anonymization and redaction of patients’ personally identifiable information to vendors. The third-party vendors were slow and expensive, yet they delivered neither an assessment of the risk of re-identification nor good accuracy in the documents.
The solution was a custom redaction and anonymization platform for the client, leveraging NLP and other AI (Artificial Intelligence)/ML (Machine Learning) technologies.
The relevant personnel at the pharma company can now respond to outside requests for clinical trial information more quickly and accurately, with the option for a human to quickly validate the results from the AI/ML-enabled platform.
This has resulted in 97% time savings in the submission process and is expected to deliver savings of $1 million per annum.
Data Redaction Examples
Data redaction examples can be plentiful, depending on the masked information. Let us look at them in detail.
Complete Redaction: It involves redacting the entire content in a document. Character data is typically replaced with a single space, and numeric values are usually redacted to zero.
Half Redaction: Also known as partial redaction, it redacts only a portion of the data in the document. For example, you can redact the last six digits of customers’ mobile numbers, so that a number appears as 7023XXXXXX.
Random Redaction: It displays random values to users each time they view a document. The values would depend on the type of underlying information in the record.
Regular Expressions: This approach uses patterns to identify the data to redact. Redacting email addresses, which can vary in character length, is a typical example.
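The four types above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the email pattern is deliberately simple, not a complete address matcher.

```python
import random
import re
import string


def full_redact(value):
    """Complete redaction: numbers become 0, text becomes a single space."""
    if isinstance(value, (int, float)):
        return 0
    return " "


def partial_redact(value: str, keep: int = 4, mask_char: str = "X") -> str:
    """Partial redaction: keep the first `keep` characters, mask the rest."""
    return value[:keep] + mask_char * max(len(value) - keep, 0)


def random_redact(value: str, rng: random.Random) -> str:
    """Random redaction: swap each digit or letter for a random one of the same kind."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)  # keep separators such as '-' or '@'
    return "".join(out)


# Simple illustrative email pattern (an assumption, not a full RFC matcher).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_redact(text: str) -> str:
    """Regular-expression redaction: replace anything that looks like an email."""
    return EMAIL_RE.sub("[EMAIL]", text)


print(repr(full_redact("Jane Doe")))    # prints ' '
print(full_redact(52000))               # prints 0
print(partial_redact("7023456789"))     # prints 7023XXXXXX
print(random_redact("7023456789", random.Random()))  # different on each run
print(regex_redact("Contact a.long.name@example.com or bo@x.org"))
```

The regular expression handles both the long and the short address without knowing their lengths in advance, which is the point of the pattern-based type.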
When is Data Redaction Needed?
You may wonder, when and why is data redaction needed? Here are the different scenarios in which you will have to perform data redaction.
Upon Receiving Data
Redacting the data as soon as you receive it helps prevent potential leaks. You can redact all the relevant information from the datasets and reports that you receive. Your redaction process can be automatic or manual, depending on data sensitivity. It is best to check if you have redacted everything correctly before sharing the documents with other stakeholders.
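One way to check a redacted document before sharing it is to scan the output for anything that still looks sensitive. The sketch below assumes two hypothetical detectors (emails and 16-digit card numbers); a real check would use detectors tuned to the data at hand.

```python
import re

# Hypothetical detectors; a real deployment would tune these to its own data.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def find_leaks(redacted_text: str) -> list:
    """Return (detector_name, matched_text) for anything that still looks sensitive."""
    leaks = []
    for name, pattern in DETECTORS.items():
        for match in pattern.finditer(redacted_text):
            leaks.append((name, match.group()))
    return leaks


clean = "Customer [NAME] paid with card [CARD]; receipt sent to [EMAIL]."
leaky = "Customer [NAME] paid with card 4111 1111 1111 1111; receipt sent to [EMAIL]."

print(find_leaks(clean))   # an empty list means nothing was detected
print(find_leaks(leaky))   # reports the card number that was missed
```

An empty result only means the detectors found nothing, so it complements a human review rather than replacing it.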
Before Distribution of Data
Individual data in reports and datasets can often remain applicable to only a few stakeholders. In such cases, you can redact data before sharing it with them. For example, the financial information in a document may not be relevant to your marketing team. You can cleanse the data before sharing the record with the marketing team.
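Redacting by audience can be sketched as an allow-list per team. The field names and the `VISIBLE_FIELDS` mapping below are hypothetical and only illustrate the idea of hiding financial fields from a marketing recipient.

```python
# Hypothetical mapping from audience to the fields that audience may see.
VISIBLE_FIELDS = {
    "finance": {"customer_id", "region", "invoice_total", "account_number"},
    "marketing": {"customer_id", "region", "campaign"},
}


def redact_for(record: dict, audience: str) -> dict:
    """Return a copy of `record` with fields outside the audience's allow-list blanked."""
    allowed = VISIBLE_FIELDS.get(audience, set())  # unknown audiences see nothing
    return {
        key: (value if key in allowed else "[REDACTED]")
        for key, value in record.items()
    }


record = {
    "customer_id": "C-1042",
    "region": "South",
    "campaign": "spring-launch",
    "invoice_total": 1890.50,
    "account_number": "000123456789",
}

marketing_view = redact_for(record, "marketing")
print(marketing_view)
```

Using an allow-list rather than a block-list means that a newly added field stays hidden until someone explicitly decides an audience may see it.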
Upon Completing Task
Redacting data only after the task is finished ensures that you have all the necessary information to execute the job successfully. It also helps you avoid redacting essential data that is critical for the activity. You can reach completion hassle-free and still secure the data afterward.
Before Data Archiving
Data archives ensure that you have the necessary records to operate your business smoothly and meet compliance norms. Redacting data before archiving it allows you to safeguard information from potential breaches. Automation in archiving enables complete redaction within a short period without missing any information that needs to be redacted.
Before Data Disposal
You may wonder whether it makes sense to redact data from documents you plan to delete. The scenario is like tearing up ATM withdrawal receipts before discarding them. There is always a possibility that someone will recover those documents. It is thus best to redact sensitive information even when you are deleting the documents for good.
What are the Key Data Redaction Techniques?
Here are the three essential methods of data redaction.
Page Location Redaction
You may deal with standard customer information reports that include everything from their birth dates to credit/debit card details. If the report has a consistent format, it will become easier to redact the sensitive data. However, you will have to safeguard against failed redactions also. In such cases, you will need to make the changes manually, wherever applicable.
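For a report with a consistent layout, location-based redaction can be sketched as blanking fixed column ranges and flagging any line that does not fit the layout for manual handling. The column positions and field names below are hypothetical.

```python
# Hypothetical fixed-width layout: (field name, start column, end column).
LAYOUT = [
    ("name", 0, 20),
    ("birth_date", 20, 30),
    ("card_number", 30, 46),
]
SENSITIVE = {"birth_date", "card_number"}
LINE_WIDTH = 46


def redact_line(line: str):
    """Blank out sensitive columns; return (redacted_line, ok)."""
    if len(line) != LINE_WIDTH:
        # The layout does not match, so column positions cannot be trusted.
        return None, False
    chars = list(line)
    for field, start, end in LAYOUT:
        if field in SENSITIVE:
            chars[start:end] = "#" * (end - start)
    return "".join(chars), True


report = [
    "Jane Doe".ljust(20) + "1990-04-12" + "4111111111111111",
    "John Roe, 1985-01-30, 5500000000000004",  # does not follow the layout
]

needs_manual_review = []
for number, line in enumerate(report, start=1):
    redacted, ok = redact_line(line)
    if ok:
        print(redacted)
    else:
        needs_manual_review.append(number)  # never emit an unredacted line

print("Manual review needed for lines:", needs_manual_review)
```

The length check is a crude guard against failed redactions; lines that fail it are withheld and queued for manual changes instead of being passed through.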
Pattern Redaction
If you have a large and complex business, you will likely receive reports in various formats. You may also have to scan your databases to segregate information into types. Matching patterns to identify and redact the data is one of the better ways to manage sensitive information in such an environment. For example, many phone numbers follow the XXXX-XXX-XXX pattern. Redacting this kind of information is much easier with the pattern redaction method.
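A minimal sketch of pattern redaction is a table of named patterns applied to free text, whatever format the report arrives in. The phone pattern mirrors the XXXX-XXX-XXX shape mentioned above; the date pattern and the labels are assumptions added for illustration.

```python
import re

# Illustrative patterns only; real data usually needs broader, tested patterns.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{4}-\d{3}-\d{3}\b"),  # the XXXX-XXX-XXX shape
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),   # ISO-style dates
}


def redact_patterns(text: str) -> str:
    """Replace every match of every registered pattern with its label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


mixed = "Call 7023-456-789 before 2024-03-15; ref 12-7023-456-7890."
print(redact_patterns(mixed))
```

The word boundaries (`\b`) keep the phone pattern from firing inside the longer reference number, which shows why patterns should be tested against near-misses as well as true matches.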
Manual Redaction
Automated redaction is preferable wherever possible, but it may not be feasible, especially in situations where there are no recognizable patterns. In those situations, you will have to redact manually. Ensure that you follow all the steps involved in the redaction process to avoid costly errors.
Data Redaction Use Cases
Here are the different use cases of Data Redaction across industries:
Financial Services
Financial companies have to deal with an overload of confidential and sensitive customer information. They often extract relevant information from the enormous amount of data they work with. AI-enabled tools can help them filter information through keywords and phrases. Financial firms can use AI-powered solutions to mine relevant information in texts, images, and videos. Examples of data redacted in financial services include credit/debit card numbers, bank account numbers, and mobile numbers.
Pharmaceutical and Lifesciences
Healthcare institutions can end up spending significant time on patient-related paperwork. Redacting sensitive information in minutes will free up the staff to help them focus better on patient care. Whether audio, video, or text files, data redaction can work on all document types. Healthcare institutions can also improve their workflows and enhance productivity while protecting sensitive patient information.
Beyond healthcare, data redaction is also important for clinical trial documents. NLP in pharma and life sciences has reduced the manual effort required of clinical experts. Natural Language Processing can help analyze medical records in minutes.
Law Enforcement
Law enforcement agencies often race against time to ensure speedy justice for victims. Streamlined workflows help these agencies close cases faster and clear existing backlogs quickly. They can use data redaction to maintain their databases while enabling criminal/victim identification compliance and saving crucial time.
Transportation
Transportation is an extensively document-heavy industry. Documents range from invoices to toll tax receipts and everything in between. Data redaction helps keep things moving swiftly, as they should in the transportation industry.
Media and Entertainment
The media and entertainment industry deals with hours of audio and video footage. Whether video editing or dubbing, it can be a tedious task when a large portion of the raw footage needs edits. Data redaction makes it easy for media and entertainment professionals to hide sensitive information in minutes.
Government
Government organizations hold sensitive data of all kinds. They need to adopt all possible safety standards to ensure no data compromises. Data redaction is one of the vital elements in the process that helps them protect sensitive information and pass audits. AI-enabled tools help redact texts and objects with complete ease.
IT & Operations
IT systems are sensitive networks of information that need advanced protection. A minor breach can bring the entire organization’s operations to a halt. Data redaction gives IT professionals the right tools to redact sensitive data and improve their workflows. Automation helps them increase their productivity, allowing them to focus on other essential duties.
Data Redaction vs Data Masking
Data masking is a term that is often used interchangeably with data redaction. However, the two differ in a few ways. Data masking replaces accurate information in documents with inaccurate data that has the same structure. Data redaction, on the other hand, simply removes sensitive and identifiable information.
Data masking finds extensive use within an organization for testing and training purposes. For example, the IT team would not want identifiable information to be exposed during the testing stage. The types and structure of the data remain as they are, which is ideal for future use. Data redaction, on the other hand, conceals personal information that could otherwise be easily read. Redacting data for privacy ultimately protects it from falling into the wrong hands.
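The contrast can be made concrete with a small sketch: masking swaps each character for a random one of the same class so the structure survives, while redaction removes the value. The function names and the sample account code are hypothetical.

```python
import random
import string


def mask_value(value: str, rng: random.Random) -> str:
    """Masking: swap each character for a random one of the same class."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # separators are part of the structure
    return "".join(out)


def redact_value(value: str) -> str:
    """Redaction: remove the value entirely and leave only a marker."""
    return "[REDACTED]"


def shape(value: str) -> str:
    """Describe a value's structure: 9 for digits, A/a for letters."""
    return "".join(
        "9" if c.isdigit() else "A" if c.isupper() else "a" if c.islower() else c
        for c in value
    )


account = "AB-1234-cd"
masked = mask_value(account, random.Random())
print(masked, shape(masked) == shape(account))  # structure is preserved
print(redact_value(account))                    # nothing of the value remains
```

The masked value can still exercise format validation in a test system, which is why masking suits testing and training, whereas the redacted value carries no structure at all.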
Benefits of Data Redaction
Data redaction can offer several benefits. Here are some of the essential ones:
Ensures Data Security
You can keep sensitive and identifiable data of your customers secure with data redaction. Safeguarding information has become more critical as data breaches have become common worldwide. Even a minor data breach may impact an organization’s credibility. Investors would be wary of putting in money, while customers would look for secure alternatives.
Improves Data Usability
Data remains at the heart of the operations of any business. Depending on your business type, you may also want to publicly share information with your customers. In such cases, Data redaction helps you protect sensitive data even if you make it public. Your customers will be able to access relevant information while you can still protect sensitive data.
Enables Improved Compliance
Increased data breaches in recent years have forced regulatory agencies to introduce stringent norms to safeguard personal information. Data redaction helps you meet these norms by removing personal information before documents are stored or shared, and it limits what criminals can obtain through activities such as hacking attempts.
Bottom Line
Data redaction has been around for some time now, but it is still fairly new in terms of implementation. It has the potential to keep sensitive data from falling into the hands of unscrupulous individuals. For businesses looking to implement data redaction, the first step is determining which kind of application is most suitable for the business.
Note: This blog on data redaction was originally published on blog.gramener.com
Gramener is a design-led data science company that solves complex business problems with compelling data stories using insights and a low-code analytics platform.