What is Data Redaction: Examples, Techniques and Use Cases

Data redaction is the process of hiding or removing sensitive information from documents and datasets. At scale, it is often automated with advanced analytics techniques such as Natural Language Processing (NLP) and Named Entity Recognition (NER). It is sometimes confused with data anonymization, but in data anonymization the information is masked, whereas in data redaction it is removed entirely.

At a time when virtualization and cloud computing have centralized the storage, access, preservation, and backup of data, protecting privacy has become critical.

Sensitive data must be kept out of public view to prevent identity theft and fraud by malicious parties. However, for businesses that hold vast amounts of data, redacting it manually can be painfully slow and cost-prohibitive.

In such cases, automated data redaction is a suitable way to overcome the problem. This article looks at data redaction and how it can help you safeguard sensitive customer data.

What Is Data Redaction?

Data redaction is a text analysis technique that helps you safeguard sensitive data and prevent it from being compromised. You remove selected information from documents so that it is not exposed. This is usually done manually by office staff. However, when the number of documents runs high, say 1 million, handling them all by hand becomes impractical.

In such cases, advanced analytics techniques such as Named Entity Recognition can automate the redaction of data across documents.
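
A minimal sketch of the last step of NER-based redaction, assuming an NER model has already returned entity spans as (start, end, label) character offsets. spaCy, for example, exposes `start_char`, `end_char`, and `label_` on each entity in `doc.ents`; here the spans are hard-coded so the example runs without a model, and the `redact_entities` helper and its `[LABEL]` placeholder format are hypothetical.

```python
from typing import Iterable, List, Tuple

# An entity span as an NER model might report it: (start, end, label),
# where start/end are character offsets into the text and end is exclusive.
Entity = Tuple[int, int, str]


def redact_entities(text: str, entities: List[Entity],
                    labels: Iterable[str] = ("PERSON", "DATE")) -> str:
    """Replace every entity whose label is in `labels` with a [LABEL] tag.

    Assumes the spans do not overlap.
    """
    wanted = set(labels)
    out = text
    # Replace from right to left so earlier offsets remain valid.
    for start, end, label in sorted(entities, key=lambda e: e[0], reverse=True):
        if label in wanted:
            out = out[:start] + f"[{label}]" + out[end:]
    return out


note = "Jane Doe was admitted on 12 March 2021 at St. Mary's."
# Hard-coded spans standing in for NER output.
spans = [(0, 8, "PERSON"), (25, 38, "DATE"), (42, 52, "ORG")]
print(redact_entities(note, spans))
# [PERSON] was admitted on [DATE] at St. Mary's.
```

Leaving `ORG` out of `labels` shows that which entity types get redacted is a policy decision, separate from what the model detects.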

Redaction is commonly understood as blacking out information. However, this is easier said than done, especially when documents are uploaded online. One well-known example is the 2016 debacle involving the New South Wales Medical Council.

Staff at the council blacked out a person's name before uploading a document. However, the name remained in the document's underlying data and surfaced in search engine results. Removing information that had already been published was not easy: the council had to contact Google to fix the issue.

Figure: How data redaction differs from data masking and anonymization

Data Redaction for Clinical Trial Documents

A leading pharmaceutical and life sciences client wanted a solution that could reduce the manual hours spent redacting patient information from medical records. Manual redaction had been taking weeks or even months, which drove up costs.

The organization also needed to protect content such as intellectual property and personally identifiable information (PII) in clinical trial documents shared with third parties, including health authorities and partners.

Anonymization of clinical summary reports is a regulatory requirement for the European Medicines Agency (EMA) and Health Canada. Regulatory requirements have grown in recent years, and other countries' health authorities are expected to follow suit, leading to increased demand for redaction and anonymization solutions.

The standard approach had been to outsource the anonymization and redaction of patient PII to vendors. These third-party vendors were slow and expensive, and they neither assessed the risk of re-identification nor delivered documents with good accuracy.
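
The case study mentions assessing the risk of re-identification. One common, simple way to quantify that risk is k-anonymity: group records by their quasi-identifiers (attributes such as age band or site that are not identifying on their own but may be in combination) and find the size of the smallest group. The sketch below illustrates that general technique with made-up field names; it is not a description of the platform built for the client.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def _group_key(record: Record, quasi_identifiers: Sequence[str]) -> Tuple[str, ...]:
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records: List[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group sharing the same quasi-identifier values.

    k == 1 means at least one record is unique and easier to re-identify.
    Returns 0 for an empty dataset.
    """
    if not records:
        return 0
    groups = Counter(_group_key(r, quasi_identifiers) for r in records)
    return min(groups.values())


def risky_records(records: List[Record], quasi_identifiers: Sequence[str],
                  k: int) -> List[Record]:
    """Records whose quasi-identifier group has fewer than k members."""
    groups = Counter(_group_key(r, quasi_identifiers) for r in records)
    return [r for r in records if groups[_group_key(r, quasi_identifiers)] < k]


# Hypothetical, already-redacted trial records (names removed, ages banded).
trial = [
    {"age_band": "30-39", "sex": "F", "site": "North"},
    {"age_band": "30-39", "sex": "F", "site": "North"},
    {"age_band": "40-49", "sex": "M", "site": "South"},
]
qi = ["age_band", "sex", "site"]
print(k_anonymity(trial, qi))  # 1: the third record is unique
print(risky_records(trial, qi, k=2))
```

A low k suggests that further generalization or redaction (for example, coarser age bands) is needed before the documents are shared.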

The solution was a custom redaction and anonymization platform built for the client, leveraging NLP and other artificial intelligence (AI) and machine learning (ML) technologies.

The relevant personnel at the pharmaceutical company can now respond to external requests for clinical trial information more quickly and accurately, with the option for a human to quickly validate the results from the AI/ML-enabled platform.

This has cut the time spent on the submission process by 97% and is expected to deliver savings of $1 million per annum.

Data Redaction Examples

Data redaction can take several forms, depending on how the information is hidden. Let us look at them in detail.

  • Complete Redaction: The entire content of a field or document is redacted. Character data is typically replaced with a single space, and numeric data is usually redacted to zero.
  • Partial (Half) Redaction: Only a portion of the data is redacted. For example, you can hide the last six digits of a customer's mobile number so that it appears as 7023XXXXXX.
  • Random Redaction: Users see a different random value each time they view the data. The generated values match the type of the underlying information in the record.
  • Regular Expressions: Patterns are used to find and redact data. A typical example is redacting email addresses, which vary in length.
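
The four redaction types above can be sketched in Python as follows. The function names, the `X` mask character, and the email pattern are illustrative assumptions; database products that offer built-in redaction implement these policies in their own way.

```python
import random
import re
import string
from typing import Union


def full_redact(value: Union[str, int, float]) -> Union[str, int]:
    """Complete redaction: numbers become 0, character data a single space."""
    if isinstance(value, (int, float)):
        return 0
    return " "


def partial_redact(value: str, keep: int = 4, mask: str = "X") -> str:
    """Partial redaction: keep the first `keep` characters, mask the rest."""
    return value[:keep] + mask * max(len(value) - keep, 0)


def random_redact(value: str) -> str:
    """Random redaction: a new random value of the same shape on every call."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(random.choice(string.digits))
        elif ch.isalpha():
            out.append(random.choice(string.ascii_letters))
        else:
            out.append(ch)  # keep separators such as '-' or ' '
    return "".join(out)


# A deliberately simple email pattern; real-world address syntax is looser.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_redact(text: str, pattern: re.Pattern = EMAIL, tag: str = "[EMAIL]") -> str:
    """Regular-expression redaction: replace every match, whatever its length."""
    return pattern.sub(tag, text)


print(full_redact(52000), repr(full_redact("Jane")))
print(partial_redact("7023456789"))  # 7023XXXXXX
print(random_redact("AB-1234"))      # e.g. a different 'Xy-5820' each run
print(regex_redact("Mail jane.doe@example.com or ops@corp.co.in"))
```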

When is Data Redaction Needed?

You may wonder when and why data redaction is needed. Here are the scenarios in which you will need to perform it.

Upon Receiving Data

Redacting the data as soon as you receive it helps prevent potential leaks. You can redact all the relevant information from the datasets and reports that you receive. Your redaction process can be automatic or manual, depending on data sensitivity. It is best to check if you have redacted everything correctly before sharing the documents with other stakeholders.
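
One way to check that everything was redacted before documents are shared is to re-scan the output for anything that still looks sensitive. The sketch below assumes two hypothetical detectors (a simple email pattern and a phone format of XXXX-XXX-XXX); in practice you would reuse whatever detectors drove the redaction itself.

```python
import re
from typing import List

# Hypothetical detectors for residual sensitive data; extend as needed.
RESIDUAL_CHECKS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{4}-\d{3}-\d{3}\b"),
}


def find_residual(text: str) -> List[str]:
    """Names of checks that still match, i.e. probable missed redactions."""
    return [name for name, pattern in RESIDUAL_CHECKS.items() if pattern.search(text)]


def ready_to_share(text: str) -> bool:
    """True only if no check finds anything left to redact."""
    return not find_residual(text)


draft = "Reach the patient at 0412-345-678 or [EMAIL]."
print(find_residual(draft))   # ['phone']
print(ready_to_share(draft))  # False
```

A non-empty result would send the document back for another redaction pass rather than out to stakeholders.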

Before Distribution of Data

Some data in reports and datasets is relevant only to a few stakeholders. In such cases, you can redact it before sharing the report with everyone else. For example, the financial information in a document may not be relevant to your marketing team, so you can remove it before sharing the record with that team.
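
Redacting before distribution can be as simple as an allow-list of fields per audience. The team names, field names, and `[REDACTED]` placeholder below are hypothetical; the point is that anything not explicitly allowed for a team is hidden by default.

```python
from typing import Any, Dict, Set

# Hypothetical policy: the fields each team is allowed to see.
VISIBLE_FIELDS: Dict[str, Set[str]] = {
    "finance": {"customer_id", "region", "revenue", "account_number"},
    "marketing": {"customer_id", "region", "segment"},
}


def redact_for(record: Dict[str, Any], team: str,
               placeholder: str = "[REDACTED]") -> Dict[str, Any]:
    """Return a copy of `record` with every field the team may not see redacted.

    Unknown teams get an empty allow-list, so they see nothing (deny by default).
    """
    allowed = VISIBLE_FIELDS.get(team, set())
    return {key: (value if key in allowed else placeholder)
            for key, value in record.items()}


customer = {"customer_id": "C-17", "region": "West", "segment": "SMB",
            "revenue": 125000, "account_number": "001234567"}
print(redact_for(customer, "marketing"))
```

Keeping the redacted keys, rather than dropping them, preserves the record's shape for downstream tools; dropping them would be an equally reasonable choice.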

Upon Completing Task

Redacting data only after a task is finished ensures that you have all the information you need to complete the job, and it keeps you from redacting data that is critical to the activity. You can complete the task without hassle while still ensuring data security.

Before Data Archiving

Data archives ensure that you have the records needed to run your business smoothly and meet compliance norms. Redacting data before archiving it helps safeguard that information from potential breaches. Automating redaction as part of archiving lets you redact records completely within a short time, without leaving sensitive information behind.

Before Data Disposal

You may wonder whether it makes sense to redact data from documents you plan to delete. Think of the ATM withdrawal receipts you tear up before throwing them away: there is always a chance that someone recovers discarded documents. It is therefore best to redact sensitive information even if you are deleting those documents for good.

What are the Key Data Redaction Techniques?

Here are the three essential methods of data redaction.

Page Location Redaction

You may deal with standard customer information reports that include everything from birth dates to credit/debit card details. If the reports follow a consistent format, sensitive fields always appear in the same location on the page, which makes them easy to redact. However, you also have to safeguard against failed redactions, for example when a document deviates from the expected layout. In such cases, you will need to make the changes manually.
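
Page location redaction can be sketched for a fixed-format text report: a template lists the (line, start column, end column) regions where sensitive fields sit, and every character in those regions is overwritten. The template and report layout are assumptions for illustration; regions that do not fit a page are returned as failures so they can be fixed manually, as described above.

```python
from typing import List, Tuple

# (line index, start column, end column): 0-based, end exclusive.
Region = Tuple[int, int, int]


def redact_regions(lines: List[str], regions: List[Region],
                   fill: str = "#") -> Tuple[List[str], List[Region]]:
    """Overwrite each template region; report regions the page cannot hold."""
    out = list(lines)
    failures: List[Region] = []
    for region in regions:
        line_no, start, end = region
        if line_no >= len(out) or len(out[line_no]) < end:
            failures.append(region)  # layout differs from template: review by hand
            continue
        line = out[line_no]
        out[line_no] = line[:start] + fill * (end - start) + line[end:]
    return out, failures


# Hypothetical template: name (cols 6-20) and DOB (cols 25-35) on line 0,
# card number (cols 6-22) on line 1, and a field expected on line 2.
TEMPLATE: List[Region] = [(0, 6, 20), (0, 25, 35), (1, 6, 22), (2, 6, 16)]

page = [
    "NAME: " + "Jane Doe".ljust(14) + "DOB: 1990-04-12",
    "CARD: 4111111111111111",
]
redacted, failed = redact_regions(page, TEMPLATE)
print("\n".join(redacted))
print("needs manual review:", failed)  # [(2, 6, 16)]: this page has no line 2
```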

Pattern Redaction

If you run a large and complex business, you will likely receive reports in various formats. You may also have to scan your databases to classify information by type. In such an environment, matching patterns to identify and redact data is one of the better ways to manage sensitive information. For example, phone numbers in a given region typically follow a fixed pattern such as XXXX-XXX-XXX, which makes them easy to redact with the pattern redaction method.
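
A minimal sketch of pattern redaction with Python's `re` module. The phone pattern follows the XXXX-XXX-XXX example above, and the ISO-style date pattern is an added assumption; real formats vary by country and source, so a production registry would need many more, and better tested, patterns.

```python
import re
from typing import Dict, Tuple

# Hypothetical pattern registry: label -> compiled pattern.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{4}-\d{3}-\d{3}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def redact_patterns(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each match with [LABEL]; also return match counts for auditing."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


clean, counts = redact_patterns("Call 0412-345-678 before 2024-01-31; alt 0499-111-222.")
print(clean)   # Call [PHONE] before [DATE]; alt [PHONE].
print(counts)  # {'PHONE': 2, 'DATE': 1}
```

Returning the counts alongside the text gives a simple audit trail of how much was redacted from each document.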

Manual Redaction

Automated redaction is preferable wherever possible, but it may not be feasible when there are no recognizable patterns in the data. In such cases, redact manually, and make sure you follow every step of the redaction process to avoid costly errors.

Data Redaction Use Cases

Here are the different use cases of Data Redaction across industries:

Financial Services

Financial companies deal with an overload of confidential and sensitive customer information, and they often need to extract relevant information from the enormous amount of data they handle. AI-enabled tools can help them filter information by keywords and phrases, and AI-powered solutions can mine relevant information from text, images, and videos. Typical data redacted in financial services includes credit/debit card numbers, bank account numbers, and mobile numbers.
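
For card numbers, a digit pattern alone produces false positives such as order IDs and reference numbers. A common refinement, shown here as a sketch rather than a complete detector, is to keep only candidates that pass the Luhn checksum used by payment card numbers, then mask all but the last four digits. The candidate pattern and the choice to keep four digits are assumptions.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str, keep_last: int = 4) -> str:
    """Mask Luhn-valid card numbers, keeping only the last `keep_last` digits."""
    def _mask(match: re.Match) -> str:
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)
        if not luhn_valid(digits):
            return raw  # probably not a card number; leave it untouched
        cut = len(digits) - keep_last
        return "X" * cut + digits[cut:]
    return CANDIDATE.sub(_mask, text)


print(redact_cards("Card 4111 1111 1111 1111, ref 1234567812345678."))
# Card XXXXXXXXXXXX1111, ref 1234567812345678.
```

The reference number is left alone because it fails the checksum, which is exactly the kind of false positive the Luhn filter is meant to avoid.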

Pharmaceutical and Life Sciences

Healthcare institutions can end up spending significant time on patient-related paperwork. Redacting sensitive information in minutes frees staff to focus on patient care. Data redaction can be applied to audio, video, and text files alike, helping healthcare institutions improve their workflows and productivity while protecting sensitive patient information.

Beyond healthcare delivery, data redaction is also important for clinical trial documents. NLP has greatly reduced the manual effort that clinical experts in pharma and life sciences spend on such work, since it can analyze medical records in minutes.

Law Enforcement

Law enforcement agencies often race against time to deliver speedy justice for victims. Streamlined workflows help them close cases faster and clear existing backlogs. Data redaction lets them maintain their databases while complying with requirements on identifying suspects and victims, saving crucial time.

Transportation

Transportation is a particularly document-heavy industry, with documents ranging from invoices to toll tax receipts and everything in between. Data redaction helps keep these documents moving swiftly, as they should in the transportation industry.

Media and Entertainment

The media and entertainment industry deals with hours of audio and video footage. Whether for video editing or dubbing, the work can be tedious when a large portion of the raw footage needs edits. Data redaction makes it easy for media and entertainment professionals to hide sensitive information in minutes.

Government

Government organizations hold sensitive data of all kinds and must adopt strict security standards to prevent data compromises. Data redaction is a vital part of that process, helping them protect sensitive information and pass audits. AI-enabled tools make it easy to redact both text and objects.

IT & Operations

IT systems are sensitive networks of information that need advanced protection. A minor breach can bring the entire organization’s operations to a halt. Data redaction gives IT professionals the right tools to redact sensitive data and improve their workflows. Automation helps them increase their productivity, allowing them to focus on other essential duties.

Data Redaction vs Data Masking

Data masking is often used interchangeably with data redaction, but the two differ. Data masking replaces real information in documents with fictitious data that has the same structure. Data redaction, on the other hand, simply removes sensitive and identifiable information.

Data masking is widely used within organizations for testing and training. For example, an IT team would not want identifiable information exposed during testing, yet it needs data whose types and structure remain intact for future use. Data redaction, by contrast, conceals personal information that would otherwise be readily readable, so redacting data for privacy reasons keeps it from falling into the wrong hands.
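
The difference can be illustrated with a small sketch: masking swaps each letter or digit for a random one of the same kind, so the structure survives for testing, while redaction removes the value outright. This shows simple random substitution, only one of several masking approaches, and the field names are hypothetical.

```python
import random
import string
from typing import Dict, Optional, Set


def mask_value(value: str, rng: random.Random) -> str:
    """Masking: replace letters/digits with random ones, keeping the format."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # punctuation and spaces keep the structure intact
    return "".join(out)


def mask_record(record: Dict[str, str], fields: Set[str],
                seed: Optional[int] = None) -> Dict[str, str]:
    """Mask the given fields; a fixed seed makes the test data reproducible."""
    rng = random.Random(seed)
    return {k: mask_value(v, rng) if k in fields else v for k, v in record.items()}


def redact_record(record: Dict[str, str], fields: Set[str]) -> Dict[str, Optional[str]]:
    """Redaction: the sensitive values are removed (set to None)."""
    return {k: None if k in fields else v for k, v in record.items()}


patient = {"name": "Jane Doe", "phone": "0412-345-678", "city": "Pune"}
print(mask_record(patient, {"name", "phone"}, seed=7))  # realistic-looking but fake
print(redact_record(patient, {"name", "phone"}))        # values gone
```

The masked record still looks like a real one, which is what makes it useful for testing; the redacted record simply no longer contains the sensitive values.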

Benefits of Data Redaction

Data redaction can offer several benefits. Here are some of the essential ones:

Ensures Data Security

Data redaction keeps your customers' sensitive and identifiable data secure. Safeguarding information has become more critical as data breaches have become common worldwide. Even a minor breach can damage an organization's credibility: investors may become wary of putting in money, and customers may look for more secure alternatives.

Improves Data Usability

Data is at the heart of every business's operations. Depending on your business, you may also want to share information publicly with your customers. In such cases, data redaction lets you publish that information while keeping the sensitive parts protected, so customers can access what is relevant to them.

Enables Improved Compliance

Increased data breaches in recent years have pushed regulatory agencies to introduce stringent norms for safeguarding personal information. Data redaction helps you meet these norms, and because redacted records no longer contain the sensitive details, it also limits what attackers can obtain through hacking attempts.

Bottomline

Data redaction has been around for some time, but automated, technology-driven implementations are still fairly new. Such implementations have the potential to keep sensitive data out of the hands of unscrupulous individuals. For businesses looking to implement data redaction technology, the first step is to determine which kind of application best suits their needs.


Note: This blog on data redaction was originally published on blog.gramener.com


Gramener is a design-led data science company that solves complex business problems with compelling data stories using insights and a low-code analytics platform.

© Copyright nasscom. All Rights Reserved.