Human biases are well documented, from implicit association tests that reveal biases we may not even be aware of, to field experiments that demonstrate how much these biases can affect outcomes. In recent years, society has begun to grapple with just how much these human biases can make their way into artificial intelligence (AI) systems, often with negative impacts. At a time when many companies are looking to deploy AI systems across their operations, being acutely aware of these risks and working to reduce them is an urgent priority. Bias is something we cannot avoid in life, but social bias can be manifested and magnified by artificial intelligence in perilous ways, whether in deciding who gets a bank loan or who is kept under surveillance.
Machine learning can also affect people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, past decisions were unjustly biased against certain subpopulations, for example those of a particular race, gender, religion, or sexual orientation. Because this historical data may be biased, machine learning predictors must account for it to avoid propagating or amplifying inequitable practices.
Algorithmic bias describes systematic and repeatable errors in a computer system that create unfair outcomes, such as privileging one arbitrary group of users over others[i]. Bias can emerge from many factors, including but not limited to the design of the algorithm itself, unintended or unanticipated uses, and decisions about how data is coded, collected, selected, or used to train the algorithm. Algorithmic bias is found across platforms, including but not limited to search engine results and social media, and its impacts range from inadvertent privacy violations to reinforcing social biases of race, gender, sexuality, and ethnicity. The study of algorithmic bias is most concerned with algorithms that reflect “systematic and unfair” discrimination.
One example of unfairness arises when an algorithm latches onto something irrelevant and produces misleading results. Imagine that you are trying to predict tuberculosis from X-ray images drawn from hospitals in various locations. If you are not paying attention, the algorithm will learn to recognize which hospital generated the image: some X-ray machines produce images with different characteristics than others, and some hospitals in certain locations have a much larger percentage of tuberculosis cases. The model could therefore predict the disease well on the data set it was given simply by recognizing which hospital and location the X-ray came from, without ever really looking at the patient. The algorithm appears to be doing something good, but it is doing it for the wrong reasons.
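To make this concrete, one quick sanity check is to ask whether the confound alone predicts the label. The sketch below, using entirely made-up hospital metadata, trains a trivial classifier on nothing but the hospital ID; accuracy well above the base rate would signal that the image model has a shortcut available.

```python
# Hypothetical audit: can the hospital ID alone "predict" tuberculosis?
# If yes, the dataset has a confound an image model could exploit.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder

hospital = np.array([["A"], ["A"], ["B"], ["B"], ["C"], ["C"]] * 50)  # toy metadata
labels = np.array([1, 1, 0, 0, 0, 0] * 50)                            # toy TB labels

X = OneHotEncoder(sparse_output=False).fit_transform(hospital)
scores = cross_val_score(LogisticRegression(), X, labels, cv=5)

# A score far above the positive base rate means the site alone carries
# the label, so the image model should be audited for this shortcut.
print("label predictable from hospital alone:", scores.mean())
```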
AI algorithms are only as good as the data we use to train them, and bias can creep into algorithms in many ways. Bad data can result in gender, racial, political, or ideological biases, and this is a perennial issue because AI systems continue to be trained on flawed historical data. AI systems trained on data that reflects biased human actions and decisions can also contribute to system bias. Another issue is defective data sampling, which can leave groups over- or underrepresented in the training data. As humans and AI increasingly work together to make decisions, it is critical to train these systems on unbiased data and to develop algorithms that can be easily explained. Enterprises such as Amazon, Google, and Microsoft are looking at ways to ensure human bias does not affect the data or algorithms used to make those decisions or predict outcomes.
It is essential to accelerate the progress we have been making in addressing bias in AI. At the outset, a precise definition of fairness is needed so that fairness can be understood and measured. Identifying appropriate fairness criteria for a system requires accounting for user experience, cultural, social, historical, political, legal, religious, and ethical considerations; this is one of the most complex steps in the journey. Researchers have developed technical definitions of fairness, for example requiring that AI models have similar predictive value across groups, or similar false positive and false negative rates across groups.
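As a simple illustration of the second kind of criterion, the snippet below (with toy labels and predictions) computes false positive and false negative rates per group; large gaps between groups would indicate that the model violates this notion of fairness, often called equalized odds.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """Report false positive and false negative rates per group."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        fpr = np.mean(yp[yt == 0] == 1) if np.any(yt == 0) else float("nan")
        fnr = np.mean(yp[yt == 1] == 0) if np.any(yt == 1) else float("nan")
        rates[g] = {"FPR": fpr, "FNR": fnr}
    return rates

# Toy labels, predictions, and a sensitive attribute; large gaps between
# the groups' FPR/FNR suggest the model treats the groups differently.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(group_error_rates(y_true, y_pred, group))
```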
A globally accepted definition of fairness has yet to be reached and is still evolving. Researchers and scientists have developed a wide variety of methods and techniques to ensure fairness in AI systems: incorporating a fairness definition into the training process, pre-processing the data, changing how the data is sampled, or adjusting the AI system’s decisions after the fact. One encouraging approach is “counterfactual fairness”, which ensures that a model’s decisions are the same in a counterfactual world where attributes deemed sensitive, such as race, gender, religion, or sexual orientation, were changed.
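Full counterfactual fairness is defined with respect to a causal model of how sensitive attributes influence the other features, which is beyond a short snippet. The sketch below shows only a much simpler first-pass probe on hypothetical tabular data: flip the sensitive attribute and count how often the model’s decision changes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical tabular data: column 0 is a binary sensitive attribute,
# the rest are ordinary features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 0] = rng.integers(0, 2, size=200)          # sensitive attribute
y = (X[:, 1] + 0.5 * X[:, 0] > 0).astype(int)   # labels correlated with it

model = LogisticRegression().fit(X, y)

# Flip the sensitive attribute and see how often the decision changes.
# A proper counterfactual-fairness test would also propagate the flip
# through a causal model of the other features; this is only a rough probe.
X_flipped = X.copy()
X_flipped[:, 0] = 1 - X_flipped[:, 0]
changed = np.mean(model.predict(X) != model.predict(X_flipped))
print(f"decisions that change when the attribute is flipped: {changed:.1%}")
```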
Many companies leading in AI have made tools and best practices available to tackle this issue. Some of them are described below.
Fairlearn from Microsoft[ii]
Fairlearn is an open-source toolkit that enables data scientists and developers to assess and improve the fairness of AI systems. It has two components: an interactive visualization dashboard and unfairness mitigation algorithms. These components are designed to help with navigating trade-offs between fairness and model performance. Microsoft emphasizes that prioritizing fairness in AI systems is a socio-technical challenge.
As Fairlearn grows as an open-source toolkit to include additional fairness metrics, unfairness mitigation algorithms, and visualization capabilities, it is expected that it will be shaped by a diverse community of stakeholders, ranging from data scientists, developers, and business decision-makers to the people whose lives may be affected by the predictions of AI systems.
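A minimal sketch of Fairlearn’s assessment API is shown below, using synthetic labels and a synthetic sensitive feature (exact names can differ between versions). MetricFrame breaks a metric down by group, and the gap between groups is one simple measure of unfairness.

```python
# Minimal Fairlearn assessment sketch with synthetic data.
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
sex    = np.array(["f", "f", "f", "f", "m", "m", "m", "m"])  # sensitive feature

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sex,
)
print(mf.by_group)      # per-group accuracy and selection rate
print(mf.difference())  # largest gap between groups for each metric
```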
AI Fairness 360 from IBM[iii]
IBM recently announced that it will no longer offer facial recognition or surveillance technology due to concerns over bias. IBM scientists also devised an independent bias rating system that can determine the fairness of an AI system.
This extensible open-source toolkit can help you examine, report, and mitigate discrimination and bias in machine learning models throughout the AI application lifecycle.
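A rough sketch of the AI Fairness 360 workflow, with a hypothetical hiring table, is shown below: wrap a dataframe in the toolkit’s dataset class, measure a bias metric, then apply a pre-processing mitigation (here Reweighing). API details may vary by version.

```python
# Hedged AIF360 sketch: measure bias, then mitigate it before training.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Hypothetical hiring data: 'sex' is the protected attribute, 'hired' the label.
df = pd.DataFrame({
    "sex":   [0, 0, 0, 1, 1, 1, 1, 1],
    "score": [0.7, 0.4, 0.9, 0.8, 0.3, 0.6, 0.9, 0.5],
    "hired": [0, 0, 1, 1, 0, 1, 1, 1],
})
data = BinaryLabelDataset(df=df, label_names=["hired"],
                          protected_attribute_names=["sex"])

groups = dict(unprivileged_groups=[{"sex": 0}], privileged_groups=[{"sex": 1}])
print("disparate impact before:",
      BinaryLabelDatasetMetric(data, **groups).disparate_impact())

# Reweighing adjusts instance weights so the training data itself is balanced.
reweighed = Reweighing(**groups).fit_transform(data)
print("disparate impact after:",
      BinaryLabelDatasetMetric(reweighed, **groups).disparate_impact())
```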
SageMaker Clarify from Amazon[iv]
Amazon SageMaker is a fully managed machine learning service that helps data scientists and developers quickly build, train, and deploy machine learning models into production. SageMaker Clarify is a capability within SageMaker that helps detect bias in data and models and explain model predictions.
It provides machine learning developers with greater visibility into their training data and models so they can identify and limit bias and explain predictions. It detects potential bias during data preparation, after model training, and in the deployed model by examining attributes you specify. SageMaker Clarify is integrated with Amazon SageMaker Data Wrangler, making it easier to identify bias during data preparation. It is also possible to check the trained model for bias, such as predictions that produce a negative result more frequently for one group than for another. SageMaker Clarify is integrated with SageMaker Experiments so that, after a model has been trained, you can identify attributes you would like to check for bias, such as age.
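As an illustration, the sketch below shows how a pre-training bias check might be kicked off with the SageMaker Python SDK; the S3 paths, IAM role, and column names are placeholders, not a working configuration.

```python
# Hedged sketch of a SageMaker Clarify pre-training bias report.
# The role ARN, S3 paths, and column names below are placeholders.
from sagemaker import Session, clarify

session = Session()
processor = clarify.SageMakerClarifyProcessor(
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://example-bucket/loans/train.csv",    # placeholder
    s3_output_path="s3://example-bucket/loans/bias-report",      # placeholder
    label="approved",
    headers=["age", "income", "approved"],
    dataset_type="text/csv",
)

# Check whether applicants over 40 receive favorable outcomes at a different rate.
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="age",
    facet_values_or_threshold=[40],
)

processor.run_pre_training_bias(data_config=data_config,
                                data_bias_config=bias_config)
```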
Changes in real-world data can cause a model to give different weight to its inputs, changing its behaviour over time. For example, a decline in home prices could cause a model to weigh income less heavily when making loan predictions. Amazon SageMaker Clarify is integrated with SageMaker Model Monitor to alert you if the importance of model inputs shifts, causing model behaviour to change.
Responsible AI Practices from Google[v]
Google has made it clear that it is committed to making progress in the responsible development of AI and to sharing knowledge, research, tools, datasets, and other resources with the larger community. Google has shared some of its best practices and recommended approaches for developing AI/ML systems:
- Use a human-centered design approach
- Identify multiple metrics to assess training and monitoring
- When possible, directly examine the raw data
- Understand the limitations of the dataset and model
- Test, Test, Test
- Continue to monitor and update the system after deployment
Conclusion
Organizations must establish a well-tested and responsible process that can mitigate bias. The responsibility starts with business leaders in the enterprise, who must be aware of fairness issues in AI systems while staying up to date on this fast-moving field of research. An important step is to address human biases in decision making: revisit how decisions were made by humans, and evaluate the system by running it alongside human decision-makers. It also helps to let humans and machines work together to mitigate bias through human-in-the-loop (HITL) systems, which make recommendations and suggestions that humans can accept, reject, or modify.
Identifying and reducing bias in AI systems is vital to building trust between humans and systems that learn. AI systems can uncover human contradictions in decision making and expose ways in which we are partial and cognitively biased, which could eventually lead us to adopt more impartial or democratic views in our day-to-day lives.
In the process of recognizing our bias and teaching machines about our common values, humans may improve more than the AI does. Far from being a solved problem, fairness in AI presents both an opportunity and a challenge.
To continue advancing the field, invest more in bias research, provide more data, and take a multidisciplinary approach. Invest as well in diversifying the AI field itself: a more diverse AI community would be better prepared to anticipate, assess, and spot bias, and would strive to involve the communities affected by unfairness in AI systems.
[i] https://en.wikipedia.org/wiki/Algorithmic_bias
[ii] https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/
[iii] https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
[iv] https://aws.amazon.com/sagemaker/clarify/
[v] https://ai.google/responsibilities/responsible-ai-practices/