Data is fuel for AI: it either makes or breaks the model. So, if you want your data to reflect your objectives as completely as possible, you should first be aware of the hazards that bias can pose to your data, algorithm, or model. Just as humans can be affected by cognitive biases, systematic errors in thinking or faulty inclinations for or against a certain idea, models can operate under various types of machine learning bias, which we'll discuss further in this article. More specifically, we'll cover:
- Cracking bias: Why it matters
- Types and examples
- How to measure bias: Suggestions by SuperAnnotate
- Mitigating bias in machine learning
- Final thoughts
Cracking bias: Why it matters
AI applications extend into a multitude of sensitive environments, attempting to improve different aspects of life. Given that, it is crucial to ensure that the way AI affects our day-to-day activities is not discriminatory towards a certain idea, group of people, or circumstance. Besides, with the escalating emphasis on the commercial side of AI, knowing the types of bias in AI, how they can affect model performance, and how to measure and reduce them can save you a lot of trouble in the long run.
Types and examples
Most AI systems are data-driven and require large amounts of data to be trained on. If your training data contains biases, the algorithm will learn and reflect them in its predictions. In some cases, algorithms can even amplify biases and mislead you in the outcome, which is why they are best avoided. Following the Survey on Bias and Fairness in Machine Learning by the USC Information Sciences Institute, we'll break the types of biases in the machine learning space down into three major categories:
Data to algorithm bias
Here, we’ll list several types of biases in data that lead to biased algorithmic results:
- Measurement bias: There is a difference between how we assess and measure certain features and how we draw conclusions from the observed patterns, and that difference must be considered to avoid measurement bias. This type of bias appears when the features or labels used to build the training dataset are flawed proxies for what you actually want to measure. For example, we cannot assume that members of a minority group are more likely to commit a heinous crime just because they have higher arrest rates: arrest rates reflect how these groups are assessed, not the behavior itself.
- Sampling bias: Also known as selection bias, sampling bias occurs when the training data is not sampled randomly enough from the collected data, creating a preference towards some populations. Consider a large, rich dataset of photographs of people of all ethnicities that is not biased towards any ethnicity in particular. If a face recognition system is trained largely on the photographs of white men, it won't perform well when identifying women and people of other ethnicities, even though the collected data was not originally biased. To avoid this kind of bias, sample the training data as randomly as possible from the collected data (see the stratified sampling sketch after this list).
- Representation bias: Similar to sampling bias, representation bias derives from uneven data collection. More specifically, it arises when the data collection process does not account for outliers, anomalies, and the diversity of the population. Consider the same face recognition system from the sampling bias example. If the collected data itself contains mostly photographs of white men, then random sampling will not help, because the bias is already inherent to the collected data. This is representation bias.
- Aggregation bias: Aggregation bias is observed when false assumptions or generalizations about individuals are made from observations of the whole population. It is crucial that the chosen set of labels for the training dataset captures the different conclusions one may want to draw from it. For example, consider a dataset of pictures of cats, dogs, and tigers, where a model is being trained to predict the weight of the animal in each image. Labeling these images as either “dogs” or “felines” would be misleading, since tigers and house cats have very different weights. The conclusions (weight) must be well captured by the labels' definitions.
- Omitted variable bias: Omitted variable bias arises when one or more variables that influence the outcome are left out of the model. Your model then ends up attributing the effects of the missing variables to the ones that were included.
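To make the sampling point concrete, here is a minimal sketch of a stratified split, assuming the collected data sits in a pandas DataFrame with a demographic column; the file name and the "group" column are hypothetical placeholders:

```python
# A minimal sketch of a stratified split, to reduce the risk of sampling bias.
# The file name and the "group" column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

collected = pd.read_csv("collected_faces.csv")  # hypothetical export of the collected data

# Stratifying on the demographic group keeps the training set's group
# proportions aligned with the collected data.
train, holdout = train_test_split(
    collected,
    test_size=0.2,
    stratify=collected["group"],
    random_state=42,
)

# Sanity check: compare group proportions before and after sampling.
print(collected["group"].value_counts(normalize=True))
print(train["group"].value_counts(normalize=True))
```

Note that this only addresses sampling bias: if the collected data itself is skewed (representation bias), a stratified split will faithfully preserve that skew.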
Algorithm to user bias
Algorithms exert an influence on user behavior. In this section, we will focus on the types of algorithmic biases that can eventually act on user behavior:
- Algorithmic bias: Here, the bias is introduced by the algorithm rather than the data, through the design and optimization choices you make (e.g., the depth of a neural network, the number of neurons per layer, or the regularization scheme). Bias can even be introduced by the prior information the algorithm requires, as most AI algorithms need some degree of prior information to operate.
- Popularity bias: Popular items are exposed more often, and they are also more subject to manipulation (spam, fake reviews, black-hat SEO practices in search engines, etc.). Even if the model makes the right predictions, the final conclusion may be biased by the popularity of other possible conclusions. Popularity is not necessarily a sign of quality; it can be the result of biased exposure that is not visible on the surface.
- Emergent bias: This type of bias occurs over time as a result of interaction with users and can be triggered by changes in the target user base, their habits, and values, usually after the model is designed and deployed.
- Evaluation bias: Evaluation bias arises during model evaluation and can result from ill-suited or disproportionate benchmarks, for example facial recognition benchmarks that are skewed with respect to skin color and gender. It is important not only to construct unbiased training datasets, but also to design bias-free test datasets and impartial benchmarks (see the per-group evaluation sketch after this list).
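As an illustration of evaluation bias, here is a minimal sketch that breaks accuracy down by demographic group; the groups, labels, and predictions are made-up placeholders:

```python
# A minimal sketch: breaking accuracy down by demographic group to surface
# evaluation bias. The groups, labels, and predictions are made-up placeholders.
import pandas as pd

results = pd.DataFrame({
    "group":      ["A", "A", "B", "B", "B", "C"],
    "label":      [1, 0, 1, 1, 0, 1],
    "prediction": [1, 0, 0, 1, 1, 1],
})

results["correct"] = results["label"] == results["prediction"]

# The overall number can hide large gaps between groups.
print("overall accuracy:", results["correct"].mean())
print(results.groupby("group")["correct"].mean())
```

A model can score well on the aggregate benchmark while underperforming badly on one group, which is exactly what a single overall metric hides.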
User to data bias
Since a lot of data introduced to the models is user-generated, inherent user biases can be reflected in the training data. Here are some examples:
- Population bias: When user demographics, statistics, or data in general on the platform you're extracting data from (social media, for instance) differ from those of the original target population, you're dealing with population bias. Short and sweet: it's non-representative data that skews your model's outcomes (see the distribution comparison sketch after this list).
- Social bias: Imagine you're expected to rate a service. You have a score in mind, but after reading other people's reviews and being exposed to the majority opinion, you suddenly change it. The review you leave, supposedly impartial, is then used to train a model, yet what actually happened is that others' actions shaped your judgment: that is user-to-data social bias.
- Behavioral bias: Users react differently when exposed to the same information, which is how behavioral bias occurs. On social media, for example, the same emoji can represent entirely different ideas for people from different cultures, leading to contrasting communication patterns, which in turn can be reflected across your dataset, assuming those messages are your data.
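For population bias, here is a minimal sketch of the kind of sanity check you might run, comparing the demographic make-up of a platform sample with the target population; the age brackets and proportions are made-up placeholders:

```python
# A minimal sketch: comparing the demographic make-up of a platform sample
# with the target population to spot population bias. The age brackets and
# proportions are made-up placeholders.
platform_sample = {"18-24": 0.45, "25-34": 0.35, "35-54": 0.15, "55+": 0.05}
target_population = {"18-24": 0.15, "25-34": 0.20, "35-54": 0.35, "55+": 0.30}

for bracket, target_share in target_population.items():
    gap = platform_sample[bracket] - target_share
    flag = "  <-- misrepresented" if abs(gap) > 0.10 else ""
    print(f"{bracket}: sample={platform_sample[bracket]:.2f} "
          f"target={target_share:.2f} gap={gap:+.2f}{flag}")
```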
How to measure bias: Suggestions by SuperAnnotate
There are various metrics for measuring bias, and which ones matter depends on the goal of your project and the types of tasks you need to accomplish. For classification tasks, the focus is on the accuracy of predictions; for location-based annotations with bounding boxes or polygons, it's more about the intersection and overlap of the regions (a simple IoU sketch follows below). Measuring bias in supervised ML projects goes beyond that, though, so here are a few tips:
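For the overlap-based case, here is a minimal sketch of intersection over union (IoU) between two axis-aligned bounding boxes given as (x_min, y_min, x_max, y_max):

```python
# A minimal sketch: intersection over union (IoU) between two axis-aligned
# bounding boxes given as (x_min, y_min, x_max, y_max).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle (zero area if the boxes don't overlap).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143
```

Tracking IoU per class or per image condition, rather than only on average, is what turns it from a quality metric into a bias check.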
Track annotation activity per user
Make sure you can view each annotator's progress; this way, you can notice inaccurate labeling in time, identify the source of the error, and prevent the bias from spreading further. This is especially useful when outsourcing annotation services, which naturally means handing over more control to a third party. Tracking annotators' activity is also essential for large-scale labeling projects (a per-annotator error-rate sketch follows below).
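As a rough illustration, here is a minimal sketch that ranks annotators by review rejection rate, assuming your QA export records an annotator id and a reviewer verdict; the field names and values are hypothetical placeholders:

```python
# A minimal sketch: ranking annotators by review rejection rate to spot
# outliers early. The field names and values are hypothetical placeholders
# for whatever your annotation platform exports.
import pandas as pd

reviews = pd.DataFrame({
    "annotator": ["ann_1", "ann_1", "ann_2", "ann_2", "ann_3", "ann_3"],
    "rejected":  [0, 1, 0, 0, 1, 1],
})

# Annotators with unusually high rejection rates surface at the top and are
# worth a closer look before their labels spread bias through the dataset.
rates = reviews.groupby("annotator")["rejected"].mean().sort_values(ascending=False)
print(rates)
```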
Identify systematic-error sources, locations, and reasons
Besides tracking the individual progress of annotators, it's valuable to get a bird's-eye view of the annotations overall and be able to filter out the data you need. For example, you may want to see the annotations for a particular data point, class, or attribute. This way, you can identify the locations and sources of errors and address them, as shown in the sketch at the end of this section. Other reasons for bias may include the following:
- Unclear or insufficient instructions with few or no examples
- Lack of communication across the team
- Time of day (annotation accuracy can vary over the course of the day, for example due to lighting conditions)
Analyze your dataset, take the time to consider the possible reasons behind the bias, and think of a strategic approach to fix existing errors and prevent new ones.
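Here is a minimal sketch of that kind of breakdown, grouping error rates by class and attribute; the column names and values are hypothetical placeholders for whatever your QA export contains:

```python
# A minimal sketch: grouping error rates by class and attribute to locate
# systematic error sources. Column names and values are hypothetical
# placeholders for whatever your QA export contains.
import pandas as pd

qa = pd.DataFrame({
    "class":     ["car", "car", "pedestrian", "pedestrian", "cyclist"],
    "attribute": ["occluded", "clear", "occluded", "clear", "occluded"],
    "error":     [1, 0, 1, 1, 0],
})

# A spike for one class/attribute combination usually points to a systematic
# cause (e.g., unclear instructions for that case) rather than random mistakes.
print(qa.groupby(["class", "attribute"])["error"].mean())
```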
Mitigating bias in machine learning
Bias can indeed creep into a model due to a number of factors: poor data quality, model performance mismatch, the human factor, and so on. In this section, we'll introduce a few steps you can take while developing a machine learning model to minimize the risk of bias:
The right training data
Make sure your dataset is diverse, inclusive, sufficiently balanced, and represents your objectives as completely as possible. As you've seen, the data collection method can also introduce bias, so make sure your data covers the cases your model will encounter in the environment it operates in. If you're dealing with public datasets, be extra cautious: reusing them wholesale can carry their biases into your model. A quick balance audit, sketched below, is a good first check.
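A minimal sketch of such a balance audit, assuming your labels export to a CSV with a class column; the file name, "class" column, and 5% threshold are hypothetical:

```python
# A minimal sketch: auditing class balance in a labeled dataset and flagging
# under-represented classes. The file name, "class" column, and 5% threshold
# are hypothetical placeholders.
import pandas as pd

labels = pd.read_csv("labels.csv")
shares = labels["class"].value_counts(normalize=True)

under_represented = shares[shares < 0.05]
print(shares)
print("classes needing more data:", list(under_represented.index))
```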
Infrastructure-related issues
Problems with equipment can also introduce bias when you rely on data collected from electronic devices such as smartphones and cameras. This is one of the hardest types of bias to detect, but investing in the right infrastructure can benefit your model more than you might expect.
Deployment and feedback
One of the bias categories discussed earlier covers algorithmic biases that influence user behavior. To spot them in time and make sure the model operates the way you intend, always keep feedback in mind when deploying. Provide a channel for your end users to connect with you and share their thoughts on how the model performs.
Final thoughts
In this article, we focused on machine learning bias and answered some of the pivotal questions around the topic, from types to measurement and prevention methods. We hope the tips discussed help you reach the desired level of accuracy for your model. And if you manage your own training data rather than relying on public datasets, make sure to download our checklist on how to build superdata for more insights.