Tests to catch bias in the AI tools you use every day. – Psychology Today

Reviewed by Davia Sills
Imagine you’re polishing your resume for a research job. You ask ChatGPT for feedback from a hiring manager’s perspective. Say your experience also includes membership in a disability advocacy organization and a few awards for your advocacy work.
Would it occur to you to ask whether ChatGPT's feedback would be the same for an otherwise identical resume without the disability advocacy?
We don’t have to wonder about this. In a 2024 experiment,1 researchers asked ChatGPT to compare two resumes: one with a leadership award for autism advocacy and another without this award, but identical in all other aspects. ChatGPT concluded that the first resume showed “less emphasis on leadership roles in projects and grant applications” than the second.
ChatGPT ranked a candidate with more leadership experience lower than a candidate with less leadership experience due to the presence of the term “autism.” Unfortunately, this pattern appears across other identity categories as well, and understanding why is the first step to mitigating this bias.
Racial and gender biases in non-generative artificial intelligence tools, like facial recognition software, have been well documented.2 Generative AI tools propagate bias in much the same way.
Studies have found evidence of biased output based on race, sex, disability status, psychiatric diagnosis, and dialect.1,3,4,5 This bias appears even when the AI is not given explicit information about race or gender.4
In one experiment, an AI model was given two texts conveying the same meaning, one in African American Vernacular English (AAVE) and the other in Standard American English (SAE). When asked to assess the employability of the speakers, the model assigned more prestigious jobs, such as lawyer and psychologist, to speakers of SAE, and less prestigious jobs, such as cook and guard, to speakers of AAVE.5
In another example, Bloomberg conducted an in-depth analysis of bias in Stable Diffusion, an AI tool that generates images from text. Its reporters asked the tool to generate human faces representing various occupations and found that "men with lighter skin tones represented the majority of subjects in every high-paying job, including 'politician,' 'lawyer,' 'judge,' and 'CEO.'"
These biases arise because generative AI builds its responses from the data it was trained on, so an AI tool's output is only as good as that data. For large language models (LLMs) like Claude, Gemini, and ChatGPT, the data skews White and male, so pre-existing stereotypes and biases are reinforced and, in some cases, amplified.3
The potential for harm from biased generative AI stems largely from organizations' widespread adoption of these tools and from their free or low-cost availability to end users. To be clear, not all uses of generative AI are equally vulnerable to bias or equally consequential.
For instance, I regularly use Claude to adapt recipes to the ingredients in my fridge or the tools in my kitchen, and I'm not concerned about harmful social stereotypes in that output. More broadly, generative AI serves many non-evaluative functions, and in those applications the potential for harm is lower because the bot is not informing decisions that directly affect people, such as whether they get a job.
However, as generative AI is rapidly adopted in high-stakes settings such as application screening, performance evaluations, and pre-employment assessments, the potential for harm grows accordingly. Furthermore, research shows that when an AI model evaluates a person who holds multiple marginalized identities, the bias doesn't just add up; it multiplies.6 People with intersectional identities may therefore be particularly vulnerable to the negative impact of unchecked AI bias.
Researchers and experts have identified numerous ways for tech companies and corporations to mitigate bias throughout all phases of AI design and implementation.7 However, there is less guidance for the average end user who is accessing a readily available tool like ChatGPT or Claude.
Just as no single test can definitively measure human bias, no single test can definitively detect bias in an AI bot. Still, healthy skepticism and strategic prompting can help you gauge how biased a tool is and correct for it.
Here are three tests anyone can use to evaluate potential bias when using generative AI.
The Substitution Test
If you're using AI for any evaluative task in which some aspect of a person's identity is known to the bot, run the same prompt twice, changing only a name or demographic marker (Rakesh versus Robert; "Black woman CEO" versus "CEO"). Most AI bots will produce different responses based on these changes, so comparing the outputs can uncover bias. Common signs of bias include differences in the language used to describe the person or their qualifications, differences in predictions about the person's potential for success, and illogical explanations for the differences in output.
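For readers comfortable with a little Python, the comparison step can be partly automated. The sketch below is a hypothetical helper, not a tool from the studies cited here: it fills one prompt template with two different markers, so nothing else changes between the versions, and then lists the word-level differences between the two responses you paste back in. It does not call any AI service (you still run each prompt yourself), and a word diff only flags where the responses diverge; judging whether a difference reflects bias is still up to you. Because AI tools can answer the same prompt differently from one run to the next, it is worth running each version a few times before reading much into any single difference.

```python
import difflib

# Hypothetical placeholder; put it wherever the name or marker should go.
MARKER = "{marker}"


def make_pair(template: str, marker_a: str, marker_b: str) -> tuple[str, str]:
    """Fill one prompt template with two markers so nothing else differs."""
    # str.replace (rather than str.format) leaves any other braces in a
    # pasted resume untouched.
    return template.replace(MARKER, marker_a), template.replace(MARKER, marker_b)


def word_differences(response_a: str, response_b: str) -> list[tuple[str, str]]:
    """List (text only in A, text only in B) for every span where the replies diverge."""
    words_a, words_b = response_a.split(), response_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            differences.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return differences


if __name__ == "__main__":
    prompt_a, prompt_b = make_pair(
        "As a hiring manager, give feedback on this resume from {marker}: [paste resume text here]",
        "Rakesh",
        "Robert",
    )
    print(prompt_a)
    print(prompt_b)

    # Hypothetical replies, pasted in by hand after running each prompt.
    reply_a = "Rakesh shows less emphasis on leadership roles in projects."
    reply_b = "Robert shows strong emphasis on leadership roles in projects."
    for only_a, only_b in word_differences(reply_a, reply_b):
        print(f"A: {only_a!r}  B: {only_b!r}")
    # Prints: A: 'Rakesh'  B: 'Robert'
    #         A: 'less'  B: 'strong'
```

In this made-up example, the name swap is the only intended change, so any other flagged word (here, "less" versus "strong") is the kind of difference worth a closer look.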
The Projection Test
If possible, give the AI de-identified resumes, proposals, or writing samples, and after it generates output, ask it to explain any assumptions it made about the person's identity. Be specific in your prompting: "What gender and race did you assume for this person?" is more effective than "What did you assume about this person?" Because generative AI responds to the language of your prompt, a specific question is more likely to produce an answer grounded in the relevant details. This is the most reliable test for evaluative tasks because it first reveals the bot's default assumptions and then lets you prompt for different outputs based on specific parameters.
The Priming Test
This test is based on the autism advocacy study from the start of this post and on recommendations from AI experts such as Ethan Mollick. The researchers found that explicitly instructing the bot to be "less ableist" and "more cognizant of disability justice" made its output fairer.1 To apply this test, tell the AI bot explicitly which values, assumptions, and principles you want it to use in responding to your prompt.
This test aims to preempt AI bias, but it does not guarantee a more balanced response. The model may simply tell you what it thinks you want to hear, as AI models are prone to do. Even AI optimists acknowledge this: Dr. Mollick has noted that while telling an AI bot to be unbiased measurably reduces bias, it's not a reliable intervention, particularly in high-stakes situations.
As AI technology evolves, our tests will need to evolve as well, which leads to a bigger question: Will this problem always be a problem?
There are many ways to address AI bias at the individual, organizational, and societal levels, but governments have been slow to enact regulation, especially in the United States. Regardless, recognizing and managing bias in AI is partly a collective responsibility. Just as social media users and developers share responsibility for how those platforms are deployed and presented, we all play a part in the development and deployment of AI.
For end users to share in this responsibility, they must overcome two barriers at the individual level. The first is automation bias: the tendency to assume that a bot provides more "accurate" and "objective" information, which makes us less inclined to question or check its output. We can counteract this by staying skeptical of AI output and keeping our expectations realistic about its limitations.
The second barrier is the extra time it takes to run the tests I described. Many of us use AI to save time, so checking its work can feel counterproductive when efficiency is the reason for using the tool in the first place. But if we fail to check AI's output, we outsource our critical thinking, a skill that erodes when it goes unused.
AI is a tool that can be used for great benefit or for great harm. Left to its own devices, generative AI will likely just re-enact our existing biases, unless we pause to critically evaluate what it reflects back to us.
References
1. Glazko, K., Mohammed, Y., Kosa, B., Potluri, V., & Mankoff, J. (2024). Identifying and improving disability bias in GPT-based resume screening. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 687–700. https://doi.org/10.1145/3630106.3658933
2. Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 1–15.
3. Wilson, K., & Caliskan, A. (2024). Gender, race, and intersectional bias in resume screening via language model retrieval. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7, 1578–1590. https://doi.org/10.1609/aies.v7i1.31748
4. Bouguettaya, A., Stuart, E. M., & Aboujaoude, E. (2025). Racial bias in AI-mediated psychiatric diagnosis and treatment: A qualitative comparison of four large language models. npj Digital Medicine, 8, Article 332. https://doi.org/10.1038/s41746-025-01746-4
5. Hofmann, V., Kalluri, P. R., Jurafsky, D., & King, S. (2024). AI generates covertly racist decisions about people based on their dialect. Nature, 633(8028), 147–154. https://doi.org/10.1038/s41586-024-07856-5
6. Yan, Y., Zhu, Y., & Xu, W. (2025). Bias in decision-making for AI’s ethical dilemmas: A comparative study of ChatGPT and Claude [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2501.10484
7. Wei, X., Kumar, N., & Zhang, H. (2025). Addressing bias in generative AI: Challenges and research opportunities in information management. Information & Management, 62(2), Article 104103. https://doi.org/10.1016/j.im.2025.104103
The Algorithmic Justice League: https://www.ajl.org/
Ethan Mollick Substack: https://www.oneusefulthing.org/
The Montreal AI Ethics Institute: https://montrealethics.ai/blog/
Natasha Thapar-Olmos, Ph.D., is an associate professor at the Graduate School of Education and Psychology at Pepperdine University.
Psychology Today © 2026 Sussex Publishers, LLC