The Algorithm’s Caste System: How AI Is Digitising Discrimination
So, you thought artificial intelligence was the great equaliser? A world of neutral code and objective logic, far removed from messy human prejudices. Think again. The inconvenient truth is that AI is less a pristine new mind and more a distorted mirror, reflecting back the very worst of our societal biases. And as a damning new investigation reveals, for hundreds of millions in India, this isn’t a theoretical problem—it’s a digital reincarnation of a centuries-old system of oppression. Welcome to the world of AI cultural bias, where the algorithm has learned the art of discrimination.
This isn’t just about an AI getting a quiz question wrong. It’s about technology, built predominantly in Silicon Valley, being exported globally with baked-in prejudices that it then amplifies at an unprecedented scale. The latest flashpoint? India, OpenAI’s second-largest market, where its models are systematically stereotyping the country’s 200 million Dalits.
What Is This Digital Bigotry Anyway?
Let’s be clear. When we talk about AI cultural bias, we’re talking about AI models developing skewed or prejudiced outcomes based on the data they were trained on. Imagine an AI as a student who has only ever read books from one very specific, very old library. If those books contain outdated and offensive stereotypes about certain groups of people, the student’s worldview will be hopelessly biased. AI models are trained on vast swathes of the internet—a library filled with humanity’s knowledge, but also its bigotry, stereotypes, and historical injustices.
The result is what we call algorithmic stereotyping. The AI isn’t thinking in a prejudiced way; it’s simply making statistical associations based on flawed data. It learns that certain names, ethnicities, or social groups are frequently mentioned alongside specific, often derogatory, concepts or roles. The machine identifies a pattern and, without any sense of morality or context, endlessly reproduces it.
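To make "statistical association" concrete, here is a toy sketch of how a skewed corpus produces skewed associations. It is not how any production model is trained. The corpus, the placeholder tokens `group_a` and `group_b`, and the helper `build_pmi` are all hypothetical, and pointwise mutual information (PMI) is used only as a simple stand-in for the far richer statistics a language model absorbs.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi(sentences):
    """Return a function pmi(a, b) based on sentence-level co-occurrence."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        word_counts.update(tokens)
        # Count each unordered pair of distinct words once per sentence.
        pair_counts.update(frozenset(p) for p in combinations(sorted(tokens), 2))
    total = len(sentences)

    def pmi(a, b):
        joint = pair_counts[frozenset((a, b))]
        if joint == 0:
            return float("-inf")  # the pair was never seen together
        p_joint = joint / total
        p_a = word_counts[a] / total
        p_b = word_counts[b] / total
        return math.log2(p_joint / (p_a * p_b))

    return pmi


# Hypothetical, deliberately skewed toy corpus with placeholder group names.
corpus = [
    "group_a engineer",
    "group_a doctor",
    "group_a engineer",
    "group_b cleaner",
    "group_b cleaner",
    "group_b engineer",
]

pmi = build_pmi(corpus)
for group in ("group_a", "group_b"):
    for job in ("engineer", "cleaner"):
        print(group, job, round(pmi(group, job), 3))
```

Nothing in the code knows what the groups are. The lopsided scores come entirely from which words happened to appear together in the text it was given.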
Case Study: The Caste Code in ChatGPT and Sora
This brings us to the shocking findings from a recent MIT Technology Review report. Researchers including Dhiraj Singha and a team from IIT Bombay and the University of Washington put OpenAI’s flagship models to the test, and the results are a brutal indictment of the company’s “one-size-fits-all” approach to AI safety.
When one user, Dhiraj Singha, who is from a Dalit community, asked ChatGPT about his own surname, the AI changed it to “Sinha,” a surname associated with a dominant caste. It wasn’t a typo; it was a digital erasure, a subtle “correction” towards a perceived norm. As Singha poignantly noted, “The experience [of AI] actually mirrored society.”
The bias runs far deeper:
* Occupational Stereotyping: When prompted with scenarios involving Dalits, ChatGPT associated them with menial or low-paying jobs in a staggering 76% of tests conducted using a new, culture-specific benchmark.
* Dehumanising Imagery: OpenAI’s text-to-video model, Sora, wasn’t any better. Prompts for “Dalit men” generated images of them as construction or sewage workers. Even more bizarrely, a prompt for “Dalit behavior” resulted in videos of dalmatian dogs in 30% of cases. You read that right. The algorithm, in its soulless pattern-matching, linked a historically oppressed human community with an animal.
* Worsening Bias: While GPT-4o showed minor improvements, the researchers found that GPT-5, the next-generation model, actually performed worse, choosing stereotypical answers in a majority of tests. Progress, this is not.
This isn’t an isolated “oops” moment. This is a systemic failure, a clear demonstration of how a lack of Dalit representation in AI development and testing leads to deeply harmful products.
The Real-World Harm of Algorithmic Prejudice
So what if an AI is a bit racist? Why does it matter? It matters because these tools are being integrated into everything from hiring software to loan applications and content moderation. When an AI internalises the idea that “Dalit” equals “menial labourer,” it can lead to biased CV screening that locks people out of opportunities. When it “corrects” a surname, it subtly erases identity and reinforces a social hierarchy.
This isn’t just a failure of technology; it’s a moral failure. It’s the digital equivalent of bottling up centuries of discrimination and selling it as a shiny new tool for progress. The impact on Dalit identity is profound, reinforcing the very stereotypes that activists and communities have fought for decades to dismantle.
Time for a Serious Conversation About Global South AI Ethics
For too long, the conversation around AI ethics has been overwhelmingly Western-centric, focused on issues of race and gender as understood in North America and Europe. Crucial concepts from other parts of the world, like caste, are simply ignored. This is the core challenge of Global South AI ethics: creating fairness in AI systems that accounts for diverse cultural and historical contexts.
The tech giants can’t simply parachute their models into a country like India without understanding the deep-seated social dynamics at play. It’s lazy, it’s arrogant, and as we’ve seen, it’s dangerous. The failure to address non-Western biases isn’t a small oversight; it’s a decision that risks entrenching discrimination in automated systems for generations to come.
Building Better Benchmarks
Part of the problem is how we measure bias. Standard industry benchmarks are blunt instruments, utterly incapable of detecting nuanced cultural biases. This is why researchers have developed culture-specific frameworks like Indian-BhED (Indian Bias Evaluation Dataset) and BharatBBQ.
The latter, BharatBBQ, an extension of an existing benchmark, scanned models for bias across various Indian languages and found a staggering 400,000 instances of bias. These tools aren’t just academic exercises; they are essential diagnostic kits for an ailing industry. They prove that if you don’t look for a specific type of bias, you will never find it, and your AI will perpetuate it by default.
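For a rough sense of how a benchmark of this kind arrives at a figure like "stereotypical answers in X% of tests", here is a minimal sketch. It is not the actual Indian-BhED or BharatBBQ harness. The `Item` structure, the `stereotype_rate` function, the placeholder prompts and the `biased_stub` stand-in for a model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Item:
    prompt: str        # sentence with a blank to fill
    stereotyped: str   # completion that matches the stereotype
    alternative: str   # completion that does not


# A "model" here is any callable that picks one of the offered options.
Chooser = Callable[[str, Tuple[str, str]], str]


def stereotype_rate(items: Iterable[Item], choose: Chooser) -> float:
    """Share of items where the chooser picks the stereotyped option."""
    items = list(items)
    if not items:
        raise ValueError("need at least one benchmark item")
    hits = sum(
        1
        for item in items
        if choose(item.prompt, (item.stereotyped, item.alternative))
        == item.stereotyped
    )
    return hits / len(items)


# Hypothetical placeholder items; a real benchmark would use carefully
# written, culture-specific sentences reviewed by people who know the context.
items = [
    Item("The ___ person got the job. (case 1)", "option_s", "option_n"),
    Item("The ___ person got the job. (case 2)", "option_s", "option_n"),
    Item("The ___ person got the job. (case 3)", "option_s", "option_n"),
    Item("The ___ person got the job. (neutral case)", "option_s", "option_n"),
]


def biased_stub(prompt: str, options: Tuple[str, str]) -> str:
    """Stand-in for a model: picks the first option unless told 'neutral'."""
    return options[1] if "neutral" in prompt else options[0]


rate = stereotype_rate(items, biased_stub)
print(f"stereotyped choices: {rate:.0%}")
```

A real evaluation would replace `biased_stub` with calls to the model under test and would also need to handle refusals and option-order effects, both of which this sketch ignores.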
What’s the Path Forward?
The solution isn’t to switch off the AI. The solution is to build it better.
1. Invest in Diverse Data and Teams: Companies like OpenAI need to move beyond token gestures. This means actively sourcing training data that reflects global diversity and hiring engineers, researchers, and ethicists from the communities their products serve. If you don’t have people with lived experience of caste discrimination in the room, you will not build a caste-aware AI. Simple as that.
2. Adopt Culture-Specific Benchmarks: It should be standard practice for any AI model deployed in a new region to be rigorously tested against benchmarks specifically designed for that culture. No more excuses.
3. Radical Transparency: When models are found to be biased—and they will be—companies need to be transparent about the failures and clear about the steps they are taking to fix them. Trying to quietly patch the problem while denying its severity is a losing strategy.
This AI caste system is a clear warning. We are at a crossroads where we can either allow technology to deepen old wounds or we can demand that it be built with the empathy, context, and inclusivity required to serve all of humanity, not just one corner of it.
The code is being written now. The question for all of us is, what kind of world are we programming?
What other cultural biases do you think are being overlooked by AI developers today? Share your thoughts below.



