How Language Carries Gender Bias Into Algorithms, Perpetuates Status Quo

A structurally sexist society, big data and gendered data gaps walk into a bar. And into the office, the train station, your home, the doctor’s office, the roads and the movies. The list does not end here, but this writer’s capacity to mention all the spaces where gender biases infiltrate our lives does.

In a society that is becoming heavily dependent on big data, artificial intelligence and machine learning systems, it is crucial to ensure that this data comes free of any cultural or human bias. Unfortunately, this is hardly ever the case. As put by Perez (2019):

[…] A world increasingly reliant on and in thrall to data. Big Data. Which in turn is panned for Big Truths by Big Algorithms, using Big Computers. But when your big data is corrupted by big silences, the truths you get are half-truths, at best. And often for women, they aren’t true at all (p. XII).

Language has been the cornerstone of societies, and in the modern world, one of the more commonly used languages is that of algorithms. They permeate our lives in different ways, from what to watch next to bus schedules, and are increasingly employed, ostensibly to ‘improve’ the lives of modern people. However, algorithms are based on data, and there currently exists a big data gap, a gendered one (Perez, 2019). On some days, I worry that this gendered data gap is big enough to swallow us whole. The gap isn’t usually intentional or evil, but it exists nonetheless. It comes from several sources, among them a common understanding of males as the default, the way we think, how we talk and how we arrange our communities. With flawed fundamentals, it is improbable that we will arrive at equitable social contexts.

Gender bias in algorithms means men get higher credit card limits from financial institutions, their CVs and resumes get shortlisted for job applications, and they might even get a preference for vaccinations (Smith and Rustagi, 2021). It means that everything we access on the internet comes with a good twist of sexism and prejudice. For instance, a study done by Wikipedia, the world’s largest online encyclopedia, found that a mere nine per cent of its editors were women.

This lopsided view of the world, which assumes males as the universal default and tends to sideline women, comes at a cost. In India, Wikipedia pages are divided into ‘list of Indian writers’ and ‘list of Indian women writers’. Several Wikipedia pages of women in STEM are layered with details about their intimate lives, their husbands’ glories and trivial information that does not find itself on the pages of their male counterparts. As Simone de Beauvoir once said, “representation of the world, like the world itself, is the work of men; they describe it from their own point of view, which they confuse with the absolute truth.”

With a world constantly thrusting itself into technology and its glories, it becomes essential for us to take a step back and investigate who creates this technology, who has access to it and who gets affected by it. According to Smith and Rustagi (2021), this bias in data and the significant gendered data gap come partially from the digital gender divide. They also come from the lack of adequate representation of women in spaces where these algorithms are developed: women make up a mere 22 per cent of professionals in data science and artificial intelligence. Women’s absence from these fields is often perceived as ‘an objective reflection of merit’ and not as a case of underlying biases and unfair structures.

Google Translate and the Great Gendering

Let us take the example of Google Translate. A helpful tool on most days, it works on a basic algorithm that uses the observed frequency of usage to create translations. Simply put, it runs on available information. Unfortunately, this information is significantly biased. For example, languages such as Turkish do not have gendered pronouns. There is no “he” or “she”, just “o”. But when you translate “o bir asci” (they are a cook) into English, it becomes “she is a cook”, while phrases with “engineer” and “doctor” become “he is an engineer/doctor”.

This seemingly simple algorithm not only genders the gender-neutral Turkish sentences but also adds sexism into the mix. This happens because of the frequency of usage: most of the internet uses masculine pronouns for words like “astronaut” or “doctor” and feminine ones for “cook” or “nurse”.
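The frequency effect can be illustrated with a toy sketch. This is not Google Translate's actual system: the corpus counts and the names `CORPUS_COUNTS`, `pick_pronoun` and `translate_neutral` are invented for illustration. The sketch only shows how picking the most frequently observed pronoun for each occupation reproduces whatever skew the underlying text contains.

```python
from collections import Counter

# Hypothetical counts of "<pronoun> is a <occupation>" in an English corpus.
# The numbers are invented; only their skew matters for the illustration.
CORPUS_COUNTS = {
    "cook": Counter({"she": 70, "he": 30}),
    "engineer": Counter({"she": 10, "he": 90}),
    "doctor": Counter({"she": 25, "he": 75}),
}


def pick_pronoun(occupation: str) -> str:
    """Return the pronoun seen most often with this occupation."""
    pronoun, _count = CORPUS_COUNTS[occupation].most_common(1)[0]
    return pronoun


def translate_neutral(occupation: str) -> str:
    """Render a gender-neutral source sentence ("o bir ...") in English."""
    article = "an" if occupation[0] in "aeiou" else "a"
    return f"{pick_pronoun(occupation)} is {article} {occupation}"


if __name__ == "__main__":
    for job in ("cook", "engineer", "doctor"):
        print(translate_neutral(job))
```

Nothing in this sketch mentions gender as a rule. The skew in the output comes entirely from the counts it is given, which is the point the Turkish example makes.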

This is a matter of grave concern since it is essential for us to acknowledge the male bias that subtly defines and shapes our interactions with the world. Women are less likely to apply for jobs that are considered to be for men. This consideration and classification come from different places. Our language is one such place. As said by Nayantara Dutta, “in many ways, language both reflects and creates the gender inequalities that exist in society.”

However, it is also important to note that there is hope. Once this gender inequity was pointed out, Google made the necessary modifications and rolled out an updated, more ‘equal’ version.

This new version offered a more nuanced understanding of gender differences. Nestled in a corner of the world wide web, this victory went unnoticed and uncared for by many. But it is a victory, nonetheless. It supports the argument that algorithms are not sexist; people are. It also means our languages (scientific languages, algorithms and code, in particular) can be altered, modified, and developed to account for the gaping gender gap.
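As a minimal sketch of how such a modification might look, assume the change is to stop silently choosing one pronoun and instead return every gendered reading of a gender-neutral source sentence. The function name `translate_neutral_all` and the pronoun list are hypothetical, and the code is not taken from Google's implementation.

```python
from typing import List

# Readings offered for a gender-neutral source pronoun such as Turkish "o".
# The order is arbitrary and does not imply a default.
PRONOUNS = ("she", "he")


def translate_neutral_all(occupation: str) -> List[str]:
    """Return one English rendering per pronoun instead of a single guess."""
    article = "an" if occupation[0] in "aeiou" else "a"
    return [f"{pronoun} is {article} {occupation}" for pronoun in PRONOUNS]


if __name__ == "__main__":
    for job in ("cook", "engineer"):
        print(translate_neutral_all(job))
```

The design choice here is to hand the ambiguity back to the reader. When the source sentence does not specify gender, the output does not pretend that it does.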

What can be done?

Because many machine learning systems are proprietary, the processes employed by their algorithms are often opaque. They do not give their users any insight into the decision-making process. A more transparent (or translucent) approach could help users point out problems and help fix bugs.

Research suggests that algorithmic bias can be reduced if algorithms are developed by a demographically diverse group. Enhancing access to STEM fields, reducing barriers, improving diversity and including women from different castes, economic and social locations can enable algorithms to create a fairer world. The inclusion of feminist data and audits for these algorithms that consider the female perspective can also help fix biased algorithms. As mentioned earlier, we must remember that the scientific language is sexist, but it can also be edited and modified to serve the needs of society as a whole and not just the men in it.
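One simple form an audit could take is to compare how often an algorithm selects people from different groups. The sketch below is an illustration under stated assumptions, not a complete fairness audit: the function names `selection_rates` and `rate_ratio` and the shortlisting data are all invented, and a selection-rate comparison is only one of many possible checks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of positive decisions per group, from (group, selected) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    selected: Dict[str, int] = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def rate_ratio(rates: Dict[str, float], group: str, reference: str) -> float:
    """Selection rate of `group` relative to `reference` (1.0 means parity)."""
    return rates[group] / rates[reference]


if __name__ == "__main__":
    # Invented CV-shortlisting outcomes, used only to exercise the functions.
    decisions = (
        [("women", True)] * 20 + [("women", False)] * 80
        + [("men", True)] * 40 + [("men", False)] * 60
    )
    rates = selection_rates(decisions)
    print(rates)
    print(rate_ratio(rates, "women", "men"))
```

A ratio well below 1.0 does not by itself explain why the gap exists, but it flags where a closer look at the data and the model is needed.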


References

Cowgill, B., Dell’Acqua, F., Deng, S., Hsu, D., Verma, N., & Chaintreau, A. (2020, July). Biased programmers? Or biased data? A field experiment in operationalizing AI ethics. In Proceedings of the 21st ACM Conference on Economics and Computation (pp. 679-681).

Dutta, N. (n.d.). The subtle ways language shapes us. Retrieved from

Khanna, A. (2012, April 27). Nine out of ten Wikipedians continue to be men: Editor survey. Diff. Retrieved from

Kuczmarski, J. (2018, December 06). Reducing gender bias in Google Translate. Retrieved from

Martin, E. (1991). The egg and the sperm: How science has constructed a romance based on stereotypical male-female roles. Signs: Journal of Women in Culture and Society, 16 (3), 485-501.

Perez, C. C. (2019). Invisible women: Exposing data bias in a world designed for men. Random House.

Young, E., Wajcman, J., & Sprejer, L. (2021). Where are the Women? Mapping the Gender Job Gap in AI. Policy Briefing: Full Report. The Alan Turing Institute.

