Racial and ethnic bias in machine translation and AI-generated texts

At the beginning of 2023 it was reported that tools such as ChatGPT had reached over 100 million active users worldwide (Hu 2023). The main problem in this area is the widespread misunderstanding of how these tools work and how chatbot responses are generated. We often forget that the answers ChatGPT gives us are a patchwork of what has previously been published. This makes it all the more interesting to examine them with regard to topics that might seem problematic or taboo. The presentation therefore shows how ChatGPT is shaped by prejudices and stereotypes, especially those concerning ethnicity and race. Responses in English, German, French, Polish, and Lithuanian are compared in order to assess how effectively the tool was debiased during training across different languages.

The direct cause of the extraordinary accuracy of AI tools, which translates into their popularity, is the vast amount of training data: it ensures high efficiency and a relatively low error rate in translations and generated texts. The downside is that there is no control over what data is used for training, so much of it is of negligible quality (Salinas / Burbat 2023). In addition, the data is often outdated, because older data is much easier to access (Urchs et al. 2023), and various minorities are underrepresented in it (Zou / Schiebinger 2018). Consequently, errors and negative biases appear in the returned output. The state of the art is rich in studies of gender bias, not only in AI chatbots but also in automatic translation tools based on machine learning. Automatic translators not only convert pronouns from feminine to masculine between languages (Zou / Schiebinger 2018) but also fail to use feminine lexical forms even when the feminine pronoun is used correctly in a sentence (Vanmassenhove et al. 2019; Rescigno / Monti 2023), and are prone to reinforcing stereotypical verbs (Troles / Schmid 2021). This produces syntactic errors, such as a lack of agreement between subject and verb, and, simply put, gender bias. The problem also applies to tools such as ChatGPT (Gross 2023), which are at the height of their popularity (Urchs et al. 2023). Many proposals have also been made to mitigate such bias (Williams 2023) and to determine at which stage debiasing should take place (Fleisig / Fellbaum 2022; Tomalin et al. 2021).

The proposed study presents the results of a text analysis of short fictional stories about members of selected communities representing ethnic minorities. The generated texts are also subjected to sentiment analysis, as sketched below. Five languages were selected for the study: English, German, French, Polish, and Lithuanian. The most striking example that inspired the study is a sentence translated from Polish into German by both DeepL and Google Translate: the sentence 'Apaches eat chips' was rendered as 'Apachen fressen Chips'. In German, the verb 'fressen' is used only for animals, and applying it to humans is highly offensive.
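The abstract does not describe the sentiment-analysis setup in detail; the following minimal sketch merely illustrates how generated stories in the five languages could be scored with an off-the-shelf multilingual classifier. The file layout and the Hugging Face model name are assumptions for the example, not part of the original study.

```python
# Minimal sketch: scoring generated stories with a multilingual sentiment model.
# Assumptions (not from the study): stories are stored as plain-text files named
# <language>_<group>.txt in a "stories" folder, and the model below is used.
from pathlib import Path
from transformers import pipeline

# Multilingual sentiment classifier (predicts 1-5 "stars"); any comparable
# model covering English, German, French, Polish, and Lithuanian could be used.
classifier = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)

for story_file in sorted(Path("stories").glob("*.txt")):
    language, group = story_file.stem.split("_", 1)
    text = story_file.read_text(encoding="utf-8")
    # Long stories may exceed the model's input limit, so truncate for the sketch.
    result = classifier(text[:2000], truncation=True)[0]
    print(f"{language:10} {group:20} {result['label']} ({result['score']:.2f})")
```

Comparing the per-language score distributions for the same minority group is one simple way to make cross-linguistic differences in tone visible.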

The author asked ChatGPT to generate 100 sentences about basic activities performed by minority groups (ethnic, national, racial). These sentences were also translated into the other languages mentioned above. From the generated list, ten minority groups were then selected, about which ChatGPT was prompted to generate fictional stories. All stories and sentences are phrased in a politically correct manner, but the topics ChatGPT covers are highly stereotypical. If it is asked to generate a text using the name of a community that is considered offensive (e.g., Gypsies), a note about the inappropriateness of the proposed expression is displayed. The name "Eskimo" is automatically replaced with "Inuit", which is interesting because the group in question is so heterogeneous that applying the word "Inuit" to all of its members is all the more offensive. In fact, "Eskimo" is not an inappropriate expression at all (Benveniste 1953), just as there are not over 50 words for snow in the Eskimo-Aleut languages (Kaplan 2003). By trying to avoid offensive content, ChatGPT only reinforces further harmful stereotypes. It is also interesting that attention to the "appropriate" use of the names of minority groups seems to be reserved for English. In Polish, for example, ChatGPT readily generates a story about "Cyganie" (a direct, offensive translation of 'Gypsies') without suggesting the Romani people as an alternative. The presentation therefore confronts the harmful stereotypes hidden in automatically generated texts in a contrastive comparison of five languages.
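The abstract does not state whether the prompts were issued through the ChatGPT interface or programmatically. As an illustration only, the sketch below shows how a comparable batch of stories could be requested via the OpenAI API; the model name, prompt wording, and the partial group list are assumptions for the example and do not reproduce the author's prompts.

```python
# Minimal sketch of the prompting step, assuming API access (openai>=1.0).
# The model, prompt text, and group list are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LANGUAGES = ["English", "German", "French", "Polish", "Lithuanian"]
GROUPS = ["Apaches", "Inuit", "Romani people"]  # placeholder subset of the ten groups

def generate_story(language: str, group: str) -> str:
    """Ask the chat model for a short fictional story about the given group."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                f"Write a short fictional story in {language} "
                f"about a member of the {group} community."
            ),
        }],
    )
    return response.choices[0].message.content

for lang in LANGUAGES:
    for group in GROUPS:
        story = generate_story(lang, group)
        print(f"--- {lang} / {group} ---\n{story}\n")
```

Issuing the same prompt once per language, rather than translating a single output, keeps the comparison sensitive to language-specific moderation behaviour such as the "Cyganie" example above.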

Appendix A

Bibliography
  1. Benveniste, Émile (1953): “The 'Eskimo' Name”, in: International Journal of American Linguistics 19, 242-245.
  2. Fleisig, Eve / Fellbaum, Christiane (2022): Mitigating Gender Bias in Machine Translation through Adversarial Learning. <arXiv:2203.10675> [10.12.2023].
  3. Gross, Nicole (2023): “What ChatGPT Tells Us about Gender: A Cautionary Tale about Performativity and Gender Biases in AI”, in: Social Sciences 12, 435.
  4. Hu, Krystal (2023): “ChatGPT Sets Record for Fastest-Growing User Base”, in: Reuters. <https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01> [10.12.2023].
  5. Kaplan, Lawrence D. (2003): “Inuit snow terms: How many and what does it mean?”, in: Proceedings of the Second IPSSAS Seminar, Iqaluit, 263-269.
  6. Rescigno, Argentina Anna / Monti, Johanna (2023): “Gender Bias in Machine Translation: a statistical evaluation of Google Translate and DeepL for English, Italian and German”, in: Proceedings of the International Conference HiT-IT 2, 1-11.
  7. Salinas, María-José Varela / Burbat, Ruth (2023): “Google Translate and DeepL: breaking taboos in translator training. Observational study and analysis”, in: Ibérica 45, 243-266.
  8. Tomalin, Marcus, et al. (2021): “The practical ethics of bias reduction in machine translation: why domain adaptation is better than data debiasing”, in: Ethics and Information Technology 23, 419-433.
  9. Troles, Jonas-Dario / Schmid, Ute (2021): “Extending Challenge Sets to Uncover Gender Bias in Machine Translation. Impact of Stereotypical Verbs and Adjectives”, in: Proceedings of the Sixth Conference on Machine Translation. Association for Computational Linguistics, 531-541.
  10. Urchs, Stefanie, et al. (2023): “How Prevalent is Gender Bias in ChatGPT? – Exploring German and English ChatGPT Responses”, in: 1st Workshop on Biased Data in Conversational Agents. <arXiv:2310.03031> [10.12.2023].
  11. Vanmassenhove, Eva, et al. (2019): “Getting Gender Right in Neural Machine Translation”, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 3003-3008.
  12. Williams, Damien (2023): “Bias Optimizers”, in: American Scientist 111, 204-207.
  13. Zou, James / Schiebinger, Londa (2018): “Design AI so that it’s fair”, in: Nature 559, 324-326.
Aleksandra Rykowska (aleksandra.rykowska@student.uj.edu.pl), Jagiellonian University in Kraków, Poland