Building Bridges or Walls? Topic Modeling for Analyzing Trust in Indirect Literary Translation

This study uses BERTopic modeling to analyze trust and fidelity in indirect translation. Indirect translation, often called second-hand translation, passes through an intermediary language because access to the original source language is restricted. This reliance raises concerns about biases and distortions that can significantly affect the trustworthiness of the final translated product. Understanding and quantifying this trust is crucial for assessing the fidelity and integrity of translated texts. The study focuses on the works of the renowned Hong Kong writer Louis Cha (Jin Yong). It examines in particular the thematic changes in the concept of “heroism” in his Wuxia (martial arts) novel 射雕英雄传 (She Diao Ying Xiong Zhuan) and its translations into English (A Hero Born: Legends of the Condor Heroes Vol. 1) and Portuguese (Nasce um Herói: Lendas dos Heróis do Condor - Livro 1). Using the zero-shot BERTopic model, the research offers a data-driven perspective on the trustworthiness and fidelity of these indirect translations. The primary goals are to investigate how topic modeling can reveal changes in thematic content across languages, to assess the influence of intermediary languages on the fidelity of translations, and to explore topic modeling as an analytical tool in translation studies.

Unlike traditional topic modeling approaches such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), which rely on bag-of-words representations, BERTopic uses embeddings to capture semantic relationships between words (Devlin et al., 2019; Mikolov et al., 2013). This enables BERTopic to take the context within the text into account and to discover topics that are both statistically relevant and semantically rich. The zero-shot variant of BERTopic is particularly valuable for domains where labeled data is scarce, such as literary fiction. Zero-shot classification refers to a model’s ability to classify data into categories that were unseen during training by leveraging prior knowledge of the language or domain, which is especially useful when labeling data for every potential category is impractical or when categories evolve rapidly (Wang et al., 2019). Moreover, zero-shot BERTopic modeling allows a focus on predefined topics, enabling precise identification of the documents that correlate closely with these predefined categories.
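The zero-shot assignment idea described above can be illustrated with a minimal sketch. It is not BERTopic's own implementation: the function name `assign_zero_shot`, the threshold of 0.7, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a sentence-embedding model, and in the BERTopic library the predefined labels and the threshold are supplied through the `zeroshot_topic_list` and `zeroshot_min_similarity` parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_zero_shot(doc_vecs, topic_vecs, min_similarity=0.7):
    """Give each document its closest predefined label, or None.

    A document keeps a label only if its similarity to that label
    reaches min_similarity (a hypothetical threshold); otherwise it
    is left unassigned.
    """
    assignments = []
    for vec in doc_vecs:
        best_label, best_score = None, -1.0
        for label, topic_vec in topic_vecs.items():
            score = cosine(vec, topic_vec)
            if score > best_score:
                best_label, best_score = label, score
        assignments.append(best_label if best_score >= min_similarity else None)
    return assignments


# Toy vectors standing in for real embeddings (hypothetical values).
topic_vecs = {"heroism": [1.0, 0.0, 0.0], "romance": [0.0, 1.0, 0.0]}
doc_vecs = [
    [0.9, 0.1, 0.0],  # close to the "heroism" label
    [0.1, 0.8, 0.1],  # close to the "romance" label
    [0.4, 0.4, 0.8],  # not close enough to either label
]
labels = assign_zero_shot(doc_vecs, topic_vecs)
```

Documents that fall below the threshold are left unassigned here; a full pipeline would typically cluster them separately so that topics outside the predefined list can still emerge.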

Indirect translation, meanwhile, has gained increasing attention in translation studies. Traditionally, studies in this field have relied on qualitative methods and close reading, manually comparing different language versions to evaluate how faithful indirect translations are to the original text (Pięta 2019; Pięta, Ivaska, and Gambier 2022). Researchers such as Ivaska (2020) and Ustaszewski (2021) have begun adopting computational approaches, but these focus primarily on classifying the source language rather than on comparing translation results across languages. This research aims to fill that gap by using BERTopic modeling to compare the results of indirect translation across languages, especially in a heavily culture-loaded context.

To study trust in indirect literary translation, we narrow the concept of "trust" to changes in thematic content across languages. Using zero-shot BERTopic modeling, we focus on thematic changes related to "heroism" in the original Chinese, intermediate English, and target Portuguese versions of the Wuxia novel 射雕英雄传 (She Diao Ying Xiong Zhuan). The workflow involves three main steps. First, we prepare the data by cleaning the text corpora in Chinese, English, and Portuguese. Second, we use zero-shot BERTopic to perform topic modeling, generating topics related to heroism along with their ten most related words and probability scores. Finally, we compare languages and topics using cosine similarity to measure thematic alignment.
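The final comparison step can be sketched as follows. The abstract does not specify which vectors are compared, so this sketch assumes one vector per language version representing the "heroism" topic in a shared embedding space. The name `pairwise_similarity`, the language codes, and the two-dimensional toy values are hypothetical and do not reproduce the study's data.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(vectors):
    """Cosine similarity for every unordered pair of language versions.

    Keys are sorted so that each pair appears once, as (first, second)
    in alphabetical order of the language codes.
    """
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Hypothetical topic vectors for the "heroism" theme in each version.
heroism_vectors = {
    "zh": [1.0, 0.0],  # original Chinese
    "en": [1.0, 1.0],  # intermediary English
    "pt": [1.0, 2.0],  # target Portuguese
}
scores = pairwise_similarity(heroism_vectors)

# Rank the language pairs from most to least thematically aligned.
ranking = sorted(scores, key=scores.get, reverse=True)
```

A score closer to 1 indicates that two versions treat the theme similarly, while a lower score indicates thematic drift between them.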

The findings reveal several key insights. The English and Portuguese versions show the highest thematic concordance, with a cosine similarity score of 0.572. This relatively high score is likely due to overlapping Western cultural contexts. In the comparison between English and Chinese, the similarity score is moderate at 0.5059, indicating fair thematic consistency but also highlighting the transformative effects of cultural and linguistic barriers. The lowest similarity score, 0.4362, is observed in the comparison between Portuguese and Chinese. This lower score reflects the challenges of indirect translation, where each translational iteration introduces new interpretative layers that reshape the original themes.

In conclusion, in direct translation (Chinese to English), the translator acts as a cultural mediator, bridging the gap between source and target cultures. In indirect translation (English to Portuguese), the process becomes more complex because the translator navigates between the source, intermediary, and target cultures. This complex mediation can result in a less faithful representation of the source culture’s heroic ideals. This research applies BERTopic modeling to the study of indirect translation, with a particular focus on a non-European context. The core analysis examines thematic content, especially the concept of heroism, in Louis Cha’s works across different languages. The study aims to determine the most effective strategy for cross-lingual topic modeling of individual literary works and to understand how indirect translation influences the preservation or transformation of thematic elements during the translation process.

Appendix A

Bibliography
  1. Devlin, Jacob / Chang, Ming-Wei / Lee, Kenton / Toutanova, Kristina (2019): “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In North American Chapter of the Association for Computational Linguistics.
  2. Grootendorst, Maarten (2022): “BERTopic: Neural topic modeling with a class-based TF-IDF procedure.” arXiv preprint arXiv:2203.05794.
  3. Ivaska, Laura (2020): A Mixed-Methods Approach to Indirect Translation. Turku: University of Turku.
  4. Mikolov, Tomas / Sutskever, Ilya / Chen, Kai / Corrado, Greg S. / Dean, Jeff (2013): “Distributed Representations of Words and Phrases and Their Compositionality.” In Advances in Neural Information Processing Systems 26.
  5. Pięta, Hanna (2019): “Indirect Translation: Main Trends in Practice and Research.” Slovo.ru: Baltic Accent 10 (1): 21-36.
  6. Pięta, Hanna / Ivaska, Laura / Gambier, Yves (2022): “What can research on indirect translation do for Translation Studies?” In Target 34 (3): 349-369.
  7. Ustaszewski, Michael (2021): “Towards a machine learning approach to the analysis of indirect translation.” In Translation Studies 14 (3): 313-331.
  8. Wang, Wei / Zheng, Vincent W. / Yu, Han / Miao, Chunyan (2019): “A Survey of Zero-shot Learning: Settings, Methods, and Applications.” In ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2): 1-37.
Mengyuan Zhou (lidiazhou@cuhk.edu.hk), The Chinese University of Hong Kong, Hong Kong S.A.R. (China)