Pure Conversation with AI: Building Generative Agents for Reenacting Debates in History

Currently, digital humanities (DH) studies face criticism from scholars in the humanities due to a perceived disparity between computational methods heavily reliant on word frequency and the nuanced interpretation that humanities scholars aim to address (Da, 2019, p. 606). The integration of generative AI has the potential to revolutionize this landscape, expanding DH by posing novel questions that directly engage with the understanding of the interpretive and perspectival nuances that are the essence of humanities research. Recent scholarly attempts began to explore this issue, but their explorations primarily centered on English language texts (e.g., Underwood, 2023; Bamman, 2023).

This work “reinvents” a DH approach to examine the historiographical biases within debates in pre-modern China. The “reinvention” comprises two key aspects: First, it builds generative agents to reenact historical debates by interacting with them a novel approach that was impossible before the emergence of generative AI technology. Second, by focusing on pre-modern China, it addresses the issue of global inequities in AI technology.

Specifically, we investigate the “pure conversation” ( Qingtan 清談), a central mean of realizing the large scale movement of intellectual liberation during the Wei and Jin dynasties (220 – 420 C.E.) (Tang, 1991). Given its inclination towards metaphysical debates (Mather, 2002, p. xxv), the “pure conversation” serves as an ideal lens for exploring Chinese intellectual history. Utilizing texts from different periods as contextual backdrops for various speakers, we can extract period specific biases embedded in the underlying knowledge of a large language model. On a stage set by a “multi-agent system” (Park et al., 2023; Wu et al., 2023), concepts from different periods can engage in debate on shared topics, reenacting a “pure conversation” session that transcends time and space.

As illustrated in Figure 1, we built an innovative multiple multiple-agent system for reenacting the “pure conversation.” Initially, we conducted text embedding by transforming each of the 18 dynastic histories, representative of different periods in Chinese history, into a vector space using the “text-embedding-ada-002” model from OpenAI. These embeddings were then stored in a vector database for retrieval. Covering a compilation date range from the 1st century B.C.E. to the 20th century C.E., the 18 dynastic histories include the “Twenty-Four Histories” ( Ershisi shi 二十四史), with the addition of Draft History of Qing ( Qing shi gao 清史稿) and the exclusion of seven histories that overlapped with others. Before being input into the model, the text was chunked into segments of less than 500 characters.

Subsequently, agent “historians” were created to retrieve relative context from the vector database. The top 5 segments with the highest cosine similarity to the vector of the topics, which serves as the “query,” were retrieved as the context for the speakers’ reference.

Finally, agent “speakers” were created using the “gpt-4-1106-preview” model from OpenAI, to generate speeches on a given topic. The retrieved context, once obtained, was merged with a predefined prompt to provide a guideline for the speakers, ensuring that the resulting speeches align with the designated context. The topics selected (see Table 1) are two of the most important debate topics from A New Account of the Tales of the World ( Shishuo xinyu 世說新語) (Liu, 2002), which is the primary historical source concerning “pure conversation.”

The debate on whether sages possess emotions originated in the Daoist school of the pre-Qin period (c. 5th century B.C.E.) Influenced by the Daoist thoughts, this debate was revisited by the philosophers Wang Bi and He Yan during the Wei-Jin Metaphysical movement. The discussion ultimately concluded with the victory of Wang Bi, who advocated the “possession of emotions” theory, which thereafter became the mainstream view. In a round of “pure conversation,” our generative agents, with a model temperature set to 0 for result stability, unanimously affirmed that sages do possess emotions. This confirms Wang Bi’s victory and demonstrates the consistency of the concepts across texts from different periods.

For the ruler versus father debate, in the initial round of the “pure conversation,” the agents’ responses exhibited significant differences. To further explore, we conducted a total of ten rounds of “pure conversation” and adjusted the model’s temperature to 0.2 for nuanced variations in the generated results. We then calculated the proportion of choosing the father based on the agents’ answers in different historical contexts, presented chronologically (see Figure 2).

Figure 2. Proportion of choosing the father among agents with different dynasties’ contexts

The debate over the significance of the ruler versus the father has been a recurring topic of discussion in pre-modern Chinese history. According to our findings, aside from potential anomalies arising from biases in historical record compilation, it can be generally asserted that before the Wei-Jin period (3rd century – 5th century C.E.), the prevailing notion in official histories favored the ruler’s importance over the father’s. However, this trend underwent a reversal after the Wei-Jin period.

Through the analysis of contexts from 18 dynastic histories and the generation of responses to pivotal questions, we unveiled significant historical trends in intellectual history. This groundbreaking approach showcases the potential of generative AI in reinterpreting historical debates and perspectives. Furthermore, our study underscores the crucial role of deploying cutting-edge AI technology in digital humanities research for non-Western contexts, taking a step towards challenging the “digital hegemony” (Martin & Runyon, 2016) in DH.

Appendix A

Bibliography

Bamman, D. (2023). The Promise and Peril of Large Language Models for Cultural Analytics. Workshop on AI and Large Language Models (LLMs) for the Analysis of Large Literary Corpora , Paris, France, December 2023.
Da, N. Z. (2019). The Computational Case Against Computational Literary Studies. Critical Inquiry, 45 (3), 601–639. https://doi.org/10.1086/702594
Liu, I. (2002). Shih-shuo Hsin-yü: A New Account of Tales of the World (R. B. Mather, Trans.; Second Edition). Center for Chinese Studies, The University of Michigan.
Martin, J. D., & Runyon, C. (2016). Digital Humanities, Digital Hegemony: Exploring Funding Practices and Unequal Access in the Digital Humanities. ACM SIGCAS Computers and Society, 46 (1), 20–26. https://doi.org/10.1145/2908216.2908219
Mather, R. B. (2002). Introduction: The World of the Shih-shuo Hsin-yü. In Shih-shuo Hsin-yü: A New Account of Tales of the World (Second Edition, pp. xiii–xxxv). Center for Chinese Studies, The University of Michigan.
Park, J. S., O’Brien, J., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative Agents: Interactive Simulacra of Human Behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology , 1–22. https://doi.org/10.1145/3586183.3606763
Tang, Y. (1991). The Voices of Wei-Jin Scholars: A Study of “Qingtan” [Doctoral dissertation]. Columbia University.
Underwood, T. (2023). Prediction and Surprise. Workshop on AI and Large Language Models (LLMs) for the Analysis of Large Literary Corpora , Paris, France, December 2023.
Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., Awadallah, A. H., White, R. W., Burger, D., & Wang, C. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation . https://doi.org/10.48550/arXiv.2308.08155

Topic	Topic Translation	Page in Liu (2002)
聖人有情不？	Does the sage have emotions, or not?	129
今有一丸藥，得濟一人疾，而君、父俱病，與君邪？與父邪？	Suppose now you have one medicinal pill which can cure one man’s illness, and your ruler and your father are both sick. Should you give it to your ruler or to your father?	470