Crafting Twist Ending: A Human-AI Collaborative Writing Tool for Short Stories

1. INTRODUCTION

This paper presents our preliminary effort to develop a language model-based human-AI collaborative authoring tool for short stories, with a primary focus on writing flash fiction featuring surprise or twist endings. We posit that flash fiction with a twist ending is an ideal candidate for large language model-based story generation for two reasons. First, the average length of flash fiction is around 1,000 words, which is relatively easy to analyze or generate using state-of-the-art language models. Second, a plot twist or surprise ending can be crucial to a story's appeal from both emotional and cognitive perspectives (Kintsch 1980, Bae et al. 2021).

2. APPROACH

Story Dataset Collection: We collected two different datasets of short stories. One was gathered through crowdsourcing using commercial storytelling cards (see Figure 1) (Jang et al. 2023). The other was selected from a collection of flash fiction written by Kim Dong-sik, a Korean author renowned for his short stories with twist endings. Since 2018, Kim has published more than 300 pieces of flash fiction. Similar to Propp's 31 narrative functions in Russian folktales (Propp 1968), we defined 17 story units to analyze the structure of short stories with twist endings (Bae et al. 2023). These two story datasets were collected independently for different purposes: one for the emotional analysis of story patterns in English, and the other for the structural analysis of twisted-ending stories in Korean.

Figure 1. Examples of Collected Text Stories Using Illustration Cards (Jang et al. 2023)

Dataset Analysis: First, we explore the sentiment patterns in a crowdsourced short story dataset, categorized by their ending type—either positive or negative (Jane et al. 2023). Furthermore, we annotate each story's level of interestingness and the contributing factors, such as detailed realism, humor, suspense, and plot twists, building upon our prior research on what makes stories interesting (Bae et al. 2021). Next, while examining Kim Dong-sik's flash fiction dataset, we introduce a new analysis method by annotating each story’s twist type, distinguishing whether it pertains to the plot or the character. Additionally, we assess the overall interestingness of each story.

Collaborative Writing Tool Design and Considerations: The widespread adoption of large language models (LLMs) has significantly influenced the development of human-AI "collaborative" story-writing systems, such as CoAuthor (Lee et al. 2022) and Wordcraft (Yuan et al. 2022), to support creative writing. While these LLM-based tools perform surprisingly well, they have evident limitations—for example, a lack of coherence in longer stories and the tendency to produce bland or clichéd events. Additionally, these tools bring up crucial issues, such as the need for new evaluation metrics and ways to measure the user's contribution.

Our paper introduces "interestingness" and its contributing factors as a new story evaluation metric. In our prototype writing tool, we consider narrative elements involving character arcs—whether positive or negative (Weiland 2016), myth-based character archetypes (Schmidt 2011), a story’s emotional arcs (Reagan 2016), master plot types (Tobias 2012), and Freytag’s 5-stage dramatic structure—exposition, rising action, climax, falling action, and denouement (Freitag 1895). Currently, we are working on combining our 17 story units for flash fiction with Freytag's 5-stage plot structure design.

Optimizing Language Models: Prompt engineering, such as CoT (Chain-of-Thought; Wei et al. 2022), or appropriate fine-tuning of large language models, can enhance the output. We are currently fine-tuning GPT-3.5-turbo with our annotated dataset and experimenting with various prompt formulas for improved results.

3. CONCLUSION AND FUTURE WORK

This paper presents our work on annotating short story datasets and our ongoing efforts to develop a language model-based writing tool for short stories. Our current focus is on fine-tuning the language model to generate more engaging short stories, incorporating narrative features such as plot twists supported by foreshadowing.

ACKNOWLEDGEMENTS

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2022S1A5A2A03052246).

Appendix A

Bibliography

Kintsch, Walter (1980): “Learning from Text, Levels of Comprehension, or: Why Anyone  Would Read a Story Anyway”, in Poetics 9(1–3), 87–98.
Bae, Byung-Chull / Jang, Suji / Kim, Youngjune / Park, Seyoung (2021): “A Preliminary Survey on Story Interestingness: Focusing on Cognitive and Emotional Interest”, in: Mitchell, A., Vosmeer, M. (eds) Interactive Storytelling. ICIDS 2021. Lecture Notes in Computer Science, vol 13138. Springer, Cham. https://doi.org/10.1007/978-3-030-92300-6_45.
Jang, Suji / Seo, Chaewon / Bae, Byung-Chull (2023): “Sentiment Analysis of a Text Story Dataset Collected Using Illustration Cards”, in: Holloway-Attaway, L., Murray, J.T. (eds) Interactive Storytelling. ICIDS 2023. Lecture Notes in Computer Science, vol 14384. Springer, Cham. https://doi.org/10.1007/978-3-031-47658-7_19.
Propp, Vladimir (1968): Morphology of the Folktale. University of Texas Press, 2nd Edition.
Bae, Byung-Chull / Kim, Yeji / Yu, Mingyeong / Park, Seyoung / Kim, Youngjune / Cheong, Yun-Gyung (2023): “Toward an AI-Collaborated Authoring Tool for Writing Flash Fiction”, in: Stephanidis, C., Antona, M., Ntoa, S., Salvendy, G. (eds) HCI International 2023 Posters (HCII 2023). Communications in Computer and Information Science, vol 1836. Springer, Cham. https://doi.org/10.1007/978-3-031-36004-6_51.
Lee, Mina / Liang, Percy / Yang, Qian (2022): “CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities”, in Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, Article No.: 388, 1 – 19. https://doi.org/10.1145/3491102.3502030.
Yuan, Ann / Coenen, Andy / Reif, Emily / Ippolito, Daphne (2022): “Wordcraft: Story Writing With Large Language Models”, in: Proceedings of the 27th International Conference on Intelligent User Interfaces, 841 – 852. https://doi.org/10.1145/3490099.3511105
Weiland, K. M. (2016): Creating character arcs: The masterful author’s guide to uniting story structure, plot, and character development. PenForASword.
Schmidt, Victoria Lynn (2011): 45 Master characters, Revised Edition: Mythic Models for Creating Original Characters. Penguin.
Reagan, Andrew J. / Mitchell, Lewis / Kiley, Dilan / Danforth, Christopher M. / Dodds, Peter Sheridan (2016): “The emotional arcs of stories are dominated by six basic shapes”, in: EPJ Data Science. 5, 31. https://doi.org/10.1140/epjds/s13688-016-0093-1.
Tobias, Ronald B. (2012): 20 Master plots: And how to build them. Writer’s Digest Books.
Freytag, Gustav. (1895): Technique of the drama: An exposition of dramatic composition and art.
Wei, Jason / Wang, Xuezhi / Schuurmans, Dale / Bosma, Maarten / Ichter, Brian / Xia, Fei / Chi, Ed / Le, Quoc / Zhou, Denny. (2022): “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, in: Proceedings of the 36th International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, Article 1800, 24824–24837.