Automatic retro-structuration of auction sales catalogs layout and content
Scheithauer, Hugo (1,3); Bénière, Sarah (1); Romary, Laurent (2)
1: ALMAnaCH, Inria, France; 2: Inria, Directorate for Scientific Information and Culture; 3: École pratique des hautes études (EPHE)
This paper showcases a pipeline for automatically retro-structuring auction sales catalogs, based on document layout analysis and information extraction technologies. Structured layout and textual data are then transformed into TEI XML for publication. It also advocates for a generalized use of layout segmentation in digitization pipelines.
Pure Conversation with AI: Building Generative Agents for Reenacting Debates in History
Chen, Yuqi (1); Shang, Wenyi (2,3); Chen, Song (4)
1: Peking University, China; 2: University of Illinois Urbana-Champaign, United States of America; 3: University of Missouri, United States of America; 4: Bucknell University, United States of America
This work “reinvents” a DH approach to examine the historiographical biases in premodern China. By building multiple generative agents, we reenacted historical debates and interacted with them, which is a novel approach impossible before the emergence of generative AI technology. Our results unveiled significant historical trends in intellectual history.
Exploring Intellectual Design and Digital Storytelling in Digital Humanities: Towards A Curation Model
ZHAO, Ke (1,2,3); WANG, Xiaoguang (1,2); HOU, Xilong (4); GONG, Yue (1,2)
1: School of Information Management, Wuhan University, China; 2: Intellectual Computing Laboratory for Cultural Heritage, Wuhan University, China; 3: School of Advanced Study, University of London, UK; 4: School of Communication, Qufu Normal University, China
This paper focuses on how the interplay of intellectual design and digital storytelling with scholarly primitives can enhance knowledge production, representation, and dissemination, thus collectively contributing to the development of digital humanities curation. It redefines design as intellectual thinking and provides new perspectives on digital storytelling in scholarly research.
PicAxe: Creating an Open-source Image Extraction Tool for Large and Diverse Corpora of Text-Image PDF Documents
Guerrero, Anna Clemencia (1); Dinner, Aaron (2); Kamath, Krishna (2); Damerow, Julia (3)
1: Santa Fe Institute, United States of America; 2: University of Chicago, United States of America; 3: Arizona State University, United States of America
PicAxe is an open-source Python tool for humanities researchers to automatically extract images (diagrams, illustrations, photographs, graphs, tables) from heterogeneous corpora of digital text-image environments (journal articles, book chapters, newspapers, and letters created during different time periods). We discuss current problems and seek input on improving functionality.
Listening to (Digital) Images: A Black Sound Studies Approach to Alt-Text
Adams, Margy (1); Sharma, Tanvi (2); Li, Shiyao (1); Varner, Jay (2); Klein, Lauren (1)
1: Emory University, United States of America; 2: Emory Center for Digital Scholarship
This paper uses alt-text as a way to explore accessibility in Digital Humanities. We discuss our approach to writing alt-text and introduce a conceptual overlay of “listening to images” (Campt 2017). We posit that this results in language that is aligned with our argument and attentive to the images themselves.