INDXR: a system for structured annotation and indexation of images

INDXR 1 is a system developed at the Department of Historical Atlas of the Institute of History of the Polish Academy of Sciences, most recently within the Dariah-PL project. Conceived in 2016 for indexing scans of historical manuscripts in a geospatial context (Borek et al. 2020), it can be used to work with any images or WMS maps. INDXR supports structured annotations without being tied to a particular annotation data schema. Its intended use is not to create annotations per se, but to build a whole database in the process of indexation. The system provides a complete annotation and indexation workflow. It comprises a configurator, which lets the user customize the annotation and indexation process; an editor for the annotation proper; and a viewer for online publication of the indexed sources, including searching and browsing through the annotations 2.

1. Idea and interoperability

INDXR stems from the observation that images, being two-dimensional objects, can be fruitfully treated as spatial data. This led to the idea of applying GIS standards and tools – designed specifically for geospatial data, but also capable of working with other spatial data – to the task of creating an image annotation and indexation system. INDXR relies on spatial databases, as well as on two Open Geospatial Consortium standards, WMS (Web Map Service, cf. OGC 2006) and WFS (Web Feature Service, cf. OGC 2010), for handling, respectively, images and annotations.
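To make the division of labour between the two standards concrete, the following is a minimal sketch of how a client might request an image over WMS and annotations over WFS. The query parameters are those defined by WMS 1.3.0 and WFS 2.0.0; the endpoint URL, the layer names and the choice of coordinate reference system are purely illustrative assumptions and do not describe the actual INDXR service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer names, used only for illustration.
BASE_URL = "https://example.org/ows"


def wms_getmap_url(layer, bbox, width, height, crs="EPSG:3857"):
    """Build a WMS 1.3.0 GetMap request for a rendered picture of one layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # an empty value asks for the server's default style
        "CRS": crs,  # assumed CRS; a real service may use another one
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return BASE_URL + "?" + urlencode(params)


def wfs_getfeature_url(type_name, count=100):
    """Build a WFS 2.0.0 GetFeature request for the features of one type."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,
        "COUNT": count,  # upper limit on the number of returned features
    }
    return BASE_URL + "?" + urlencode(params)


image_url = wms_getmap_url("album_scans", (0, 0, 1000, 800), 1000, 800)
annotations_url = wfs_getfeature_url("album_annotations", count=50)
print(image_url)
print(annotations_url)
```

In this sketch the image is fetched as a rendered raster, whereas the annotations come back as vector features with attributes, which is what makes them usable in other GIS tools.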

We are aware of the current issues in the field of annotation, notably the lack of cross-platform compatibility, but we do not strive for such compatibility in general. The reliance on GIS standards allows for cross-compatibility within the GIS landscape: one can use any GIS-compliant tool, such as QGIS, to work with INDXR annotations. Conversely, INDXR can be used as an annotation tool for any external WMS layer. Still, we acknowledge the importance of the IIIF standard and are therefore working to make INDXR resources – images and annotations – available via IIIF.

2. Functionality: an outline

In INDXR, images are placed in albums, gathering multiple images for joint annotation and indexation. Within a particular album, images are located on an image layer, which the user can pan and zoom in order to work conveniently with several images at once. One image layer is usually enough for an album; still, multiple image layers can be created to accommodate parallel versions of images (e.g. photographs made in natural light and infrared images).

While there are applications for annotating images with text or even for semantic annotation of images (e.g. Recogito, Tropy or Digital Mappa), INDXR was designed specifically to capture complex data associated with images and to enable quick and collaborative annotation. The user can define data schemas, thesauri, as well as connectors to external data sources and authority databases (e.g. geographical databases or Wikidata). As in the case of image layers, annotations within an album are grouped in annotation layers. All the annotations within an annotation layer share a common data structure. It is possible to create multiple annotation layers for a single album, each with its own data structure, thus maintaining separate annotation sets for annotating various aspects of the image content.
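The relationship between albums, image layers and annotation layers can be sketched as a small data model. This is only an illustration of the structure described above, not INDXR's actual implementation: all class names, field names and the validation rule are hypothetical, and a thesaurus is reduced here to a set of allowed values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SchemaField:
    """One field of a layer's data schema (hypothetical structure)."""
    name: str
    allowed_values: Optional[frozenset] = None  # thesaurus terms; None means free text


@dataclass
class Annotation:
    outline: list  # polygon vertices (x, y) in image coordinates
    values: dict   # field name -> value


@dataclass
class AnnotationLayer:
    """All annotations in a layer share the same data structure."""
    name: str
    schema: list
    annotations: list = field(default_factory=list)

    def add(self, annotation):
        expected = {f.name for f in self.schema}
        if set(annotation.values) != expected:
            raise ValueError("annotation does not match the layer schema")
        for f in self.schema:
            value = annotation.values[f.name]
            if f.allowed_values is not None and value not in f.allowed_values:
                raise ValueError(f"{value!r} is not in the thesaurus for {f.name!r}")
        self.annotations.append(annotation)


@dataclass
class Album:
    name: str
    image_layers: dict = field(default_factory=dict)       # layer name -> image identifiers
    annotation_layers: dict = field(default_factory=dict)  # layer name -> AnnotationLayer


# Example: one album with two parallel image layers and one annotation layer.
album = Album("court_book")
album.image_layers["natural_light"] = ["page_001.tif", "page_002.tif"]
album.image_layers["infrared"] = ["page_001_ir.tif", "page_002_ir.tif"]

places = AnnotationLayer(
    "places",
    schema=[SchemaField("name"), SchemaField("type", frozenset({"town", "village"}))],
)
album.annotation_layers["places"] = places
places.add(Annotation([(10, 10), (120, 10), (120, 40), (10, 40)],
                      {"name": "Kalisz", "type": "town"}))
```

A second annotation layer with a different schema (for instance, one for persons) could be added to the same album without affecting the first, which mirrors the idea of separate annotation sets for different aspects of the image content.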

3. Current situation, future plans

INDXR has been used in several projects, with a total of almost 200,000 scans annotated with more than 500,000 annotations. For an overview of an early project, see Borek / Panecki 2016; for a glimpse into research currently being conducted with INDXR, see "The last map of Poland" project 3. The core of INDXR (the editor and the viewer) is stable and mature. We regard this stability as an important asset and therefore consider carefully which new functionalities to add. Our current focus is on more flexible configuration of various data sources, while our future plans include plugging OCR and HTR into INDXR to provide automated transcription of image content.

Illustration: INDXR editor window, annotation of a historical manuscript. A custom data schema and a custom thesaurus are used in the annotations (left subwindow); an additional connection to WMS sources allows the user to verify on a map all the places occurring in the annotations (right subwindow).

Appendix A

Bibliography
  1. Borek, A. / Panecki, T. (2016): “Cartographic Visualization of Historical Source Data on AtlasFontium.pl”, in: Gartner, G. / Jobst, M. / Huang, H. (eds): Progress in Cartography: Eurocarto 2015. Cham: Springer, 65–81. DOI: 10.1007/978-3-319-19602-2_5.
  2. Borek, A. / Związek, T. / Słomski, M. / Gochna, M. / Myrda, G. / Słoń, M. (2020): “Technical and methodological foundations of digital indexing of medieval and early modern court books”, in: Digital Scholarship in the Humanities 35, 2: 233–253. DOI: 10.1093/llc/fqz030.
  3. Open Geospatial Consortium (OGC) (2006): Web Map Server Implementation Specification, https://www.opengeospatial.org/standards/wms [15.08.2024].
  4. Open Geospatial Consortium (OGC) (2010): Web Feature Service 2.0 Interface Standard, https://www.opengeospatial.org/standards/wfs [15.08.2024].

Notes
  1. INDXR website: https://indxr.ihpan.edu.pl/en/ [15.08.2024].
  2. Example projects: https://indxr.ihpan.edu.pl/en/examples/ [15.08.2024].
  3. Project “Cartography at the service of political reforms in the times of Stanisław August Poniatowski – a critical elaboration of ‘Geographical-statistical description of the parishes in the Kingdom of Poland’ and the maps of the palatinates by Karol Perthées”, https://perthees.ihpan.edu.pl/?page_id=617&lang=en [15.08.2024].
Grzegorz Myrda (gmyrda@ihpan.edu.pl), Institute of History, Polish Academy of Sciences and Maria Fronczak (mfronczak@ihpan.edu.pl), Institute of History, Polish Academy of Sciences