Data Linking Infrastructure – Foundations and Architecture

Investigator: Prof. Dr. Ralf Möller

Research Associate: Dr. Sylvia Melzer

A data linking infrastructure is envisioned to support humanities scholars from all research fields of the Cluster of Excellence “Understanding Written Artefacts”, so that various kinds of data can be combined easily and systematically in order to foster scientific progress. On the one hand, there are images and videos of written artefacts, in some cases associated with text data that make parts of the image (or video) content explicit, e.g., obtained with optical character recognition techniques. On the other hand, different kinds of chemistry and materials science data are collected to further describe the written artefacts under investigation, almost always in combination with descriptive temporal and spatial data. Data of this kind must be made available to humanities scholars in a way that best supports their scientific work.

Publications from humanities projects will refer to artefact data of the kind described above. Over time, artefact data will be referenced in a considerable number of natural language publications resulting from scientific work in humanities projects, e.g., journal articles, conference papers, and PhD theses. Publications are provided as documents, which are represented, e.g., as PDF data. Further natural language data come from existing humanities research databases.

All data can be described appropriately using suitable metadata formalisms (date of creation, author, etc.). In addition, and in contrast to metadata, all kinds of base data (also called raw data) might be extended with derived data, which make certain features explicit (e.g., for supporting visualization, information retrieval, or other research efforts).
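
The distinction drawn above between base data, metadata, and derived data, together with the links from publications to artefact data, can be illustrated with a minimal Python sketch. It is not the project's actual data model: all class names, field names, identifiers, and example values (`BaseData`, `DerivedData`, `publications_citing`, `artefact-001`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metadata:
    """Descriptive information about a data item (hypothetical fields)."""
    author: str
    created: str  # date of creation, e.g. an ISO 8601 string


@dataclass
class DerivedData:
    """Data computed from base data to make a feature explicit."""
    purpose: str  # e.g. "information retrieval" or "visualization"
    content: str  # e.g. text obtained by optical character recognition


@dataclass
class BaseData:
    """Raw data about a written artefact, e.g. an image or a measurement."""
    identifier: str
    kind: str  # e.g. "image", "video", "materials-analysis"
    metadata: Metadata
    derived: List[DerivedData] = field(default_factory=list)


@dataclass
class Publication:
    """A natural language document that refers to artefact data."""
    title: str
    metadata: Metadata
    references: List[str] = field(default_factory=list)  # BaseData identifiers


def publications_citing(identifier: str, publications: List[Publication]) -> List[str]:
    """Return the titles of all publications that reference the given base data."""
    return [p.title for p in publications if identifier in p.references]


# Illustrative records only; none of these values come from the project.
scan = BaseData(
    identifier="artefact-001",
    kind="image",
    metadata=Metadata(author="Imaging lab", created="2020-01-15"),
)
scan.derived.append(
    DerivedData(purpose="information retrieval", content="transcribed text")
)

papers = [
    Publication(
        title="Example journal article",
        metadata=Metadata(author="A. Scholar", created="2021-03-01"),
        references=["artefact-001"],
    ),
    Publication(
        title="Example PhD thesis",
        metadata=Metadata(author="B. Student", created="2022-06-30"),
        references=["artefact-002"],
    ),
]

citing = publications_citing("artefact-001", papers)
```

In this sketch, metadata describe a data item from the outside (who created it and when), whereas derived data are attached to the base data and carry content extracted from it. Storing references as identifiers rather than as nested objects is one possible design choice that keeps publications and artefact data independently maintainable.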