Transcribing Medieval Manuscripts for Machine Learning

07/15/2022
by Estelle Guéville, et al.

In the early twentieth century, many scholars focused on preparing editions and translations of texts that had previously been available only to the few specialists who could read archaic hands and who were privileged enough to travel to consult the manuscripts in person. This work, valuable scholarship in its own right, was reserved for particular texts deemed important enough to justify the time and effort, and it laid the foundation for generations of scholarship in medieval studies. By contrast, for many materials in historical archival collections, including collections that have already been digitised, medievalists have had time to produce only partial transcriptions, if any at all. Digitisation has greatly increased access to textual material from the medieval period in recent years, and we can imagine many new research projects in the decades to come. What challenges do new frontiers of automation in the archives raise for medieval studies, and in particular for the ways we transcribe? In this article, we argue that if medievalists hope to pursue the kinds of analysis that go on in advanced computational research, we will need new kinds of transcriptions, intentionally theorized not only for human reading but also for machine processing. We already have mature methods, such as Optical Character Recognition (OCR), for remediating generations of editions of medieval works, but we can ask ourselves whether these are the kinds of text we want to use for future computational analysis. We suggest instead that one way forward is to go back to the scriptorium.
