Journal articles

Transcription automatique et segmentation thématique de livres d’heures manuscrits (Automatic transcription and thematic segmentation of handwritten books of hours)

Abstract : Books of Hours were the best sellers of the Middle Ages, with more than 10,000 copies preserved. They are a crucial witness to the medieval mindset, yet their textual contents have scarcely been studied. They are very long and have a complex, entangled hierarchical structure, with several characteristics specific to the medieval daily office of prayers. This paper presents the methods and processing applied to books of hours: handwritten text recognition and text segmentation adapted to medieval manuscripts. We propose a weakly supervised approach, based on the overarching structure of the manuscripts, that provides the first state-of-the-art results for this new and challenging task on automatically transcribed text, despite remaining transcription errors.

http://hal.univ-nantes.fr/hal-02430291
Contributor : Béatrice Daille
Submitted on : Tuesday, January 7, 2020 - 11:22:23 AM
Last modification on : Wednesday, October 13, 2021 - 3:52:06 PM

Identifiers

  • HAL Id : hal-02430291, version 1

Citation

Béatrice Daille, Amir Hazem, Christopher Kermorvant, Martin Maarand, Marie-Laurence Bonhomme, et al. Transcription automatique et segmentation thématique de livres d’heures manuscrits. Revue TAL, ATALA (Association pour le Traitement Automatique des Langues), 2019, TAL et humanités numériques, 60 (3), pp.13-36. ⟨hal-02430291⟩
