Abstract
Making ancient handwritten manuscripts accessible to the general public is challenging for several reasons. Foremost, they are handwritten. Each one is unique, so manual transcription is needed to provide enough examples for training a machine-learning-based algorithm to transcribe the handwritten text automatically. Moreover, the quality of the text is diverse: over time the ink faded, pages were damaged, and so forth. Furthermore, the boundaries of the textual regions on a page and the lines of text are not standard. Sometimes there are corrections above the lines, the lines are curved, there are comments and annotations in the margins, and more. A possible solution to these challenges is having a "person in the loop." However, manual correction brings with it another challenge: how to address disagreement between annotations (as usually several corrections are considered before a decision is taken about the correct transcription). Tikkoun-Sofrim is a system that integrates automatic handwritten text recognition with manual, crowdsourced error correction, introducing an automatic decision process about when to stop asking for additional transcriptions and selecting the best transcription, declaring it as the recommended agreed reading. The system was applied to several manuscripts of "Midrash Tanhuma," a medieval Hebrew rabbinic homiletic text, achieving a high level of success.
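To make the idea of an automatic stopping decision concrete, the following is a minimal sketch of one possible rule, not the procedure used by Tikkoun-Sofrim, which the abstract does not specify. It assumes a simple agreement vote over the crowdsourced corrections of a single line; the function name `decide` and the parameters `min_votes`, `min_share`, and `max_requests` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple


def decide(transcriptions: List[str],
           min_votes: int = 2,
           min_share: float = 0.6,
           max_requests: int = 5) -> Tuple[bool, Optional[str]]:
    """Return (stop, recommended_reading) for one line of text.

    Hypothetical rule: stop once the most frequent reading has at least
    `min_votes` votes and at least `min_share` of all submissions, or once
    `max_requests` submissions have been collected.
    """
    if not transcriptions:
        # Nothing submitted yet: keep asking.
        return False, None

    # Collapse runs of whitespace so trivially different submissions match.
    normalised = [" ".join(t.split()) for t in transcriptions]

    # Most frequent reading and its vote count.
    best, votes = Counter(normalised).most_common(1)[0]
    share = votes / len(normalised)

    if votes >= min_votes and share >= min_share:
        # Enough annotators agree: accept the majority reading.
        return True, best
    if len(normalised) >= max_requests:
        # Budget exhausted: fall back to the most frequent reading.
        return True, best
    # Otherwise request another transcription.
    return False, None
```

Under these assumptions, a line keeps being offered to volunteers until two of them agree or the request budget runs out. A real system would likely compare readings with an edit-distance measure rather than exact string equality.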
Original language | English |
---|---|
Article number | 20 |
Journal | Journal on Computing and Cultural Heritage |
Volume | 15 |
Issue number | 2 |
State | Published - Jun 2022 |
Bibliographical note
Publisher Copyright: © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Keywords
- CATTI
- HTR
- crowd-sourcing
- handwritten text recognition
- transcription
ASJC Scopus subject areas
- Conservation
- Information Systems
- Computer Science Applications
- Computer Graphics and Computer-Aided Design