Tikkoun Sofrim: Making Ancient Manuscripts Digitally Accessible: The Case of Midrash Tanhuma

Alan J. Wecker, Vered Raziel-Kretzmer, Benjamin Kiessling, Daniel Stökl Ben Ezra, Moshe Lavee, Tsvi Kuflik, Dror Elovits, Moshe Schorr, Uri Schor, Pawel Jablonski

Research output: Contribution to journal › Article › peer-review

Abstract

Making ancient handwritten manuscripts accessible to the general public is challenging for several reasons. Foremost, they are handwritten, and each one is unique, so manual transcription is needed to provide enough examples for training a machine-learning-based algorithm to transcribe the handwritten text automatically. Moreover, the quality of the text varies: over time the ink faded, pages were damaged, and so forth. Furthermore, the boundaries of the textual regions on a page and the lines of text are not standard: sometimes there are corrections above the lines, the lines are curved, there are comments and annotations in the margins, and more. A possible solution to these challenges is having a "person in the loop." However, manual correction brings with it another challenge: how to address disagreement between annotations (as usually several corrections are considered before a decision is taken about the correct transcription). Tikkoun-Sofrim is a system that integrates automatic handwritten text recognition with manual, crowdsourced error correction, introducing an automatic decision process that determines when to stop asking for additional transcriptions and selects the best transcription, declaring it the recommended agreed reading. The system was applied to several manuscripts of "Midrash Tanhuma," a medieval Hebrew rabbinic homiletic text, achieving a high level of success.
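The abstract does not specify how the automatic decision process works. As a purely illustrative sketch, one simple way to decide when to stop requesting transcriptions of a line and which reading to recommend is an agreement threshold with a request cap. The function name `decide_reading` and the parameters `min_agree` and `max_requests` are hypothetical and are not taken from the Tikkoun-Sofrim system.

```python
from collections import Counter
from typing import List, Optional, Tuple


def decide_reading(
    transcriptions: List[str],
    min_agree: int = 2,      # hypothetical: identical readings needed to stop
    max_requests: int = 5,   # hypothetical: cap on transcriptions per line
) -> Tuple[bool, Optional[str]]:
    """Return (stop, recommended_reading) for one manuscript line.

    Illustrative rule only, not the published algorithm:
    - stop once any reading has been submitted `min_agree` times;
    - otherwise stop when `max_requests` transcriptions were collected,
      falling back to the most frequent reading (earliest one on ties);
    - otherwise keep asking for more transcriptions.
    """
    if not transcriptions:
        return False, None

    # Ignore leading/trailing whitespace when comparing readings.
    counts = Counter(t.strip() for t in transcriptions)
    best, votes = counts.most_common(1)[0]

    if votes >= min_agree:
        return True, best
    if len(transcriptions) >= max_requests:
        return True, best
    return False, None
```

Under these assumed defaults, `decide_reading(["amar rabbi", "amar rabbi "])` stops and recommends `"amar rabbi"`, while a single transcription alone would trigger another request.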

Original language: English
Article number: 20
Journal: Journal on Computing and Cultural Heritage
Volume: 15
Issue number: 2
State: Published - Jun 2022

Bibliographical note

Publisher Copyright:
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Keywords

  • CATTI
  • HTR
  • crowd-sourcing
  • handwritten text recognition
  • transcription

ASJC Scopus subject areas

  • Conservation
  • Information Systems
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
