Matching for run-length encoded strings

Alberto Apostolico, Gad M. Landau, Steven Skiena

Research output: Contribution to conference › Paper › peer-review


The need to approximately match run-length encoded strings emerged during development of an optical character recognition (OCR) system. This system, built in association with Data Capture Systems Inc., is designed to achieve a low substitution error rate via fixed-font character recognition. The ith row or column of pixels in a given query character image defines a binary string containing a small number of white-black transitions. By comparing this run-length encoded string against the ith row or column of each of the character image models, the most similar model can be identified. An algorithm is presented that finds the longest common subsequence of strings X and Y in time polynomial in the size of the compressed strings.
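To make the setting in the abstract concrete, here is a minimal sketch that run-length encodes a binary pixel row and picks the most similar model row by longest common subsequence (LCS) length. It is not the paper's algorithm: it uses the textbook dynamic program on the uncompressed strings, which takes time quadratic in the uncompressed length, whereas the paper's method runs in time polynomial in the size of the compressed strings. The function names (`rle_encode`, `rle_decode`, `lcs_length`, `most_similar`) and the sample rows are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def rle_encode(bits):
    """Return (symbol, run_length) pairs for a string such as '0001100'."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def rle_decode(runs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in runs)


def lcs_length(x, y):
    """Textbook LCS dynamic program on uncompressed strings (baseline only)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def most_similar(query_row, model_rows):
    """Return the key of the model row with the longest LCS against the query."""
    return max(model_rows, key=lambda name: lcs_length(query_row, model_rows[name]))


# Hypothetical pixel rows: '0' is white, '1' is black.
query = "0011100"
models = {"I": "0011100", "O": "0110110"}
print(rle_encode(query))
print(most_similar(query, models))
```

A row with few white-black transitions compresses to only a handful of runs, which is why an algorithm whose cost depends on the number of runs rather than the number of pixels is attractive here.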

Original language: English
Number of pages: 9
State: Published - 1997
Externally published: Yes
Event: Proceedings of the 1997 International Conference on Compression and Complexity of Sequences - Positano, Italy
Duration: 11 Jun 1997 - 13 Jun 1997


Conference: Proceedings of the 1997 International Conference on Compression and Complexity of Sequences
City: Positano, Italy

Bibliographical note

Funding Information:
Work supported in part by NSF Grants CCR-9201078 and CCR-9700276, by NATO Grant CRG 900293, by British Engineering and Physical Sciences Research Council Grant GR L19362, by the National Research Council of Italy, and by the ESPRIT III Basic Research Programme of the EC under Contract 9072 (Project GEPPCOM).
Work supported in part by NSF Grants CCR-9305873 and CCR-9610238.
This work is partially supported by ONR Award 431-0857A and NSF Grant CCR-9625669.

ASJC Scopus subject areas

  • General Engineering

