The need to approximately match run-length encoded strings emerged during the development of an optical character recognition (OCR) system. This system, built in association with Data Capture Systems Inc., is designed to achieve a low substitution error rate via fixed-font character recognition. The ith row or column of pixels in a given query character image defines a binary string containing a small number of white-black transitions. By comparing this run-length encoded string against the ith row or column of each of the character image models, the most similar model can be identified. An algorithm is presented that finds the longest common subsequence of strings X and Y in time polynomial in the size of the compressed strings.
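The matching task described in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: it expands the rows and runs the textbook quadratic dynamic program, whereas the paper's contribution is to work on the compressed strings directly. The function names (`run_length_encode`, `run_length_decode`, `lcs_length`, `most_similar_model`) and the encoding of pixels as the characters `'0'` and `'1'` are assumptions made only for this illustration.

```python
from itertools import groupby


def run_length_encode(row):
    """Encode a string such as '0001100' as [('0', 3), ('1', 2), ('0', 2)]."""
    return [(symbol, len(list(group))) for symbol, group in groupby(row)]


def run_length_decode(runs):
    """Expand a list of (symbol, count) runs back into the plain string."""
    return "".join(symbol * count for symbol, count in runs)


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y.

    Textbook dynamic program over the uncompressed strings, O(len(x) * len(y)).
    This is a baseline for illustration, not the compressed-domain method.
    """
    previous = [0] * (len(y) + 1)
    for a in x:
        current = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def most_similar_model(query_row, model_rows):
    """Return the key of the model row whose LCS with the query row is longest.

    model_rows is a hypothetical dict mapping a model name to its pixel row.
    """
    return max(model_rows, key=lambda name: lcs_length(query_row, model_rows[name]))
```

For a row with few white-black transitions the run-length form is much shorter than the row itself, which is why a running time polynomial in the compressed size, rather than in the number of pixels, is the quantity of interest.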
|Number of pages||9|
|State||Published - 1997|
|Event||Proceedings of the 1997 International Conference on Compression and Complexity of Sequences - Positano, Italy|
|Duration||11 Jun 1997 → 13 Jun 1997|
Bibliographical note
Funding Information:
- Work supported in part by NSF Grants CCR-9201078 and CCR-9700276, by NATO Grant CRG 900293, by British Engineering and Physical Sciences Research Council Grant GR L19362, by the National Research Council of Italy, and by the ESPRIT III Basic Research Programme of the EC under Contract 9072 (Project GEPPCOM).
- Work supported in part by NSF Grants CCR-9305873 and CCR-9610238.
- This work is partially supported by ONR Award 431-0857A and NSF Grant CCR-9625669.
ASJC Scopus subject areas
- Engineering (all)