TY - GEN
T1 - Finite-state technology as a programming environment
AU - Wintner, Shuly
PY - 2007
Y1 - 2007
N2 - Finite-state technology is considered the preferred model for representing the phonology and morphology of natural languages. The attractiveness of this technology for natural language processing stems from four sources: modularity of the design, due to the closure properties of regular languages and relations; the compact representation that is achieved through minimization; efficiency, which is a result of linear recognition time with finite-state devices; and reversibility, resulting from the declarative nature of such devices. However, when wide-coverage grammars are considered, finite-state technology does not scale up well, and the benefits of this technology can be overshadowed by the limitations it imposes as a programming environment for language processing. This paper focuses on several aspects of large-scale grammar development. Using a real-world benchmark, we compare a finite-state implementation with an equivalent Java program with respect to ease of development, modularity, maintainability of the code and space and time efficiency. We identify two main problems, abstraction and incremental development, which are currently not addressed sufficiently well by finite-state technology, and which we believe should be the focus of future research and development.
UR - http://www.scopus.com/inward/record.url?scp=37149004898&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-70939-8_9
DO - 10.1007/978-3-540-70939-8_9
M3 - Conference contribution
AN - SCOPUS:37149004898
SN - 354070938X
SN - 9783540709381
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 97
EP - 106
BT - Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
PB - Springer Verlag
T2 - 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Y2 - 18 February 2007 through 24 February 2007
ER -