Speeding up HMM decoding and training by exploiting sequence repetitions

Shay Mozes, Oren Weimann, Michal Ziv-Ukelson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

We present a method to speed up the dynamic programming algorithms used for solving the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi's decoding and training algorithms [21], as well as to the forward-backward and Baum-Welch [4] algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. We describe three algorithms based, respectively, on byte pair encoding (BPE) [19], run length encoding (RLE) and Lempel-Ziv (LZ78) parsing [22]. Compared to Viterbi's algorithm, we achieve a speedup of Ω(r) using BPE, a speedup of Ω(r/log r) using RLE, and a speedup of Ω(log n/k) using LZ78, where k is the number of hidden states, n is the length of the observed sequence and r is its compression ratio (under each compression scheme). Our experimental results demonstrate that our new algorithms are indeed faster in practice. Furthermore, unlike Viterbi's algorithm, our algorithms are highly parallelizable.
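The run-length idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: the Viterbi step is written as a matrix-vector product in the (max, +) semiring over log-probabilities, a run of one repeated symbol is collapsed into a single matrix power computed by repeated squaring, and only the log-probability of the best path is returned (no traceback). All function names and the `init`/`trans`/`emit` layout are hypothetical.

```python
import math
from itertools import groupby

NEG_INF = float("-inf")

def safe_log(x):
    """log(x), with log(0) mapped to -inf instead of raising."""
    return math.log(x) if x > 0 else NEG_INF

def maxplus_matmul(A, B):
    """(max, +) matrix product: C[i][j] = max over m of A[i][m] + B[m][j]."""
    k = len(A)
    return [[max(A[i][m] + B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

def maxplus_matvec(A, v):
    """(max, +) matrix-vector product."""
    k = len(A)
    return [max(A[i][m] + v[m] for m in range(k)) for i in range(k)]

def maxplus_power(A, p):
    """A raised to the power p >= 1 in the (max, +) semiring, by repeated squaring."""
    result, base = None, A
    while p > 0:
        if p & 1:
            result = base if result is None else maxplus_matmul(result, base)
        p >>= 1
        if p:
            base = maxplus_matmul(base, base)
    return result

def symbol_matrix(trans, emit, sym):
    """M[i][j] = log P(state j -> state i) + log P(sym | state i)."""
    k = len(trans)
    return [[safe_log(trans[j][i]) + safe_log(emit[i][sym]) for j in range(k)]
            for i in range(k)]

def viterbi_logprob(init, trans, emit, obs):
    """Baseline: one (max, +) matrix-vector step per observed symbol."""
    k = len(init)
    v = [safe_log(init[i]) + safe_log(emit[i][obs[0]]) for i in range(k)]
    for sym in obs[1:]:
        v = maxplus_matvec(symbol_matrix(trans, emit, sym), v)
    return max(v)

def viterbi_logprob_rle(init, trans, emit, obs):
    """Same value, but each run of a repeated symbol costs one matrix power."""
    k = len(init)
    v = [safe_log(init[i]) + safe_log(emit[i][obs[0]]) for i in range(k)]
    # Run-length encode everything after the first observation.
    for sym, group in groupby(obs[1:]):
        run_length = len(list(group))
        M = maxplus_power(symbol_matrix(trans, emit, sym), run_length)
        v = maxplus_matvec(M, v)
    return max(v)

# Hypothetical two-state HMM with a binary alphabet and a highly repetitive input.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]   # trans[j][i] = P(j -> i)
emit = [[0.5, 0.5], [0.1, 0.9]]    # emit[i][sym] = P(sym | i)
obs = [0] * 20 + [1] * 30 + [0] * 5

plain = viterbi_logprob(init, trans, emit, obs)
compressed = viterbi_logprob_rle(init, trans, emit, obs)
```

In this sketch a run of length l costs O(k^3 log l) for the matrix power instead of O(k^2 l) for l separate Viterbi steps, so it only pays off when runs are long relative to k. The two functions should agree up to floating-point rounding, since the (max, +) product is associative.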

Original language: English
Title of host publication: Combinatorial Pattern Matching - 18th Annual Symposium, CPM 2007, Proceedings
Publisher: Springer Verlag
Pages: 4-15
Number of pages: 12
ISBN (Print): 9783540734369
State: Published - 2007
Externally published: Yes
Event: 18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007 - London, ON, Canada
Duration: 9 Jul 2007 to 11 Jul 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4580 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007
Country/Territory: Canada
City: London, ON
Period: 9/07/07 to 11/07/07

Keywords

  • Compression
  • Dynamic programming
  • HMM
  • Viterbi

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
