216 S.-Z. Yu / Artificial Intelligence 174 (2010) 215–243
The first approach to the hidden semi-Markov model was proposed by Ferguson [60] and is partially covered in the
survey paper by Rabiner [150]. This approach is called the explicit duration HMM, in contrast to the implicit duration of
the HMM. It assumes that the state duration is generally distributed, depending on the current state of the underlying
semi-Markov process. It also assumes the “conditional independence” of outputs. Levinson [107] replaced the probability
mass functions of duration with continuous probability density functions to form a continuously variable duration HMM. As
Ferguson [60] pointed out, an HSMM can be realized in the HMM framework in which both the state and its sojourn time
since entering the state are taken as a complex HMM state. This idea was exploited in 1991 by a 2-vector HMM [93] and a
duration-dependent state transition model [179]. Since then, similar approaches have been proposed in many applications. They
are known by different names, such as the inhomogeneous HMM [151], the non-stationary HMM [164], and, more recently, triplet Markov
chains [144]. These approaches, however, share a common problem of computational complexity in some applications.
A more efficient algorithm was proposed in 2003 by Yu and Kobayashi [199], in which the forward–backward variables are
defined using the notion of a state together with its remaining sojourn (or residual life) time. This makes the algorithm
practical in many applications.
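The realization of an HSMM inside the ordinary HMM framework, where the "complex" HMM state is the pair (state, remaining sojourn time), can be sketched as follows. This is only an illustrative construction under assumed toy parameters: the names `A`, `p_dur`, `A_exp` and the dimensions are ours, not the survey's notation.

```python
import numpy as np

# Sketch: realizing an explicit-duration HSMM as an ordinary HMM whose
# complex state is the pair (state i, remaining sojourn time r).
M, D = 2, 3                                  # number of states, maximum duration
A = np.array([[0.0, 1.0], [1.0, 0.0]])       # semi-Markov transitions (no self-loops)
p_dur = np.array([[0.5, 0.3, 0.2],           # p_dur[i, d-1] = P(duration = d | state i)
                  [0.2, 0.5, 0.3]])

# Expanded HMM over pairs (i, r), with r = 1..D the remaining sojourn time.
idx = {(i, r): i * D + (r - 1) for i in range(M) for r in range(1, D + 1)}
A_exp = np.zeros((M * D, M * D))
for i in range(M):
    for r in range(2, D + 1):
        # While time remains, count the residual life down deterministically.
        A_exp[idx[(i, r)], idx[(i, r - 1)]] = 1.0
    for j in range(M):
        for d in range(1, D + 1):
            # When r = 1 expires, jump to state j and draw a fresh duration d.
            A_exp[idx[(i, 1)], idx[(j, d)]] = A[i, j] * p_dur[j, d - 1]

assert np.allclose(A_exp.sum(axis=1), 1.0)   # each row is a valid distribution
```

The expanded chain has M·D states, which illustrates why this realization becomes computationally expensive when the maximum duration D is large, and hence why more efficient formulations such as that of Yu and Kobayashi [199] matter in practice.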
The HSMM has been successfully applied in many areas. The most successful application is in speech recognition. The
first application of the HSMM in this area was made by Ferguson [60]. Since then, more than one hundred such
papers have been published in the literature. It is the application of the HSMM to speech recognition that has enriched the theory of the HSMM
and driven the development of many of its algorithms.
Since the beginning of the 1990s, the HSMM has been applied in many other areas, such as electrocardiography (ECG)
[174], printed text recognition [4], handwritten word recognition [95], recognition of human genes in DNA [94], language
identification [118], ground target tracking [88], and document image comparison and classification at the spatial layout level
[81].
From 2000 to the present, the HSMM has received more and more attention from a wide range of application
areas, such as change-point/end-point detection for semiconductor manufacturing [64], protein structure prediction [162],
mobility tracking in cellular networks [197], analysis of branching and flowering patterns in plants [69], time-series
modelling of rain events [159], brain functional MRI sequence analysis [58], satellite propagation channel modelling [112], Internet traffic
modelling [198], event recognition in videos [79], speech synthesis [204,125], image segmentation [98], semantic learning
for a mobile robot [167], anomaly detection for network security [201], symbolic plan recognition [54], terrain modelling
[185], adaptive cumulative sum tests for change detection in non-invasive mean blood pressure trends [193], equipment
prognosis [14], financial time series modelling [22], remote sensing [147], classification of music [113], and prediction of
particulate matter in the air [52].
The rest of the paper is organized as follows: Section 2 is the major part of this paper that defines a unified HSMM and
addresses important issues related to inference, estimation and implementation. Section 3 then presents three conventional
HSMMs that are widely applied in practice. Section 4 discusses specific modelling issues regarding duration distributions,
observation distributions, variants of HSMMs, and the relationship to the conventional HMM. Finally, Section 5 highlights
major applications of HSMMs, and Section 6 concludes the paper.
2. Hidden semi-Markov model
This section provides a unified description of HSMMs. A general HSMM is defined without specific assumptions on the
state transitions, duration distributions and observation distributions. Then the important issues related to inference, esti-
mation and implementation of the HSMM are discussed. A general expression of the explicit-duration HMMs and segment
HMMs can be found in Murphy [126], and a unified view of the segment HMMs can be found in Ostendorf et al. [136].
Detailed reviews of the conventional HMM can be found in the tutorial by Rabiner [150], the overview by Ephraim and
Merhav [57], the Bayesian-network-based discussion by Ghahramani [66], and the book by Cappé et al. [29].
2.1. General model
A hidden semi-Markov model (HSMM) is an extension of the HMM obtained by allowing the underlying process to be a semi-Markov
chain with a variable duration, or sojourn time, for each state. Therefore, in addition to the notation defined for the HMM,
the duration $d$ of a given state is explicitly defined for the HSMM. The state duration is a random variable that takes
integer values in the set $\mathcal{D} = \{1, 2, \ldots, D\}$. The important difference between the HMM and the HSMM is that the HMM assumes one observation per
state, while in the HSMM each state can emit a sequence of observations. The number of observations
produced while in state $i$ is determined by the length of time spent in state $i$, i.e., the duration $d$. We now provide a unified
description of HSMMs.
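The generative picture just described, in which each visited state draws a duration $d$ and emits $d$ observations before the chain moves on, can be sketched in code. All quantities below (`pi`, `A`, `p_dur`, `B` and the toy dimensions) are illustrative assumptions, not values or notation from the survey.

```python
import numpy as np

# Illustrative HSMM generative process: sample a state, draw its duration,
# emit one observation per time step of that duration, then transition.
rng = np.random.default_rng(0)

M, D, K = 2, 3, 2                  # states, maximum duration, observation symbols
pi = np.array([0.6, 0.4])          # initial state distribution
A = np.array([[0.0, 1.0],          # state transitions (no self-transitions)
              [1.0, 0.0]])
p_dur = np.array([[0.5, 0.3, 0.2],  # p_dur[i, d-1] = P(duration = d | state i)
                  [0.2, 0.5, 0.3]])
B = np.array([[0.9, 0.1],          # B[i, k] = P(observe symbol k | state i)
              [0.2, 0.8]])

def sample_hsmm(T):
    """Sample T time steps of (state, observation) pairs from the toy HSMM."""
    states, obs = [], []
    i = rng.choice(M, p=pi)
    while len(obs) < T:
        d = 1 + rng.choice(D, p=p_dur[i])    # sojourn time in state i
        for _ in range(d):                    # emit d observations from state i
            states.append(i)
            obs.append(rng.choice(K, p=B[i]))
        i = rng.choice(M, p=A[i])             # move to the next state
    return states[:T], obs[:T]                # truncate the last segment at T

states, obs = sample_hsmm(10)
```

Note that, in contrast to an HMM sampler, the inner loop emits a whole segment of observations per state visit, which is exactly the one-state-to-many-observations difference stated above.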
Assume a discrete-time Markov chain with the set of (hidden) states $\mathcal{S} = \{1, \ldots, M\}$. The state sequence is denoted by
$S_{1:T} \triangleq S_1, \ldots, S_T$, where $S_t \in \mathcal{S}$ is the state at time $t$. A realization of $S_{1:T}$ is denoted as $s_{1:T}$. For simplicity of notation in
the following sections, we denote:
• $S_{t_1:t_2} = i$ — state $i$ that the system stays in during the period from $t_1$ to $t_2$. In other words, it means $S_{t_1} = i$, $S_{t_1+1} = i, \ldots,$
and $S_{t_2} = i$. Note that the previous state $S_{t_1-1}$ and the next state $S_{t_2+1}$ may or may not be $i$.
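The segment notation $S_{t_1:t_2} = i$ can be made concrete with a small helper (ours, not from the paper) that decomposes a realized state sequence into its maximal constant segments, so that the state just before $t_1$ and just after $t_2$ differ from $i$:

```python
def segments(s):
    """Return (t1, t2, i) triples for maximal constant runs, 1-indexed
    to match the paper's time convention t = 1, ..., T."""
    out, t1 = [], 1
    for t in range(2, len(s) + 1):
        if s[t - 1] != s[t1 - 1]:          # state changes at time t
            out.append((t1, t - 1, s[t1 - 1]))
            t1 = t
    out.append((t1, len(s), s[t1 - 1]))    # close the final segment at T
    return out

print(segments([1, 1, 2, 2, 2, 1]))        # [(1, 2, 1), (3, 5, 2), (6, 6, 1)]
```

Here the middle triple says $s_{3:5} = 2$: the system stays in state 2 from $t_1 = 3$ to $t_2 = 5$, with $s_{t_1-1} = 1 \neq 2$ and $s_{t_2+1} = 1 \neq 2$.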