
law of the iterated expectation, one obtains $E(y_i \mid \beta, D) = \tilde{\lambda}_i$ and
\[
\operatorname{var}(y_i \mid \beta, D) = \tilde{\Lambda}_i + \tilde{\Lambda}_i\bigl[\exp(D) - \mathbf{1}\mathbf{1}'\bigr]\tilde{\Lambda}_i,
\]
where we have $\beta = (\beta_1, \ldots, \beta_J)$. Hence, the covariance between the counts is represented by the terms
\[
\operatorname{cov}(y_{ij}, y_{ik}) = \tilde{\lambda}_{ij}\bigl(\exp(d_{jk}) - 1\bigr)\tilde{\lambda}_{ik}
= \lambda_{ij}\exp(0.5\,d_{jj})\bigl(\exp(d_{jk}) - 1\bigr)\lambda_{ik}\exp(0.5\,d_{kk}), \qquad j \neq k,
\]
which can be positive or negative depending on the sign of $d_{jk}$, the $(j,k)$ element of $D$. Moreover, the model allows for overdispersion, a variance in excess of the expectation, as long as $d_{jj} > 0$. The correlation structure of the counts is thus unrestricted. Note, however, that the marginal distribution of the counts $y_i$ cannot be obtained by direct computation, requiring as it does the evaluation of a $J$-variate integral of the Poisson distribution in (1) with respect to the distribution of $b_i$.
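These moment results are easy to verify numerically. The following sketch (an illustration added here, not part of the original analysis; it uses NumPy and arbitrary values for the linear predictors $x_{ij}'\beta_j$ and for $D$) simulates the model with $J = 2$ counts per cluster and compares the sample moments with $\tilde{\lambda}_{ij} = \lambda_{ij}\exp(0.5\,d_{jj})$ and the covariance expression above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: J = 2 counts per cluster, fixed linear predictors.
J = 2
xb = np.array([0.3, -0.2])            # x_ij' beta_j, held fixed across clusters
D = np.array([[0.5, 0.2],
              [0.2, 0.4]])            # covariance matrix of the latent effects b_i

n = 200_000                           # number of simulated clusters
b = rng.multivariate_normal(np.zeros(J), D, size=n)
y = rng.poisson(np.exp(xb + b))       # Poisson counts given the latent effects

# Moments implied by the law of the iterated expectation.
lam_tilde = np.exp(xb + 0.5 * np.diag(D))                        # E(y_ij | beta, D)
var_theory = lam_tilde + lam_tilde**2 * (np.exp(np.diag(D)) - 1)
cov_theory = lam_tilde[0] * (np.exp(D[0, 1]) - 1) * lam_tilde[1]

print("mean: sim", y.mean(axis=0), "theory", lam_tilde)
print("var:  sim", y.var(axis=0), "theory", var_theory)
print("cov:  sim", np.cov(y.T)[0, 1], "theory", cov_theory)
```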
It is interesting to note that our model is similar to that of Gurmu and Elder (1998) except that in their model the distribution of $b_{ij}$ is left unspecified. Under that assumption, the model becomes computationally intractable for anything more than a few correlated counts. As we show in this article, it is possible to fit higher-dimensional models provided one is willing to make a parametric distributional assumption for $b_i$, which in turn provides a clean interpretation for the correlation structure. The assumption of normality is not crucial and can be generalized. For example, it is easy to let the distribution of the latent effects be multivariate-$t$ instead of multivariate-normal, as will be discussed, or to model the distribution by a finite mixture of normal distributions. More importantly, it is possible to relax the assumption, implicit in the preceding formulation, that the $b_i$ are independent of the covariates by letting the mean of $b_i$ be a function of one or more of the available covariates. The estimation approach that we will present needs to be modified only slightly to incorporate this feature. Finally, our model can be specialized to the panel-data setting (where the index $j$ represents time) by letting the conditional mean function be $E(y_{ij} \mid \beta, b_i) = \exp(x_{ij}'\beta + w_{ij}'b_i)$, where $w_{ij}$ is a subset of the covariates in $x_{ij}$. This is exactly the model of CGW, which in turn is a generalization of the model of Hausman et al. (1984). It should be noted that, in this specialization of the general model, fewer than $J$ latent effects appear in the conditional mean function of subject $i$.
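As an illustration of this specialization (a hypothetical special case, not an example from the article), setting $w_{ij} = 1$ for all $j$ gives the familiar random-intercept panel count model
\[
E(y_{ij} \mid \beta, b_i) = \exp(x_{ij}'\beta + b_i), \qquad b_i \sim N(0, d),
\]
in which a single latent effect shifts the log-mean of all $J$ observations on subject $i$.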
2. ESTIMATION OF THE MODEL
2.1 Likelihood Function
Let us suppose that the observations $y_i = (y_{i1}, \ldots, y_{iJ})$ are conditionally independent across clusters. Then, the likelihood function is the product of the contributions $p(y_i \mid \beta, D)$, where $p(y_i \mid \beta, D)$ is the joint probability of the $J$ counts in cluster $i$ given by
\[
p(y_i \mid \beta, D) = \int \prod_{j=1}^{J} f(y_{ij} \mid \beta_j, b_{ij})\, \varphi_J(b_i \mid 0, D)\, db_i, \tag{3}
\]
where $f$, as previously, is the Poisson mass function conditioned on $(\beta_j, b_{ij})$ and $\varphi$ is the $J$-variate normal density function. This multiple integral cannot be solved in closed form for arbitrary $D$, but some simplifications are possible if $D$ is assumed to be a diagonal matrix. To deal with the general case, however, it is necessary to turn to simulation-based methods.
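To make the nature of (3) concrete, the contribution $p(y_i \mid \beta, D)$ can be approximated by simple Monte Carlo integration over $b_i$, averaging the Poisson probabilities over draws from $\varphi_J(b_i \mid 0, D)$. The sketch below is only illustrative and is not the estimation method developed in this article; the function name, data, and parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import poisson

def loglik_contribution(y_i, xb_i, D, n_draws=50_000, seed=0):
    """Monte Carlo estimate of log p(y_i | beta, D) in (3).

    y_i  : length-J vector of counts for cluster i
    xb_i : length-J vector of linear predictors x_ij' beta_j
    D    : J x J covariance matrix of the latent effects b_i
    """
    rng = np.random.default_rng(seed)
    J = len(y_i)
    b = rng.multivariate_normal(np.zeros(J), D, size=n_draws)    # draws from phi_J(b_i | 0, D)
    logf = poisson.logpmf(y_i, np.exp(xb_i + b)).sum(axis=1)     # log prod_j f(y_ij | beta_j, b_ij)
    m = logf.max()
    return m + np.log(np.mean(np.exp(logf - m)))                 # log of the average (log-sum-exp)

# Hypothetical example with J = 3 counts in a cluster.
D = 0.3 * np.eye(3) + 0.1
print(loglik_contribution(np.array([2, 0, 5]), np.array([0.5, -0.3, 1.0]), D))
```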
2.2 MCMC Implementation
The main idea of the estimation approach is to focus on the
posterior distribution of the parameters and the latent effects
and then to summarize this posterior distribution by MCMC
methods. Since much has been written about MCMC methods
(e.g., see Tierney 1994; Chib and Greenberg 1995, 1996), we
can be brief.
With MCMC methods, one designs an ergodic Markov
chain with the property that the limiting invariant distribution
of the chain is the posterior density of interest. Then, draws
furnished by sampling the Markov chain, after an initial tran-
sient or burn-in stage, can be taken as approximate correlated
draws from the posterior distribution. This output forms the
basis for summarizing the posterior distribution and for com-
puting Bayesian point and interval estimates. Ergodic laws of
large numbers for Markov chains on continuous state spaces
are used to justify that these estimates are simulation consis-
tent, converging to the posterior expectations as the simulation
sample size becomes large.
One standard method for constructing a Markov chain with the correct limiting distribution is via a recursive simulation of the so-called full conditional densities, that is, the density of a set or block of parameters, given the data and the remaining blocks of parameters. Each of the full conditional densities in the simulation is then sampled either directly (if the full conditional density belongs to a known family of distributions) or by utilizing a technique such as the Metropolis–Hastings (M–H) method. An important and crucial point is that these methods do not require knowledge of the intractable normalizing constant of the posterior distribution.
In the present case, we apply MCMC methods to simulate the augmented posterior distribution of the parameters and the latent effects. For the prior on the parameters, assume that $(\beta, D^{-1})$ independently follow the distributions
\[
\beta \sim N_k\bigl(\beta_0, B_0^{-1}\bigr), \qquad D^{-1} \sim \mathrm{Wishart}\bigl(\nu_0, R_0\bigr),
\]
where $(\beta_0, B_0, \nu_0, R_0)$ are known hyperparameters and $\mathrm{Wishart}(\cdot,\cdot)$ is the Wishart distribution with $\nu_0$ df and scale matrix $R_0$. Then, by Bayes theorem, the posterior density is proportional to
\[
\varphi_k\bigl(\beta \mid \beta_0, B_0^{-1}\bigr)\, f_W\bigl(D^{-1} \mid \nu_0, R_0\bigr) \prod_{i=1}^{n} p(y_i \mid \beta, b_i)\, \varphi_J(b_i \mid 0, D),
\]
where $f_W$ is the Wishart density. We now consider a sampling procedure to simulate this density.
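For readers who want a computational handle on this expression, the following sketch evaluates the log of the unnormalized augmented posterior. It is an added illustration under simplifying assumptions: a common coefficient vector $\beta$ across the $J$ equations (as in the panel specialization), SciPy's Wishart parameterization with scale matrix $R_0$, and hypothetical array layouts.

```python
import numpy as np
from scipy.stats import poisson, multivariate_normal, wishart

def log_aug_posterior(beta, Dinv, b, y, X, beta0, B0inv, nu0, R0):
    """Log of the unnormalized augmented posterior of (beta, D^{-1}, b_1, ..., b_n).

    y : (n, J) counts;  X : (n, J, k) covariates;  b : (n, J) latent effects.
    """
    D = np.linalg.inv(Dinv)
    lp = multivariate_normal.logpdf(beta, mean=beta0, cov=B0inv)    # prior phi_k(beta | beta_0, B_0^{-1})
    lp += wishart.logpdf(Dinv, df=nu0, scale=R0)                    # prior f_W(D^{-1} | nu_0, R_0)
    eta = np.einsum('njk,k->nj', X, beta) + b                       # log-means x_ij' beta + b_ij
    lp += poisson.logpmf(y, np.exp(eta)).sum()                      # Poisson likelihood terms
    lp += multivariate_normal.logpdf(b, mean=np.zeros(b.shape[1]), cov=D).sum()  # phi_J(b_i | 0, D)
    return lp
```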
Following CGW, we construct our Markov chain using the blocks of parameters $\{b_i\}$, $\beta$, and $D$ and the full conditional distributions
\[
[b \mid y, \beta, D];\quad [\beta \mid y, b];\quad [D^{-1} \mid b], \tag{4}
\]
where $b = (b_1, \ldots, b_n)$. The simulation output is obtained by recursively simulating these distributions, using the most recent values of the conditioning variables at each step.
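A minimal sketch of one such recursion through the conditionals in (4) is given below. It is not the tailored algorithm of CGW; for illustration it substitutes simple random-walk M–H steps for $b_i$ and $\beta$, assumes a common $\beta$ across the $J$ equations, and uses the standard normal–Wishart conjugacy for $[D^{-1} \mid b]$ (with the prior scale taken to be $R_0$ in SciPy's convention). Names, array layouts, and tuning constants are hypothetical.

```python
import numpy as np
from scipy.stats import wishart

def mcmc_sweep(beta, Dinv, b, y, X, beta0, B0inv, nu0, R0, rng,
               step_b=0.3, step_beta=0.02):
    """One pass through [b | y, beta, D], [beta | y, b], and [D^{-1} | b]."""
    n, J = y.shape

    def log_poisson(eta, counts):
        return (counts * eta - np.exp(eta)).sum(axis=-1)    # Poisson log-kernel (factorials dropped)

    # --- [b | y, beta, D]: one random-walk M-H proposal per cluster ---
    eta_fixed = np.einsum('njk,k->nj', X, beta)
    def log_target_b(bb):
        return log_poisson(eta_fixed + bb, y) - 0.5 * np.einsum('nj,jk,nk->n', bb, Dinv, bb)
    prop = b + step_b * rng.standard_normal(b.shape)
    accept = np.log(rng.uniform(size=n)) < log_target_b(prop) - log_target_b(b)
    b = np.where(accept[:, None], prop, b)

    # --- [beta | y, b]: random-walk M-H on the full coefficient vector ---
    B0 = np.linalg.inv(B0inv)                               # prior precision of beta
    def log_target_beta(bb):
        eta = np.einsum('njk,k->nj', X, bb) + b
        return log_poisson(eta, y).sum() - 0.5 * (bb - beta0) @ B0 @ (bb - beta0)
    prop = beta + step_beta * rng.standard_normal(beta.shape)
    if np.log(rng.uniform()) < log_target_beta(prop) - log_target_beta(beta):
        beta = prop

    # --- [D^{-1} | b]: direct draw from the conjugate Wishart full conditional ---
    scale = np.linalg.inv(np.linalg.inv(R0) + b.T @ b)      # (R_0^{-1} + sum_i b_i b_i')^{-1}
    Dinv = wishart.rvs(df=nu0 + n, scale=scale, random_state=rng)

    return beta, Dinv, b
```

Repeating this sweep many times, after a burn-in stage, yields the correlated draws from the posterior distribution described above.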