The EM Algorithm
Michael Collins
In fulfillment of the Written Preliminary Exam II requirement, September 1997
Contents

1 Introduction
2 Preliminaries
   2.1 Notation
   2.2 Maximum-likelihood Estimation
       2.2.1 An example
   2.3 Sufficient Statistics
       2.3.1 An example
   2.4 Exponential Families
       2.4.1 An example: the normal distribution
       2.4.2 Other important properties
3 The EM algorithm
   3.1 An example
   3.2 Proof that $L(\theta)$ is non-decreasing at each iteration
       3.2.1 Proof of equation 25
       3.2.2 Proof of equation 26
   3.3 Proof that $L(\theta)$ is increasing if $\theta$ is not a stationary point of $L$
   3.4 Generalised EM (GEM) algorithms
   3.5 Special Cases of the EM Algorithm
       3.5.1 Exponential Families
       3.5.2 Algebraic Models
   3.6 Summary of the 4 Theorems in DLR
       3.6.1 Theorem 1
       3.6.2 Theorems 2 and 3
       3.6.3 Theorem 4
4 (Wu 83)'s Commentary on the EM algorithm
   4.1 Is $L^*$ a global maximum, local maximum or stationary value?
       4.1.1 Theorem 1
       4.1.2 Theorem 2
       4.1.3 Theorem 3
       4.1.4 Summary of Theorems 1, 2 and 3
       4.1.5 Example of Convergence to a Saddle Point
       4.1.6 Proof of Theorem 1
       4.1.7 Corollary 1
   4.2 Does $\theta^t$ Converge to a point $\theta^*$?
       4.2.1 Theorem 4
       4.2.2 Theorem 5
   4.3 The Non-convergent GEM Algorithm given in (Boyles 83)
5 (Jamshidian and Jennrich 93)
   5.1 Optimisation of Quadratic Functions
       5.1.1 Conjugate Gradient Methods
       5.1.2 Generalised Conjugate Gradient Methods
   5.2 Accelerating EM using Generalised Conjugate Gradients
   5.3 Discussion
6 Conclusions
1 Introduction
The Expectation Maximization (EM) algorithm is a parameter estimation method which falls into the general framework of maximum-likelihood estimation, and is applied in cases where part of the data can be considered to be incomplete, or "hidden". It is essentially an iterative optimisation algorithm which, at least under certain conditions, will converge to parameter values at a local maximum of the likelihood function. Many statistical estimation problems turn out to be special cases of EM, for example: Hidden Markov Models (HMMs) (Baum 71); the generalisation of HMMs to Stochastic Context-Free Grammars (Baker 79); mixture models; and estimation in cases of missing data.
(Dempster, Laird and Rubin 77) (from here on referred to as DLR) defined the EM algorithm, and proved certain properties, in particular that at each iteration the log-likelihood of the observed data is guaranteed to be non-decreasing. That is, if $L(\theta)$ is the log-likelihood of the observed data given parameter values $\theta$, and $\theta^t$, $\theta^{t+1}$ are the parameter values at the $t$'th and $(t+1)$'th iterations respectively, then $L(\theta^{t+1}) \geq L(\theta^t)$. They also defined Generalised EM (GEM) algorithms, which include EM as a special case, and can be more computationally efficient, while still guaranteeing that $L(\theta^{t+1}) \geq L(\theta^t)$.
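To make this monotonicity property concrete, the following is a minimal Python sketch (an illustration added here, not DLR's general formulation) of EM for one simple incomplete-data problem: a mixture of two unit-variance Gaussians with fixed, equal mixing weights and unknown means, where the component label of each point plays the role of the hidden data. The function names and the data-generating setup are illustrative assumptions.

# Minimal EM sketch: two-component Gaussian mixture, unit variances, equal fixed
# mixing weights, unknown means (mu1, mu2). Illustrative only; not from the paper.
import math
import random

def log_likelihood(x, mu1, mu2):
    # Observed-data log-likelihood L(theta) with theta = (mu1, mu2).
    ll = 0.0
    for xi in x:
        p1 = math.exp(-0.5 * (xi - mu1) ** 2) / math.sqrt(2 * math.pi)
        p2 = math.exp(-0.5 * (xi - mu2) ** 2) / math.sqrt(2 * math.pi)
        ll += math.log(0.5 * p1 + 0.5 * p2)
    return ll

def em_step(x, mu1, mu2):
    # E-step: posterior probability that each point came from component 1
    # (the normalising constants cancel, so only the exponentials are needed).
    w = [math.exp(-0.5 * (xi - mu1) ** 2) /
         (math.exp(-0.5 * (xi - mu1) ** 2) + math.exp(-0.5 * (xi - mu2) ** 2))
         for xi in x]
    # M-step: weighted sample means maximise the expected complete-data log-likelihood.
    mu1_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    mu2_new = sum((1 - wi) * xi for wi, xi in zip(w, x)) / sum(1 - wi for wi in w)
    return mu1_new, mu2_new

random.seed(0)
x = [random.gauss(-2, 1) for _ in range(100)] + [random.gauss(2, 1) for _ in range(100)]
mu1, mu2 = -0.5, 0.5
prev = log_likelihood(x, mu1, mu2)
for t in range(20):
    mu1, mu2 = em_step(x, mu1, mu2)
    curr = log_likelihood(x, mu1, mu2)
    assert curr >= prev - 1e-9      # L(theta^{t+1}) >= L(theta^t)
    prev = curr
print(mu1, mu2)                     # the estimates approach -2 and 2

Each pass through em_step is one EM iteration; the assertion checks the non-decreasing log-likelihood property stated above.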
(Wu 83) addressed two issues:

1. Given that $L$ converges to some value $L^*$, is $L^*$ a global maximum, local maximum, saddle point or some other point? It is well known that $L^*$ cannot, in general, be guaranteed to be a global maximum. While $L(\theta^{t+1}) \geq L(\theta^t)$ is one condition for convergence to a stationary point of $L$, (Wu 83) defines additional conditions for convergence of an EM or GEM algorithm to a stationary point. At least for EM algorithms, these conditions are quite mild. He also gave a condition for convergence to a local maximum as opposed to a saddle point, but this condition is difficult to verify in practice (and does not hold in many practical applications).

2. Under what conditions do the parameter estimates $\theta^t$ also converge to some point $\theta^*$? Convergence of $L$ to a point $L^*$ does not guarantee convergence of the parameter estimates to some $\theta^*$, particularly if there is more than one point $\theta$ satisfying $L(\theta) = L^*$.
(JJ 93) emphasise that EM is an optimisation algorithm for $L$, and show that it is approximately a steepest descent algorithm, an optimisation method which often converges slowly. They show that with a relatively minor increase in complexity the EM algorithm can be modified to a conjugate-gradient descent method, which is known to be an improved optimisation algorithm. They give experimental results showing that their algorithm typically converges around 3-10 times faster than standard EM, and can in some cases be 25-100 times faster.

The remainder of this paper gives some background about maximum-likelihood estimation in section 2; considers the major results of DLR, (Wu 83) and (JJ 93) in sections 3, 4 and 5; and concludes in section 6. For a summary of the major points of this paper the reader should refer at this point to the bullet points in section 6.
2 Preliminaries
Most of the results in this section are taken from [BD 77].
2.1 Notation
We use bold-face throughout to denote matrices, normal typeface to denote scalars. Given a vector $X$, we write its $i$'th component as $X_i$. We use the $D$ operator to denote differentiation. Where there is ambiguity regarding which variable differentiation is with respect to, we use superscripts on the $D$ operator. For example, $D^{10} Q(\theta_1, \theta_2)$ is the first derivative of $Q$ w.r.t. $\theta_1$, and $D^{01} Q(\theta_1, \theta_2)$ is the first derivative w.r.t. $\theta_2$.
2.2 Maximum-likelihood Estimation
In general we have:

- A sample $X = \{X_1, X_2, \ldots, X_n\}$, where each $X_i$ is a random variable (a single value, or vector of values).

- A vector of parameters $\theta$, such that we can define the likelihood of the data, $P(X \mid \theta)$.

We can also define the log-likelihood $L(X \mid \theta) = \log P(X \mid \theta)$. Often the $X_i$'s are independently and identically distributed (i.i.d.), so that $L(X \mid \theta) = \sum_{i=1 \ldots n} \log P(X_i \mid \theta)$.

If $\Theta$ is the parameter space, maximum-likelihood (ML) estimation involves setting the ML estimate $\theta_{ML}$ such that

$$ \theta_{ML} = \arg\max_{\theta \in \Theta} L(X \mid \theta) \quad (1) $$
2.2.1 An example
Suppose we toss a coin 6 times, and $X_i = 1$ if the $i$'th toss is heads, 0 if it is tails. Say our sample is $x = \{1, 0, 0, 0, 1, 0\}$. Assume the coin has a probability $p$ of being heads, $1 - p$ of being tails, so that $\theta = p$. Then

$$ L(X = x \mid \theta) = \sum_{i=1}^{n} \log P(X_i = x_i \mid p) = 2 \log p + 4 \log(1 - p) \quad (2) $$
We can maximize $L$ by setting the derivative w.r.t. $p$ equal to 0:

$$ \frac{d\, L(X = x \mid \theta)}{d p} = \frac{2}{p} - \frac{4}{1 - p} = 0 \quad (3) $$

Solving this gives $p = \frac{2}{6}$, which is the "intuitive" estimate for $p$: the proportion of heads which have been seen in the sample.
Another common example of maximum-likelihood estimation is when the components of $X$ are drawn i.i.d. from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. It is simple enough to prove that the ML estimate for $\mu$ is $\frac{\sum_i X_i}{n}$, i.e., the sample mean.
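As a quick numerical check of these two examples (an illustrative snippet added here, not part of the original paper; the data for the normal case are made up), a grid search over the Bernoulli log-likelihood recovers the closed-form estimate $p = 2/6$, and the grid maximizer of the normal log-likelihood agrees with the sample mean.

# Numerical check of the two ML examples above (illustrative, not from the paper).
import math

# Coin example: sample x = {1,0,0,0,1,0}, so L(p) = 2*log(p) + 4*log(1-p).
x = [1, 0, 0, 0, 1, 0]
def coin_loglik(p):
    return sum(math.log(p if xi == 1 else 1 - p) for xi in x)

grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=coin_loglik))        # approximately 2/6 = 0.333

# Normal example with known variance sigma^2 = 1 and made-up data:
# the log-likelihood is -0.5 * sum((y_i - mu)^2), up to an additive constant.
y = [1.2, -0.3, 0.5, 2.1, 0.9]
def normal_loglik(mu):
    return -0.5 * sum((yi - mu) ** 2 for yi in y)

mu_grid = [i / 1000 for i in range(-3000, 3001)]
print(max(mu_grid, key=normal_loglik))   # close to the sample mean...
print(sum(y) / len(y))                   # ...which is 0.88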
2.3 Sufficient Statistics
A statistic $T(X)$ is any real or vector-valued function of the data $X$. Note that if $T(X_1) = T(X_2)$ for two samples $X_1$ and $X_2$ such that $X_1 \neq X_2$, then $T$ reduces the data, by mapping different samples to the same value. $T$ is sufficient if there are functions $g(T(X), \theta)$ and $h(X)$ s.t.

$$ P(X \mid \theta) = g(T(X), \theta)\, h(X) \quad (4) $$

Typically, $g(T(X), \theta) = P(T(X) \mid \theta)$ and $h(X) = P(X \mid T(X))$. The crucial point is that when maximizing $P(X \mid \theta)$ w.r.t. $\theta$ we can simply maximize $g(T(X), \theta)$, so the sufficient statistics summarise the data: for ML estimation, once we know $T$ we don't need to know anything else about the data.
2.3.1 An example
For the coin-tossing example, if the sample size is $n$ and the number of heads in the sample is $N_h$, then

$$ P(X \mid \theta) = p^{N_h} (1 - p)^{(n - N_h)} \quad (5) $$

So $T = (N_h, n)$ is sufficient.
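To illustrate this (a hypothetical snippet, not from the paper): any two samples with the same value of $T = (N_h, n)$ have identical likelihoods for every $p$, and so lead to identical ML estimates.

# Sufficiency illustration for the coin-tossing example (not from the paper):
# different samples with the same (N_h, n) give the same likelihood for every p.
def likelihood(sample, p):
    n_h = sum(sample)                          # number of heads
    n = len(sample)                            # sample size
    return p ** n_h * (1 - p) ** (n - n_h)     # equation 5

x1 = [1, 0, 0, 0, 1, 0]    # N_h = 2, n = 6
x2 = [0, 0, 1, 1, 0, 0]    # a different sample, but with the same (N_h, n)
for p in [0.1, 0.3, 0.5, 0.9]:
    assert abs(likelihood(x1, p) - likelihood(x2, p)) < 1e-12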
2.4 Exponential Families
An important class of distributions is the exponential family, where the likelihood can be written

$$ P(X \mid \theta) = \left\{ \exp\left[ \sum_i C_i(\theta)\, T_i(X) + d(\theta) + S(X) \right] \right\} I_A(X) \quad (6) $$

$I_A$ is the indicator function over the set $A$, and $A$ cannot depend on $\theta$. Note that $T(X) = \{T_1(X), T_2(X), \ldots, T_n(X)\}$ is sufficient.

If we define parameters $\eta = \{\eta_1, \eta_2, \ldots, \eta_n\}$ such that $C_i(\theta) = \eta_i$, then these are called the natural parameters. This can be a useful simplification: for example, when maximizing $L$ we differentiate w.r.t. the parameters, and for the natural parameters the derivative is a simple function involving $T$.
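As a worked example (added here, not part of the original text), the coin-tossing likelihood of equation 5 can be written in the form of equation 6 for a fixed sample size $n$:

$$ P(X \mid \theta) = p^{N_h} (1 - p)^{n - N_h} = \exp\left[ N_h \log\frac{p}{1-p} + n \log(1 - p) \right] $$

so that $C_1(\theta) = \log\frac{p}{1-p}$, $T_1(X) = N_h$, $d(\theta) = n \log(1 - p)$, $S(X) = 0$, and $I_A(X) = 1$ for every binary sample of length $n$. The natural parameter is therefore $\eta_1 = \log\frac{p}{1-p}$, the log-odds of heads.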