Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence

Hoyle, Alexander; Goel, Pranav; Peskov, Denis; Hian-Cheong, Andrew; Boyd-Graber, Jordan; Resnik, Philip

arXiv:2107.02173 (cs)

[Submitted on 5 Jul 2021 (v1), last revised 27 Oct 2021 (this version, v3)]

Abstract: Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
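As a rough illustration of what an automated coherence estimate based on word co-occurrence frequencies can look like, the sketch below computes a mean pairwise NPMI (normalized pointwise mutual information) score for a topic's top words. The abstract does not say which coherence metric or counting scheme is used, so the choice of NPMI, document-level co-occurrence counting, the function name `npmi_coherence`, and the handling of edge cases are all assumptions made for illustration only.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean pairwise NPMI of topic_words over a tokenized reference corpus.

    Illustrative sketch only: co-occurrence is counted at the document level,
    which is one of several possible conventions (an assumption here).
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of reference documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # Words never co-occur: use the NPMI lower bound.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Degenerate case (both words in every document): NPMI is
            # undefined, so score it as 0 here (an assumption).
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy reference corpus, invented purely for this example.
reference = [
    ["game", "team", "score"],
    ["game", "team"],
    ["bank", "loan"],
    ["bank", "loan", "rate"],
]

print(npmi_coherence(["game", "team"], reference))
print(npmi_coherence(["game", "team", "bank"], reference))
```

On this toy corpus, a topic whose words always appear together scores higher than one that mixes unrelated words. A score like this depends only on reference-corpus counts, with no human judgment involved.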

Comments:	Accepted to NeurIPS 2021 (spotlight presentation). CR version
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2107.02173 [cs.CL]
	(or arXiv:2107.02173v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2107.02173

Submission history

From: Alexander Hoyle
[v1] Mon, 5 Jul 2021 17:58:52 UTC (2,136 KB)
[v2] Mon, 4 Oct 2021 23:50:01 UTC (2,524 KB)
[v3] Wed, 27 Oct 2021 21:37:06 UTC (2,144 KB)
