Statistical Distortion: Consequences of Data Cleaning

Dasu, Tamraparni; Loh, Ji Meng

arXiv:1208.1932 (cs)

[Submitted on 9 Aug 2012]

Abstract: We introduce the notion of statistical distortion as an essential metric for measuring the effectiveness of data cleaning strategies. We use this metric to propose a widely applicable yet scalable experimental framework for evaluating data cleaning strategies along three dimensions: glitch improvement, statistical distortion, and cost-related criteria. Existing metrics focus on glitch improvement and cost, but not on the statistical impact of data cleaning strategies. We illustrate our framework on real-world data, with a comprehensive suite of experiments and analyses.

Comments:	VLDB2012
Subjects:	Databases (cs.DB)
Cite as:	arXiv:1208.1932 [cs.DB]
	(or arXiv:1208.1932v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1208.1932
Journal reference:	Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1674-1683 (2012)

Submission history

From: Tamraparni Dasu [via Ahmet Sacan as proxy]
[v1] Thu, 9 Aug 2012 14:52:19 UTC (547 KB)
