AntMan: Sparse Low-Rank Compression to Accelerate RNN inference

Rajbhandari, Samyam; Shrivastava, Harsh; He, Yuxiong

arXiv:1910.01740 (cs)

[Submitted on 2 Oct 2019]

Abstract: Wide adoption of complex RNN-based models is hindered by their inference performance, cost, and memory requirements. To address this issue, we develop AntMan, which synergistically combines structured sparsity with low-rank decomposition to reduce the computation, size, and execution time of RNN models while attaining the desired accuracy. AntMan extends knowledge-distillation-based training to learn the compressed models efficiently. Our evaluation shows that AntMan offers up to 100x computation reduction with less than a 1-point accuracy drop for language and machine reading comprehension models. Our evaluation also shows that, for a given accuracy target, AntMan produces models 5x smaller than the state of the art. Lastly, we show that AntMan offers super-linear speed gains relative to the theoretical speedup, demonstrating its practical value on commodity hardware.
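To make the idea of "computation reduction" concrete, the following is a minimal sketch that counts multiply-accumulate operations (MACs) for one matrix-vector product under a dense weight matrix, a low-rank factorization, a block-diagonal (structured sparse) matrix, and a combination of the two. It is an illustration only: the abstract does not specify how AntMan composes sparsity and low rank, so the block-diagonal layout, the per-block factorization, the function names, and the example sizes are all assumptions rather than the authors' construction.

```python
def dense_macs(rows: int, cols: int) -> int:
    """MACs for one dense matrix-vector product W @ x, with W of shape rows x cols."""
    return rows * cols


def low_rank_macs(rows: int, cols: int, rank: int) -> int:
    """MACs when W is approximated by U @ V, with U: rows x rank and V: rank x cols.

    Computing V @ x costs rank * cols MACs; U @ (V @ x) costs rows * rank MACs.
    """
    return rank * cols + rows * rank


def block_diagonal_macs(rows: int, cols: int, blocks: int) -> int:
    """MACs when W is block diagonal with `blocks` equally sized blocks (assumed layout)."""
    if rows % blocks or cols % blocks:
        raise ValueError("rows and cols must be divisible by blocks")
    return blocks * (rows // blocks) * (cols // blocks)


def combined_macs(rows: int, cols: int, blocks: int, rank: int) -> int:
    """MACs when each diagonal block is itself replaced by a rank-`rank` factorization.

    This combination is a hypothetical illustration, not the paper's stated design.
    """
    if rows % blocks or cols % blocks:
        raise ValueError("rows and cols must be divisible by blocks")
    return blocks * low_rank_macs(rows // blocks, cols // blocks, rank)


if __name__ == "__main__":
    rows = cols = 1024  # hypothetical layer size
    base = dense_macs(rows, cols)
    for name, macs in [
        ("low rank (r=32)", low_rank_macs(rows, cols, 32)),
        ("block diagonal (4 blocks)", block_diagonal_macs(rows, cols, 4)),
        ("combined (4 blocks, r=16)", combined_macs(rows, cols, 4, 16)),
    ]:
        print(f"{name}: {macs} MACs, {base / macs:.1f}x fewer than dense")
```

These counts give only the theoretical reduction in arithmetic. Measured speedups on real hardware also depend on memory traffic and cache behavior, which this sketch does not model.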

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1910.01740 [cs.LG]
	(or arXiv:1910.01740v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.01740

Submission history

From: Harsh Shrivastava
[v1] Wed, 2 Oct 2019 17:31:09 UTC (1,431 KB)
