Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch

Talukdar, Partha Pratim; Cohen, William

arXiv:1310.2959 (cs)

[Submitted on 10 Oct 2013 (v1), last revised 27 Feb 2014 (this version, v2)]

Abstract: Graph-based semi-supervised learning (SSL) algorithms have been used successfully in a large number of applications. These methods classify initially unlabeled nodes by propagating label information over the structure of the graph, starting from seed nodes. Graph-based SSL algorithms usually scale linearly with the number of distinct labels (m) and require O(m) space on each node. Unfortunately, many applications of practical significance involve very large m over large graphs, demanding better space and time complexity. In this paper, we propose MAD-SKETCH, a novel graph-based SSL algorithm that compactly stores the label distribution on each node using the Count-Min Sketch, a randomized data structure. We present a theoretical analysis showing that, under mild conditions, MAD-SKETCH can reduce the space complexity at each node from O(m) to O(log m) and achieve similar savings in time complexity. We support our analysis through experiments on multiple real-world datasets. We observe that MAD-SKETCH achieves performance similar to existing state-of-the-art graph-based SSL algorithms while requiring a smaller memory footprint and achieving up to a 10x speedup. We find that MAD-SKETCH is able to scale to datasets with one million labels, which is beyond the scope of existing graph-based SSL algorithms.
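To make the core data structure concrete, here is a minimal Python sketch of a Count-Min Sketch holding per-label scores, under stated assumptions: labels are integer ids, scores and edge weights are non-negative, and the class name `CountMinSketch`, the helper `sketch_dims`, and the (a*x + b) mod p hash family are illustrative choices, not the authors' code. The property it demonstrates is linearity: sketches that share hash functions can be combined by weighted addition, so a node can aggregate its neighbors' sketched label scores without materializing an O(m) vector. The propagation step shown is a simplified weighted average, not the full MAD-SKETCH update.

```python
import math
import random


def sketch_dims(eps, delta):
    """Standard Count-Min sizing: width = ceil(e/eps), depth = ceil(ln(1/delta))."""
    return math.ceil(math.e / eps), math.ceil(math.log(1 / delta))


class CountMinSketch:
    """Hypothetical helper: a Count-Min Sketch mapping integer label ids to scores."""

    PRIME = 2**31 - 1  # Mersenne prime used by the (a*x + b) mod p hash family

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)  # same seed -> same hash functions -> combinable sketches
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(0, self.PRIME))
            for _ in range(depth)
        ]
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row, label):
        a, b = self.hashes[row]
        return ((a * label + b) % self.PRIME) % self.width

    def add(self, label, score):
        # Add the score to one counter in every row.
        for r in range(self.depth):
            self.table[r][self._bucket(r, label)] += score

    def query(self, label):
        # With non-negative scores every row can only overestimate, so take the minimum.
        return min(self.table[r][self._bucket(r, label)] for r in range(self.depth))

    def combine(self, other, weight=1.0):
        # Linearity: sketch(A + w*B) == sketch(A) + w*sketch(B) for shared hash functions.
        if self.width != other.width or self.hashes != other.hashes:
            raise ValueError("sketches must share width and hash functions")
        for row, other_row in zip(self.table, other.table):
            for c in range(self.width):
                row[c] += weight * other_row[c]


# Toy example: two seed nodes, one unlabeled node with edge weights 0.75 and 0.25.
WIDTH, DEPTH = sketch_dims(eps=0.01, delta=0.01)

seed_a = CountMinSketch(WIDTH, DEPTH, seed=7)
seed_a.add(label=3, score=1.0)
seed_b = CountMinSketch(WIDTH, DEPTH, seed=7)
seed_b.add(label=900_000, score=1.0)

# Simplified propagation step (weighted average of neighbors), not the full MAD update.
node = CountMinSketch(WIDTH, DEPTH, seed=7)
node.combine(seed_a, weight=0.75)
node.combine(seed_b, weight=0.25)

print(node.query(3), node.query(900_000))
```

Because each row can only overestimate a non-negative score, `query` returns an upper bound; the standard Count-Min guarantee keeps the excess within eps times the total mass with probability at least 1 - delta when the width and depth come from `sketch_dims`. The table size here depends on eps and delta rather than on m; the O(log m) bound stated in the abstract rests on the paper's own conditions, which this toy does not reproduce.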

Comments:	9 pages
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1310.2959 [cs.LG]
	(or arXiv:1310.2959v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1310.2959

Submission history

From: Partha Talukdar
[v1] Thu, 10 Oct 2013 20:30:06 UTC (343 KB)
[v2] Thu, 27 Feb 2014 21:19:41 UTC (35,261 KB)