Efficient Document Indexing Using Pivot Tree

Singh, Gaurav; Piwowarski, Benjamin

arXiv:1605.06693 (cs)

[Submitted on 21 May 2016]

Abstract: We present a novel method for efficiently finding the top-k nearest neighbors of a document, under cosine similarity, when documents are represented in a high-dimensional space of terms. Documents are most often stored as bag-of-words tf-idf vectors, and one of the most widely used ways to compute the similarity between a pair of documents is the cosine similarity between their vector representations. Cosine similarity, however, is not a metric distance measure because it does not satisfy the triangle inequality, so most metric search methods cannot be applied directly. We propose a method for indexing documents using a pivot tree that leads to efficient retrieval. We also study the relation between precision and efficiency for the proposed method and compare it with a state-of-the-art method for document search based on the inner product.
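
The triangle-inequality claim can be checked on a toy example. The sketch below is illustrative only and is not the paper's pivot-tree method: it assumes the dissimilarity is taken as 1 minus the cosine similarity (the abstract does not name a specific distance), and the vectors `x`, `y`, `z` and the helper `top_k_brute_force` are hypothetical names introduced here. It also shows the exhaustive top-k search that an index of this kind aims to avoid.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed dissimilarity: 1 - cosine similarity (not taken from the paper)."""
    return 1.0 - cosine_similarity(u, v)


# Three toy 2-D "documents" (hypothetical data): y lies halfway between x and z.
x = [1.0, 0.0]
y = [1.0, 1.0]
z = [0.0, 1.0]

d_xz = cosine_distance(x, z)  # direct "distance" from x to z
d_xy = cosine_distance(x, y)  # first leg of the detour through y
d_yz = cosine_distance(y, z)  # second leg of the detour through y

# A metric would guarantee d_xz <= d_xy + d_yz; here the detour is shorter.
violates_triangle = d_xz > d_xy + d_yz


def top_k_brute_force(query, docs, k):
    """Exhaustive baseline: score every document, return the k best indices."""
    scored = [(cosine_similarity(query, d), i) for i, d in enumerate(docs)]
    # Sort by descending similarity, breaking ties by document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

Because the detour through `y` is shorter than the direct step from `x` to `z`, pruning rules that rely on the triangle inequality are not valid for this dissimilarity, which is the obstacle the abstract describes. The brute-force search scores every document for every query, so its cost grows linearly with the collection size.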

Comments:	6 Pages, 2 Figures
Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:1605.06693 [cs.IR]
	(or arXiv:1605.06693v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1605.06693

Submission history

From: Gaurav Singh
[v1] Sat, 21 May 2016 19:55:03 UTC (76 KB)
