Linear Bandits with Stochastic Delayed Feedback

Vernade, Claire; Carpentier, Alexandra; Lattimore, Tor; Zappella, Giovanni; Ermis, Beyza; Brueckner, Michael

arXiv:1807.02089 (stat)

[Submitted on 5 Jul 2018 (v1), last revised 2 Mar 2020 (this version, v3)]

Abstract: Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by practitioners hoping to apply existing algorithms is that the feedback is usually randomly delayed and the delays are only partially observable. For example, while a purchase is usually observable some time after the display, the decision not to buy is never explicitly sent to the system. In other words, the learner only observes delayed positive events. We formalize this problem as a novel stochastic delayed linear bandit and propose ${\tt OTFLinUCB}$ and ${\tt OTFLinTS}$, two computationally efficient algorithms that integrate new information as it becomes available and handle the permanently censored feedback. We prove optimal $\tilde O(d\sqrt{T})$ bounds on the regret of the first algorithm and study its dependence on delay-dependent parameters. Our model, assumptions and results are validated by experiments on simulated and real data.
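The censored feedback described in the abstract can be made concrete with a small simulation. This is a minimal sketch, not the authors' OTFLinUCB or OTFLinTS: the uniform action choice, the geometric delay distribution, the cut-off parameter `window` and the function names `simulate_censored_feedback` and `naive_rates` are all hypothetical choices made for illustration. It shows that counting only the positives that have arrived underestimates each conversion probability by roughly the factor P(D <= window), the kind of bias that motivates a delay-aware estimator.

```python
import numpy as np


def simulate_censored_feedback(theta, actions, horizon, window, p_delay, seed=0):
    """Hypothetical simulator of delayed, censored conversions.

    Assumptions (illustrative, not taken from the paper): each round plays an
    action uniformly at random, a conversion occurs with probability
    <theta, x>, its delay is Geometric(p_delay) on {1, 2, ...}, and a
    conversion is only ever reported if its delay is at most `window`.
    """
    rng = np.random.default_rng(seed)
    chosen = rng.integers(len(actions), size=horizon)   # uniform play, illustration only
    means = actions[chosen] @ theta                      # true conversion probabilities
    converted = rng.random(horizon) < means              # hidden: did the user convert?
    delays = rng.geometric(p_delay, size=horizon)        # hidden: when would it be reported?
    elapsed = horizon - np.arange(horizon)               # steps since each round, at time `horizon`
    # The learner sees a positive only if it has arrived and fell inside the window;
    # a missing report is never labelled as a negative.
    observed = converted & (delays <= np.minimum(elapsed, window))
    return chosen, converted, delays, elapsed, observed


def naive_rates(actions, chosen, observed, elapsed, window):
    """Observed-conversion rate per action, using only rounds whose window has
    fully elapsed (their feedback is final, but still censored)."""
    final = elapsed >= window
    rates = np.empty(len(actions))
    for a in range(len(actions)):
        mask = final & (chosen == a)
        rates[a] = observed[mask].mean()
    return rates


theta = np.array([0.3, 0.5])
actions = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
window, p_delay = 5, 0.2
chosen, converted, delays, elapsed, observed = simulate_censored_feedback(
    theta, actions, horizon=200_000, window=window, p_delay=p_delay)
tau = 1.0 - (1.0 - p_delay) ** window   # P(D <= window) for Geometric(p_delay)
rates = naive_rates(actions, chosen, observed, elapsed, window)
print("true means:     ", actions @ theta)
print("naive estimates:", rates)
print("tau * true mean:", tau * (actions @ theta))
```

With these settings tau is about 0.67, so the naive estimates sit near two thirds of the true means rather than at them. Any unbiased procedure in this sketch would have to account for tau, which depends on the (here assumed) delay distribution and window.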

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1807.02089 [stat.ML]
	(or arXiv:1807.02089v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1807.02089

Submission history

From: Claire Vernade
[v1] Thu, 5 Jul 2018 17:09:33 UTC (792 KB)
[v2] Fri, 21 Feb 2020 17:09:21 UTC (96 KB)
[v3] Mon, 2 Mar 2020 14:19:17 UTC (96 KB)