Debiased balanced interleaving at Amazon Search

2022
Interleaving is an online evaluation technique that has been shown to be orders of magnitude more sensitive than traditional A/B tests. It presents users with a single merged result list built from the compared rankings and then attributes user actions back to the evaluated rankers. The interleaving methods in the literature each have advantages and limitations with respect to unbiasedness, sensitivity, preservation of user experience, and implementation and computational complexity. We propose a new interleaving method that uses a counterfactual evaluation framework for credit attribution while keeping the simple ranking merge policy of balanced interleaving, and we formally derive an unbiased estimator for comparing rankers with theoretical guarantees. We then confirm the effectiveness of our method with both synthetic and real experiments. We also discuss practical considerations in bringing different interleaving methods from the literature into a large-scale experiment, and show that our method achieves a favorable tradeoff in implementation and computational complexity while preserving statistical power and reliability. We have successfully implemented our method and produced consistent conclusions at the scale of billions of search queries. We report 10 online experiments that apply our method to e-commerce search, and observe a 60x sensitivity gain over A/B tests. We also find high correlations between our proposed estimator and the corresponding A/B metrics, which helps interpret interleaving results on the scale of A/B measurements.
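For readers unfamiliar with the merge policy mentioned above, the following is a minimal sketch of balanced interleaving as it is commonly described in the interleaving literature. It covers only the ranking merge step and is not the debiased credit-attribution estimator proposed in the paper. The function name `balanced_interleave` and the parameter `a_first` are hypothetical names chosen for illustration; in a live experiment `a_first` would typically be set by a fair coin flip per query.

```python
from typing import Hashable, List, Sequence


def balanced_interleave(
    ranking_a: Sequence[Hashable],
    ranking_b: Sequence[Hashable],
    a_first: bool,
) -> List[Hashable]:
    """Merge two rankings with the classic balanced interleaving policy.

    Two pointers walk down the rankings. The ranker whose pointer is
    behind contributes next; ties are broken by `a_first`. Items already
    in the merged list are skipped, but the pointer still advances.
    """
    merged: List[Hashable] = []
    seen = set()
    ka, kb = 0, 0  # next positions to consider in ranking_a and ranking_b

    while ka < len(ranking_a) and kb < len(ranking_b):
        if ka < kb or (ka == kb and a_first):
            item = ranking_a[ka]
            ka += 1
        else:
            item = ranking_b[kb]
            kb += 1
        if item not in seen:
            seen.add(item)
            merged.append(item)

    return merged


if __name__ == "__main__":
    a = ["a", "b", "c", "d"]
    b = ["b", "e", "a", "f"]
    print(balanced_interleave(a, b, a_first=True))
    print(balanced_interleave(a, b, a_first=False))
```

Because the merged list depends on which ranker goes first, the two possible orderings can differ, which is one reason credit attribution on top of this merge policy needs care.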
