Double machine learning at scale to predict causal impact of customer actions
2023
Causal impact (CI) measurement is widely used across industry to inform both short- and long-term investment decisions. In this paper, we apply the double machine learning (DML) methodology to estimate average and conditional average treatment effects across hundreds of customer action types and hundreds of millions of customers for e-commerce and digital businesses; these estimates can be used in decisions supporting those businesses. We operationalize DML through a causal machine learning library that uses distributed computation on Spark and a flexible, JSON-driven model configuration approach to estimate causal impacts at scale (i.e., across hundreds of actions and millions of customers). We outline the DML methodology and its implementation, and we show examples of average treatment effect and conditional average treatment effect (i.e., customer-level) estimates along with confidence intervals. Our validation metrics show a 2.2% gain over the baseline methods and a 2.5x gain in computational time. Our contribution is to advance the scalable application of CI while also providing an interface that allows faster experimentation, easier onboarding of new use cases, and improved accessibility of the underlying code for partner teams.
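To make the DML idea concrete, the following is a minimal single-machine sketch of the partialling-out estimator with cross-fitting for an average treatment effect. It is not the library described above. The function names (fit_line, dml_ate), the simulated data, and the use of simple linear regressions as nuisance models in place of machine learning models are all assumptions made for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def dml_ate(x, d, y, n_folds=5, seed=0):
    """Partialling-out DML with cross-fitting (hypothetical helper).

    x: confounder, d: treatment, y: outcome. Returns the effect estimate
    and an approximate 95% confidence interval.
    """
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_d, res_y = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Nuisance models are fit on the training folds only.
        a_y, b_y = fit_line([x[i] for i in train], [y[i] for i in train])
        a_d, b_d = fit_line([x[i] for i in train], [d[i] for i in train])
        # Residuals are computed on the held-out fold (cross-fitting).
        for i in held_out:
            res_y[i] = y[i] - (a_y + b_y * x[i])
            res_d[i] = d[i] - (a_d + b_d * x[i])
    # Final stage: regress outcome residuals on treatment residuals.
    sdd = sum(r * r for r in res_d)
    theta = sum(rd * ry for rd, ry in zip(res_d, res_y)) / sdd
    # Standard error from the score (res_y - theta * res_d) * res_d.
    num = sum((rd * (ry - theta * rd)) ** 2 for rd, ry in zip(res_d, res_y)) / n
    se = math.sqrt(num / (sdd / n) ** 2 / n)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)


# Simulated data (an assumption): the true effect of the action is 2.0,
# and the confounder x drives both the action and the outcome.
rng = random.Random(42)
n_obs = 4000
x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
d = [1.0 if rng.random() < min(max(0.5 + 0.15 * xi, 0.05), 0.95) else 0.0 for xi in x]
y = [2.0 * di + 1.5 * xi + rng.gauss(0.0, 1.0) for di, xi in zip(d, x)]

theta, ci = dml_ate(x, d, y)

# Naive difference in means, which ignores the confounder.
treated = [yi for yi, di in zip(y, d) if di == 1.0]
control = [yi for yi, di in zip(y, d) if di == 0.0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(f"DML estimate: {theta:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Naive difference in means: {naive:.3f}")
```

In this simulation the naive difference in means absorbs the confounder's influence, while the residual-on-residual estimate should land near the simulated effect. A production setting like the one described would replace the linear fits with flexible learners and distribute the fold-level fitting.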