Title: Algorithmic Advances for the Design and Analysis of Randomized Experiments
Advisors: Daniel Spielman and Amin Karbasi
Other committee members:
Randomized experiments are the gold standard for establishing the causal effect of a treatment on a population. In this dissertation, I present algorithmic advances for three different problems arising in the design and analysis of randomized experiments: covariate balancing, variance estimation, and bipartite experiments.
In the first part of the defense, we describe an inherent trade-off between covariate balancing and robustness, which is then formulated as a distributional discrepancy problem. In order to navigate this trade-off, we present the Gram–Schmidt Walk Design which is based on the recent discrepancy algorithm of Bansal, Dadush, Garg, and Lovett (2019). By tightening the algorithmic analysis, we derive bounds on the mean squared error of the Horvitz–Thompson estimator under this design in terms of a ridge regression of the outcomes on the covariates, which we interpret as regression by design. We carry out further analysis including tail bounds on the effect estimator, methods for constructing confidence intervals, and a kernel method extension of the design which accommodates non-linear responses.
In the second part of the dense, we study the problem of estimating the variance of treatment effect estimators under interference. It is well-known that due to the fundamental problem of causal inference, unbiased variance estimation is impossible without strong assumption on the outcomes. Thus, we study a class of conservative estimators which are based on variance bounds. We identify conditions under which the variance bounds themselves are admissible and provide a general algorithmic framework to construct admissible variance bounds, according to the experimenter’s preferences and prior substantive knowledge.
Time permitting, we will present methodology for the newly proposed bipartite experimental framework, where units which receive treatment are distinct from units on which outcomes are measured, and the two are connected via a bipartite graph. We propose the Exposure Re-weighted Linear (ERL) estimator which we show is unbiased in finite samples and consistent and asymptotically normal in large samples provided the bipartite graph is sufficiently sparse. We provide an variance estimator which facilitates confidence intervals based on the normal approximation. Finally, we present Exposure-Design, a correlation clustering based design for improving precision of the ERL estimator.