---
title: "Intro_CML_workshop"
knit: (function(input_file, encoding) {
  out_dir <- 'docs';
  rmarkdown::render(input_file,
    encoding = encoding,
    output_file = file.path(dirname(input_file), out_dir, 'index.html'))})
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
The aim of this short tutorial is to demonstrate how to implement causal machine learning estimators in practice.
We will focus on the average treatment effect (ATE),
$$\text{ATE} = E(Y^1) - E(Y^0).$$
We will cover the naive plug-in estimator (i.e. plug-in g-computation with data-adaptive estimates), the AIPW estimator (corresponding to the one-step and estimating-equation approaches), and targeted maximum likelihood estimation (TMLE).
Throughout, the data-adaptive (nuisance) models are fitted with the Super Learner in R.
We begin by loading the necessary libraries.
```{r, include = TRUE, echo = TRUE}
#' Install packages (once) and load them ##########
#install.packages("SuperLearner")
#install.packages("xgboost")
#install.packages("tmle")
#install.packages("devtools")
#library(devtools)
#install_github("ehkennedy/npcausal")
library(npcausal)
library(boot)
library(MASS)
library(SuperLearner)
library(survey)
library(tmle)
```
We will use simulated data with a binary outcome Y, a binary treatment A, a sample size of n = 1000, and four baseline covariates W = (w1, w2, w3, w4) that we adjust for to control for confounding.
The following code generates the data:
```{r, include = TRUE, echo = TRUE}
set.seed(129)
n=1000
w1 <- rbinom(n, size=1, prob=0.5)
w2 <- rbinom(n, size=1, prob=0.65)
w3 <- round(runif(n, min=0, max=4), digits=3)
w4 <- round(runif(n, min=0, max=5), digits=3)
A <- rbinom(n, size=1,
prob= plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4 + 0.15*w2*w4))
Y <- rbinom(n, size=1,
prob= plogis(-1 + A -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4))
# True potential-outcome probabilities under A = 1 (note -1 + 1 = 0) and A = 0
Y.1 <- plogis(-0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4)
Y.0 <- plogis(-1 - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4)
# True ATE implied by the data-generating mechanism
trueATE <- mean(Y.1) - mean(Y.0)
trueATE
#Create data frame with baseline covariates
W<-data.frame(cbind(w1,w2,w3,w4))
data<-data.frame(cbind(W,A,Y))
```
## Super Learner
First, check which learners have been integrated into the SuperLearner package. We can use any of these when we run the SuperLearner:
```{r, include = TRUE, echo = TRUE}
library(SuperLearner)
listWrappers(what = "SL")
```
Here we will use the following learners (as specified in the lecture):
```{r, include = TRUE, echo = TRUE}
SL.library<- c("SL.glm", "SL.glm.interaction", "SL.xgboost", "SL.glmnet", "SL.ranger")
```
Ideally, each algorithm would be included with multiple hyperparameter settings, tuned using cross-validation.
In the interest of time, we only use the defaults here. Make sure you check which default parameters are used for each learner by typing its name and inspecting the options pre-programmed in the SL wrapper; for example, for random forests using the ranger implementation:
```{r, include = TRUE, echo = TRUE}
SL.ranger
```
We see that the number of trees is 500 and the number of variables considered at each split is the square root of the total number of predictors, rounded down to the nearest integer (mtry = floor(sqrt(ncol(X)))).
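If you do want to tune, one approach (a sketch; the grid values below are illustrative choices, not recommendations) is `SuperLearner::create.Learner()`, which generates wrapper variants over a parameter grid:
```{r, include = TRUE, echo = TRUE, eval = FALSE}
# Illustrative grid: 3 mtry values x 2 forest sizes = 6 ranger variants
ranger.grid <- create.Learner("SL.ranger",
                              tune = list(mtry = c(1, 2, 4),
                                          num.trees = c(250, 500)))
# Append the generated learner names to the library; the SL's internal CV
# then weights the variants against each other
SL.library.tuned <- c(SL.library, ranger.grid$names)
```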
## SL for the outcome regression (naive plug-in g-computation)
```{r, include = TRUE, echo = TRUE}
SL.outcome<- SuperLearner(Y=data$Y, X=subset(data, select=-Y),
SL.library=SL.library, family="binomial")
```
You can look at the Super Learner object to see how the algorithms are weighted:
```{r, include = TRUE, echo = TRUE}
SL.outcome
```
Now, based on the trained SL, we get predictions under the exposure level actually received, and for the two potential outcomes for everyone:
```{r, include = TRUE, echo = TRUE}
# Predict under the observed treatment
SL.outcome.obs <- predict(SL.outcome, newdata = subset(data, select = -Y))$pred
# Predict the potential outcome Y^1 (everyone exposed)
SL.outcome.exp <- predict(SL.outcome, newdata = data.frame(cbind(W, A = rep(1, length(A)))))$pred
# Predict the potential outcome Y^0 (everyone unexposed)
SL.outcome.unexp <- predict(SL.outcome, newdata = data.frame(cbind(W, A = rep(0, length(A)))))$pred
```
## SL g-computation
We can now use these two predictions to get the plug-in g-computation estimate of the ATE:
```{r, include = TRUE, echo = TRUE}
SL.plugin.gcomp<-mean(SL.outcome.exp-SL.outcome.unexp)
SL.plugin.gcomp
```
Warning: there is no valid way of obtaining inference for this naive plug-in estimator; in particular, the bootstrap is not valid when the nuisance models are fitted with data-adaptive (ML) methods.
We collate the SL predictions, because we will use them later:
```{r, include = TRUE, echo = TRUE}
Q <- cbind(SL.outcome.obs, SL.outcome.unexp, SL.outcome.exp)
colnames(Q) <- c("QAW", "Q0W", "Q1W")
```
## Plug-in AIPW
Now we will use the outcome predictions together with propensity score predictions to compute the AIPW estimator with SL plug-ins.
First, the SL for the propensity score:
```{r, include = TRUE, echo = TRUE}
SL.g<- SuperLearner(Y=data$A, X=subset(data, select=-c(A,Y)),
SL.library=SL.library, family="binomial")
```
You can look at the Super Learner object to see how the algorithms are weighted:
```{r, include = TRUE, echo = TRUE}
SL.g
```
We see that here all the learners receive non-zero SL weights.
Now, get the predicted probability of being exposed:
```{r, include = TRUE, echo = TRUE}
g1W <- SL.g$SL.predict
summary(g1W)
# Look at the histogram of PS
hist(g1W)
# Look at the histogram of the weights.
hist(1/g1W)
```
For any real analysis, you must satisfy yourself that the positivity assumption holds, so that the weights are not "too" large.
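As a quick diagnostic, a common heuristic (the 0.025/0.975 bounds below are an illustrative choice, not a rule) is to count extreme propensity scores and, if needed, bound them:
```{r, include = TRUE, echo = TRUE}
# How many estimated propensity scores fall outside [0.025, 0.975]?
sum(g1W < 0.025 | g1W > 0.975)
# Bounded ("truncated") propensity scores; we continue with the unbounded
# version below, but bounding is common practice when weights get large
g1W.bounded <- pmin(pmax(g1W, 0.025), 0.975)
summary(1/g1W.bounded)
```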
Now the probability of being unexposed
```{r, include = TRUE, echo = TRUE}
g0W<- 1- g1W
```
We can now use these quantities to estimate the means of the potential outcomes, and thus the ATE, based on the influence function (IF) shown in the lecture.
The (uncentered) IFs for the means of $Y^1$ and $Y^0$ can be written as follows.
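$$
\varphi_1(O) = \frac{A}{g(W)}\{Y - \bar{Q}(1,W)\} + \bar{Q}(1,W), \qquad
\varphi_0(O) = \frac{1-A}{1-g(W)}\{Y - \bar{Q}(0,W)\} + \bar{Q}(0,W),
$$
where $g(W) = P(A=1 \mid W)$ and $\bar{Q}(a,W) = E(Y \mid A=a, W)$ (the notation here may differ slightly from the lecture slides). In R: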
```{r, include = TRUE, echo = TRUE}
IF.1 <- (data$A/g1W)*(data$Y - Q[,"Q1W"]) + Q[,"Q1W"]
IF.0 <- ((1-data$A)/g0W)*(data$Y - Q[,"Q0W"]) + Q[,"Q0W"]
# The IF of the ATE is then
IF <- IF.1 - IF.0
```
We saw that the estimating-equation estimator of the ATE is simply mean(IF):
```{r, include = TRUE, message=F, warning=FALSE, echo = TRUE}
aipw.1<-mean(IF.1);aipw.0<-mean(IF.0)
aipw.manual<-aipw.1-aipw.0
```
We now know that this estimator is asymptotically normally distributed, with variance var(IF)/n, which gives Wald-type confidence intervals.
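That is, informally,
$$
\hat{\psi}_{\text{AIPW}} \sim N\!\left(\psi,\ \frac{\operatorname{var}(\varphi_1 - \varphi_0)}{n}\right) \quad \text{(approximately, for large } n\text{)}.
$$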
```{r, include = TRUE, echo = TRUE}
ci.lb<-mean(IF)-qnorm(.975)*sd(IF)/sqrt(length(IF))
ci.ub<-mean(IF)+qnorm(.975)*sd(IF)/sqrt(length(IF))
res.manual.aipw<-c(aipw.manual,ci.lb, ci.ub)
res.manual.aipw
```
### AIPW using the package npcausal
Now that you have seen how the concept works, you can use the npcausal package, which has pre-programmed this and other estimands.
For now, we specify no sample splitting:
```{r, include = TRUE, echo = TRUE}
library(npcausal)
aipw<- ate(y=Y, a=A, x=W, nsplits=1, sl.lib=c("SL.glm", "SL.glm.interaction", "SL.glmnet", "SL.ranger"))
```
```{r, include = TRUE, echo = TRUE}
aipw$res
```
## TMLE
We now move on to the TMLE for the ATE.
Using the following code you can implement a TMLE by hand, based on the clever covariate approach you saw in the first session.
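In brief (a sketch; the exact form should match the lecture slides): for $E(Y^1)$ the clever covariate is $H_1(A,W) = A/g(W)$, and the targeting step fits the one-parameter logistic fluctuation
$$
\operatorname{logit} \bar{Q}^*(A,W) = \operatorname{logit} \bar{Q}(A,W) + \epsilon\, H_1(A,W),
$$
with $\operatorname{logit} \bar{Q}(A,W)$ as an offset, and then updates the initial predictions using $\hat{\epsilon}$.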
```{r, include = TRUE, echo = TRUE}
# First E(Y1)
#' Constructing the clever covariate
H<-as.numeric(data$A/g1W)
```
We now fit this parametric fluctuation model, with the clever covariate as the only explanatory variable and the initial outcome predictions (on the logit scale) as an offset:
```{r, include = TRUE, echo = TRUE}
model<-glm(data$Y~-1+H+offset(qlogis(Q[,"QAW"])),family=binomial)
summary(model)
```
We update the initial predictions using the estimated coefficient of the clever covariate, evaluated at $A = 1$ (i.e. $H_1(1,W) = 1/g(W)$):
```{r, include = TRUE, echo = TRUE}
Q1W.1<-plogis(qlogis(Q[,"Q1W"])+coef(model)[1]/g1W)
```
And use this to get the TMLE estimate of the mean of Y^1
```{r, include = TRUE, echo = TRUE}
# Estimating E(Y1)
mean(Q1W.1)
```
We now repeat for Y^0
```{r, include = TRUE, echo = TRUE}
# E(Y0)
# Constructing the clever covariate
H<-as.numeric((1-data$A)/g0W)
# Fitting a parametric extension model
model<-glm(data$Y~-1+H+offset(qlogis(Q[,"QAW"])),family=binomial)
summary(model)
# Updating the predictions
Q0W.1<-plogis(qlogis(Q[,"Q0W"])+coef(model)[1]/g0W)
# Estimating E(Y0)
mean(Q0W.1)
```
And put the two together to get the TMLE of the ATE:
```{r, include = TRUE, echo = TRUE}
# ATE = E(Y1) - E(Y0)
TMLE.1 <- mean(Q1W.1) - mean(Q0W.1)
TMLE.1
```
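For inference, a sketch mirroring the AIPW calculation above: evaluate the efficient IF at the targeted (updated) fits and use var(IF)/n (the standard IF-based variance; sample-splitting caveats still apply):
```{r, include = TRUE, echo = TRUE}
# Efficient IF of the ATE, evaluated at the targeted fits
IF.tmle <- (data$A/g1W)*(data$Y - Q1W.1) -
  ((1-data$A)/g0W)*(data$Y - Q0W.1) +
  Q1W.1 - Q0W.1 - TMLE.1
# Point estimate with a 95% Wald-type CI
c(TMLE.1,
  TMLE.1 - qnorm(.975)*sd(IF.tmle)/sqrt(n),
  TMLE.1 + qnorm(.975)*sd(IF.tmle)/sqrt(n))
```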
You can do all of this automatically using the tmle package, which also implements other estimands. Further TMLE packages exist for other common settings, such as mediation, IV regression and longitudinal data.
### TMLE using the R package
```{r, include = TRUE, echo = TRUE}
library(tmle)
TMLE<- tmle(Y=data$Y,A=data$A,W=subset(data, select=-c(A,Y)), family="binomial", Q.SL.library=SL.library, g.SL.library=SL.library)
TMLE$estimates$ATE
```
## Cross-fitting
It turns out that, to remove further bias while avoiding extra assumptions on the nuisance estimators, we should use sample splitting; even better, we should use cross-fitting.
This can be done easily with the npcausal package:
```{r, include = T, echo = TRUE}
aipw.2<- ate(y=Y, a=A, x=W, nsplits=10, sl.lib=c("SL.glm", "SL.glm.interaction", "SL.glmnet", "SL.ranger"))
```
```{r, include = TRUE, echo = TRUE}
aipw.2$res
```
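To see what cross-fitting does under the hood, here is a minimal 2-fold by-hand sketch (for intuition only; the fold handling, number of splits, and aggregation in npcausal are more careful):
```{r, include = TRUE, echo = TRUE}
# Randomly assign each observation to one of 2 folds
folds <- sample(rep(1:2, length.out = n))
IF.cf <- rep(NA_real_, n)
for (k in 1:2) {
  train <- folds != k
  test  <- folds == k
  # Fit the nuisance models on the training fold only
  Qfit <- SuperLearner(Y = data$Y[train],
                       X = subset(data, select = -Y)[train, ],
                       SL.library = SL.library, family = "binomial")
  gfit <- SuperLearner(Y = data$A[train], X = W[train, ],
                       SL.library = SL.library, family = "binomial")
  # Predict on the held-out fold
  Q1 <- predict(Qfit, newdata = data.frame(W[test, ], A = 1))$pred
  Q0 <- predict(Qfit, newdata = data.frame(W[test, ], A = 0))$pred
  g1 <- predict(gfit, newdata = W[test, ])$pred
  # Evaluate the (uncentered) IF of the ATE on the held-out fold
  IF.cf[test] <- (data$A[test]/g1)*(data$Y[test] - Q1) -
    ((1 - data$A[test])/(1 - g1))*(data$Y[test] - Q0) + Q1 - Q0
}
mean(IF.cf)  # cross-fitted AIPW estimate of the ATE
mean(IF.cf) + c(-1, 1)*qnorm(.975)*sd(IF.cf)/sqrt(n)  # 95% CI
```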
You should also check tmle3, the newest implementation of TMLE, where the default option is to fit a CV-TMLE:
https://tlverse.org/tlverse-handbook/tmle3.html
Remember, when doing your own analyses, to tune your learners.
To learn how to do this using the SL, visit https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html
# Further reading
## Causal machine learning
For more general reading on debiased machine learning and TMLE, see:
* van der Laan, M. J. and Rose, S. (2011). Targeted Learning. Springer Series in Statistics. Springer, New York, NY.
* Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1-C68.
## Influence functions
* Fisher, A. and Kennedy, E. H. (2020). Visually communicating and teaching intuition for influence functions. The American Statistician, 1-11.
* Levy, J. (2019). Tutorial: Deriving the efficient influence curve for large models. https://arxiv.org/abs/1903.01706
* Ichimura, H. and Newey, W. K. (2015). The influence function of semiparametric estimators. arXiv preprint arXiv:1508.01378.
## R Software packages
* Kennedy, E. H. npcausal R package. https://github.com/ehkennedy/npcausal
* Polley, E. C. and van der Laan, M. J. (2010). Super learner in prediction. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 266. http://biostats.bepress.com/ucbbiostat/paper266
* Naimi, A. and Balzer, L. (2018). Stacked generalization: an introduction to super learning. European Journal of Epidemiology, 33:459-464.
* Gruber, S. and van der Laan, M. J. (2012). tmle: An R package for targeted maximum likelihood estimation. Journal of Statistical Software, 51(13), 1-35. doi:10.18637/jss.v051.i13, http://www.jstatsoft.org/v51/i13/
* Targeted learning software (the tlverse) tutorials: "The Hitchhiker's Guide to the tlverse: A Targeted Learning Practitioner's Handbook." tmle3 is the newest implementation of TMLE; the default is to fit a CV-TMLE. https://tlverse.org/tlverse-handbook/tmle3.html
* The new Super Learner, sl3: https://tlverse.org/tlverse-handbook/sl3.html