[Stats] How do you “bootstrap”? II

A Ydobon
5 min read · Jan 27, 2022

Hello! Following the last post, this one covers how you actually conduct a bootstrap test. Although papers on the bootstrap have been appearing for decades, I found few resources aimed at students in a master's programme or the first year of a PhD. I hope this helps you on your journey.

I. Golden Rules

These golden rules are copied from ETM (Econometric Theory and Methods, Davidson and MacKinnon), and I think they are rules you want to stick to.

(GR1) The bootstrap DGP must belong to the model that represents the null hypothesis.

This means that if the model you are testing is, say,

y_t = \beta_1 + \beta_2 x_{t1} + \beta_3 x_{t2} + \beta_4 x_{t3} + u_t, with H0: \beta_3 = \beta_4 = 0,

then your bootstrap DGP should look like

y_t^* = \tilde{\beta}_1 + \tilde{\beta}_2 x_{t1} + u_t^*,

where the \tilde{\beta}'s represent the coefficient estimates under H0. To be concrete, you have to estimate the model again as if y were really explained only by the constant term and x1.

(GR2) Unless the test statistic is pivotal for the null model, the bootstrap DGP should be as good an estimate of the true DGP as possible, under the assumption that the true DGP belongs to the null model.

This means: be as efficient as possible. So, if heteroskedasticity is suspected, use whatever correction method is available.
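The heteroskedasticity-robust variance estimators used later (fvar_ols, fHC0, fHC2) were defined in the previous post. As a reminder of what "use the correction method available" means in code, here is a minimal sketch of an HC0-style sandwich estimator; fHC0 here is only my illustration, assuming, like fvar_ols, it takes only the residual vector and uses the unrestricted regressor matrix X from the previous post.

import numpy as np
from numpy.linalg import inv

def fHC0(u_hat):
    # HC0 sandwich: (X'X)^{-1} X' diag(u_hat^2) X (X'X)^{-1}, using the global X
    XtX_inv = inv(X.T @ X)
    meat = X.T @ (u_hat ** 2 * X)     # broadcasting builds X' diag(u_hat^2) X
    return XtX_inv @ meat @ XtX_inv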

Now, what is u^*?

II. Resampling

The theoretical backing for resampling, or for any bootstrap method, is the Fundamental Theorem of Statistics.

It sounds like a big deal, but it is "fundamental" because you already know it: the theorem says that the empirical distribution function (EDF) is a consistent estimator of the CDF of the random variable (see ETM p. 147). It can be proved using the law of large numbers and is closely related to the idea behind the CLT (central limit theorem).
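If you want to see the theorem at work, here is a tiny, purely illustrative simulation showing the EDF of a standard normal sample approaching the true CDF as the sample grows:

import numpy as np
from scipy.stats import norm

# EDF evaluated at 0.5 versus the true N(0,1) CDF at 0.5, for growing sample sizes
for n_obs in (50, 500, 50000):
    sample = np.random.standard_normal(n_obs)
    print(n_obs, np.mean(sample <= 0.5), norm.cdf(0.5))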

Now, since the residuals are consistent estimators of the disturbances, the EDF of the residuals converges to the CDF of the disturbances. So resampling the residuals (or shuffling them, if you are more familiar with that idea) is conceptually not that far off.

Construct Bootstrap DGP

The discussion up to now allows us to construct the bootstrap DGP (y^*).

import numpy as np
from numpy.linalg import inv

# construct the H0 (restricted) model
Z = np.column_stack((iota, x_1))         # regressors under H0: constant and x1
gamma_hat = inv(Z.T @ Z) @ Z.T @ y       # run the H0 regression
M_z = iden - Z @ inv(Z.T @ Z) @ Z.T      # residual-maker matrix for Z
u_hat_r = M_z @ y                        # restricted residuals

# This function takes any residual vector!
def resample_DGP(residual):
    u_boot = np.random.choice(residual.flatten(), n, replace=True).reshape(n, 1)
    y_resample = np.add(Z @ gamma_hat, u_boot)
    return y_resample

So, in the function resample_DGP you have to feed in a residual vector. What are you going to feed it? Following golden rule 1, it should be the restricted residuals from running the H0 model: u_hat_r=M_z@y.
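For instance, one bootstrap sample is generated like this (y_star is just an illustrative name):

y_star = resample_DGP(u_hat_r)    # one bootstrap sample generated under H0
print(y_star.shape)               # (n, 1), same shape as y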

Bootstrap testing

Then we have to test! Testing is done by first calculating the Wald statistic, comparing it with the one we got from the asymptotic test, and counting how often the bootstrap test statistic is more 'extreme' than our observed test statistic.

What counts as 'extreme' depends on whether you are looking at a one-tailed or a two-tailed test. This frequency is the bootstrap p-value.
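For the upper-tailed Wald test used below, this means the bootstrap p-value is simply the proportion of bootstrap statistics \tau_j^* that exceed the observed statistic \hat{\tau}:

\hat{p}^*(\hat{\tau}) = \frac{1}{B} \sum_{j=1}^{B} I(\tau_j^* > \hat{\tau})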

But how do you calculate the Wald statistic, and what do you feed the function p()? Coming back to how we did it in asymptotic testing:

from scipy.stats import chi2

def p(var, X, y):
    var_hat = var
    beta_hat = coef(X, y)                # unrestricted OLS coefficients
    Wald = (R @ beta_hat).T @ inv(R @ var_hat @ R.T) @ (R @ beta_hat)
    pvalue = 1 - chi2.cdf(Wald, 2)       # two restrictions under H0
    return np.array((Wald, pvalue)).reshape(2)

var_ols = fvar_ols(u_hat_un)             # u_hat_un: unrestricted residuals
p(var_ols, X, y)

So, after constructing y^*, just treat y^* as if it were a new y.

Things to be done:

  • calculate the unrestricted residuals of y^*: u_hat_DGP=M_x@y_resample
  • calculate the variance estimate you want to use (OLS, HC0, or HC2): e.g. fvar_ols(u_hat_DGP)
  • calculate the Wald statistic: p(fvar_ols(u_hat_DGP), X, y_resample)[0]
  • compare it with the test statistics we already calculated:

test_hat=np.array((p(var_ols,X,y)[0],p(HC0,X,y)[0],p(HC2,X,y)[0]),dtype=float).reshape(3)
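To make this concrete, here is what those steps look like for a single bootstrap draw (the loop below just repeats this B times):

y_resample = resample_DGP(u_hat_r)              # bootstrap sample under H0
u_hat_DGP = M_x @ y_resample                    # unrestricted residuals of y^*
w = p(fvar_ols(u_hat_DGP), X, y_resample)[0]    # bootstrap Wald statistic
print(w, w > test_hat[0])                       # is it more extreme than the observed one?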

Since we don't want to keep the p-values from the asymptotic tests, I will define a new function that returns only the Wald statistic.

def test(var_hat, X, y):
    beta_hat = coef(X, y)
    t = (R @ beta_hat).T @ inv(R @ var_hat @ R.T) @ (R @ beta_hat)   # Wald statistic only
    return t

Then the bootstrap loop follows:

def resample_ols(B):
    p = 0
    for i in range(B):
        y_resample = resample_DGP(u_hat_r)                   # bootstrap sample under H0
        w = test(fvar_ols(M_x @ y_resample), X, y_resample)  # bootstrap Wald statistic
        I = bool(w > test_hat[0])                            # more extreme than the observed one?
        p = p + I
    return p / B                                             # bootstrap p-value

Here, everything stays the same except the variance function you use, e.g. fHC0 or fHC2, and accordingly which statistic you compare against, e.g. test_hat[1] or test_hat[2].
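For example, an HC0 version might look like the sketch below; I am assuming fHC0 (from the previous post, or the illustration above) takes the unrestricted residual vector, just like fvar_ols.

def resample_HC0(B):
    p = 0
    for i in range(B):
        y_resample = resample_DGP(u_hat_r)
        w = test(fHC0(M_x @ y_resample), X, y_resample)   # HC0 variance instead of OLS
        I = bool(w > test_hat[1])                         # compare with the asymptotic HC0 statistic
        p = p + I
    return p / B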

Also, B is the number of bootstrap replications you'd like to run. Following the legacy of Monte Carlo testing, it is common to choose B so that α(B + 1) is an integer, for example B = 999 at the 5% level.
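A quick, purely illustrative check of that rule at the 5% level:

alpha = 0.05
for B in (99, 999, 9999):
    print(B, alpha * (B + 1))    # an integer means no extra distortion from the finite B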

III. Wild Bootstrap

The wild bootstrap has been suggested as a fix for the problem of heteroskedasticity. Since we suspect it is (virtually) present here, we might want to use this method.

The wild bootstrap corrects the bootstrap residual, which previously was a resampled restricted residual, by multiplying each restricted residual by a random draw from an iid distribution with expectation zero and variance one.

In fact, there are many suggestions for which distribution to use; here we will follow Mammen's suggestion.

(For more information on the wild bootstrap, I found "The wild bootstrap, tamed at last" (Davidson and Flachaire, 2008) easy to read and detailed.)

And in the wild bootstrap, our bootstrap DGP would be

y_t^* = \tilde{\beta}_1 + \tilde{\beta}_2 x_{t1} + s_t^* \tilde{u}_t,

where s_t^* follows Mammen's two-point distribution and \tilde{u}_t is the restricted residual.

Actually, s_t^* can also be a random draw from N(0,1).

from math import sqrt

# Mammen's two-point distribution
et = [-(sqrt(5) - 1) / 2, (sqrt(5) + 1) / 2]
prob = (sqrt(5) + 1) / (2 * sqrt(5))
print(et[0] * prob + et[1] * (1 - prob),
      (et[0] ** 2) * prob + (et[1] ** 2) * (1 - prob))   # expectation zero, variance one

def wild_DGP(residual):
    st = np.random.choice(et, n, p=[prob, 1 - prob]).reshape(n, 1)
    u_boot = np.multiply(st, residual)
    y_wild = np.add(Z @ gamma_hat, u_boot)               # gamma_hat from running H0
    return y_wild
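If you prefer the N(0,1) draws mentioned above, a minimal variant (the name wild_DGP_normal is mine) would be:

def wild_DGP_normal(residual):
    st = np.random.standard_normal((n, 1))     # s_t^* drawn from N(0,1)
    u_boot = np.multiply(st, residual)
    y_wild = np.add(Z @ gamma_hat, u_boot)     # gamma_hat from running H0
    return y_wild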

Once the bootstrap DGP is constructed, what follows is exactly the same.

def wild_ols(B):
    p = 0
    for i in range(B):
        y_wild = wild_DGP(u_hat_r)                       # wild bootstrap sample under H0
        w = test(fvar_ols(M_x @ y_wild), X, y_wild)      # bootstrap Wald statistic
        I = bool(w > test_hat[0])
        p = p + I
    return p / B                                         # bootstrap p-value
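Putting everything together, the two bootstrap p-values can then be computed and compared, for example:

B = 999                                  # as discussed above
print(resample_ols(B), wild_ols(B))      # resampling vs. wild bootstrap p-values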

IV. Conclusion

My intuition after a semester of working with the bootstrap (at a basic level) is that it gives us room for simulation, as if we had more data points.

In ML, when we train we shuffle through the data we have and let the machine learn from it. I think that is analogous to having a bootstrap DGP. Even in the era of ML, the bootstrap keeps its meaning: it can be applied when we do not have that much data (compared to how much we usually have in ML).

I hope this post helped you see how bootstrapping is done in practice.

Thank you.

Cecilia Kim.
