[Stats] How do you “Bootstrap”?

A Ydobon
7 min read · Dec 15, 2021

Hello, everyone. It has been a while since I left Medium. There have been some changes in my life, and I am now doing a PhD in Economics at McGill University in Montreal.

That was just a little update on my private life. In this series on the bootstrap (I believe it will be two, or at most three, stories), I will guide you through how to actually implement the bootstrap in code. I know that some programs have built-in functions for this, but here I will do everything step by step so that you can understand how it works, because it is easy to get confused about what exactly to use in a bootstrap.

Before moving on: this story is based on an assignment on the bootstrap from our Econometrics course (ECO662D1). I want to thank Professor Russell Davidson for letting me put his assignment online. (It was a really enjoyable semester, getting to learn from him.) FYI, the blue ETM (Econometric Theory and Methods) is now available as a free download. If you are interested, just google it and get the most out of the book.

I. Bootstrap?

Since English is not my first language, the expression to “pull yourself up by your bootstraps” was new to me. Whatever other connotations it may have, the image is this: if you are stuck in the mud, you pull yourself out using nothing but your own bootstraps.

So, in statistics/econometrics, it means that you can improve your inference using only what you already have (your data). But how do you do that? My impression (not something I will prove) is that, for example, when we do ML it is ‘easier’ to work with simulated data, since you can generate as much of it as you need. Similarly, the bootstrap opens up room for simulation: you construct a bootstrap DGP (data generating process) from your sample. In that sense, if we follow rules that can be justified theoretically, we can expect this to work better than sticking only to what we have.

There are theoretical justifications that the bootstrap rests on. However, that is not the purpose of this story, and there are many books that cover it (see pp. 155–166 of ETM).

Now, taking it as given that the bootstrap works, let’s see how we can do it.

II. Inference vs. Estimation

One thing to note is that bootstrapping is about inference, not estimation. It is a bit embarrassing, but it was not until recently that I could clearly tell inference and estimation apart. Estimation is, for example, computing the OLS coefficients \hat{\beta} = (X^T X)^{-1} X^T y.

What should happen next is ‘inference’. For example, once our estimation has produced a vector of coefficient estimates (like the one we compute in section IV below), a sensible researcher would next want to test whether some of its elements (here, the last two) are equal to 0. That testing step is ‘inference’.

III. Asymptotic Hypothesis test

Before we do any bootstrapping, we will run an asymptotic test; its test statistic will be needed later in the bootstrap. First, our DGP is the linear regression

y = X\beta + u,

where all we know about the disturbances u is that their expectation is 0 and that their covariance matrix is diagonal, with diagonal elements that may differ across observations. In short, we suspect heteroskedasticity. FYI, I use the term ‘disturbances’ rather than ‘errors’, to be consistent with ETM.

And our null hypothesis is that the coefficients on the last two regressors are zero:

H0: beta2 = beta3 = 0.

  1. What test statistic are we using?

First, we will be using a Wald statistic. Given that we suspect heteroskedasticity, and to make our testing as efficient as possible, we will use a Wald statistic that allows for a heteroskedasticity correction.

In our specific case, with R = [[0, 0, 1, 0], [0, 0, 0, 1]], the statistic is

W = (R \hat{\beta})^T (R \hat{V} R^T)^{-1} (R \hat{\beta}),

where \hat{V} is an estimate of the covariance matrix of \hat{\beta}, so that the Wald statistic asymptotically follows a chi-squared distribution with 2 degrees of freedom (one per restriction).

Also, one thing to notice is that premultiplying \hat{V} by R and postmultiplying it by R^T just picks out the 2 x 2 submatrix corresponding to beta2 and beta3.
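Here is a quick sanity check of that claim, using a stand-in 4 x 4 matrix V (not the actual covariance matrix from the assignment):

import numpy as np

V = np.arange(16, dtype=float).reshape(4, 4)   # stand-in 4x4 "covariance" matrix
R = np.array([[0, 0, 1, 0], [0, 0, 0, 1]])
print(R @ V @ R.T)   # the lower-right 2x2 block of V
print(V[2:, 2:])     # the same block, picked out directly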

2. Heteroskedastic disturbances?

Throughout the semester, we started from the Classical Normal Linear Model (CNLM),

y = X\beta + u, with u ~ NID(0, sigma^2 I),

where NID means the disturbances are normally, independently (and identically) distributed, and the regression function is linear. Under the CNLM, exact finite-sample inference is possible: the distribution of the test statistic is known exactly, not just asymptotically.

Then, if we drop the ‘normal’ part, the disturbances are what is known as white noise. Even in this case, as mentioned, the Wald statistic follows the chi-squared distribution asymptotically.

Finally, if each disturbance has a different variance, we call this ‘heteroskedasticity’.

If we go further, the disturbances may be both serially correlated and heteroskedastic. In that case we should use HAC (heteroskedasticity and autocorrelation consistent) estimators.

However, since neither the asymptotic theory for HAC estimators nor its bootstrap counterpart is settled (in the sense that no single approach dominates the others), we will focus on the heteroskedasticity correction.

3. HCCME (Heteroskedasticity-Consistent Covariance Matrix Estimator)

Hal White (1980) proposed HC0, a consistent estimator of the covariance matrix of the coefficient estimates in a regression whose disturbances are heteroskedastic (pp. 196–202 of ETM).

There are versions HC0 through HC3, which differ in the corrections they apply in an attempt to improve finite-sample performance. They are all similar in terms of coding, though, so I will focus on HC0 and HC2; the choice is really up to you.

The form of the HCCME is always the same sandwich,

(X^T X)^{-1} X^T \hat{\Omega} X (X^T X)^{-1},

and the only difference from HC0 to HC3 is what goes into \hat{\Omega}.
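For reference, the usual choices for the i-th diagonal element of \hat{\Omega} are, with \hat{u}_i the i-th residual and h_i the i-th diagonal element of the projection matrix P_X = X(X^T X)^{-1} X^T:

HC0: \hat{u}_i^2
HC1: \hat{u}_i^2 * n/(n-k)
HC2: \hat{u}_i^2 / (1 - h_i)
HC3: \hat{u}_i^2 / (1 - h_i)^2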

FYI, it took me a while to understand that although there is a hat on \Omega, it is not a consistent estimator of the covariance matrix of the disturbances, even though each squared residual is, loosely speaking, an estimate of the variance of the corresponding disturbance.

By construction it cannot be a consistent estimator: as n goes to infinity, the number of variances to estimate grows with n, and each diagonal element of \hat{\Omega} is based on a single squared residual, so it never converges to the true variance. A law of large numbers can be applied to averages such as (1/n) X^T \hat{\Omega} X, but not to the whole n x n matrix.

However, White succeeded in finding a consistent estimator of the covariance matrix of \hat{\beta}, which is only a k x k matrix.

IV. Some Coding

We will run three asymptotic tests: one under the assumption that the disturbances are homoskedastic, and two using different versions of the HCCME.
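The assignment data itself is not public, so if you want to run the snippets below end to end, here is a hypothetical stand-in data set with heteroskedastic disturbances. Every choice in it (n, k, the coefficient values, the heteroskedasticity pattern) is mine rather than the assignment's, so your numbers will not match the outputs shown in this story.

import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(42)
n, k = 100, 4                                                        # sample size and number of regressors
X = np.column_stack([np.ones(n), rng.uniform(1, 10, size=(n, 3))])   # constant plus 3 regressors
beta_true = np.array([[130.0], [9.0], [0.0], [0.0]])                 # made-up coefficients
u = rng.normal(size=(n, 1)) * (0.5 + X[:, [1]])                      # variance depends on the first non-constant regressor
y = X @ beta_true + u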

1. Estimation

beta_hat = inv(X.T @ X) @ X.T @ y

which, with the assignment data, gives

array([[1.31986603e+02],
       [9.49700802e+00],
       [5.09767067e-02],
       [4.56077098e-01]])

So, as said, we will test whether the last two coefficients are equal to 0 or not. It actually is a reasonable thing to test.

2. Construct covariance matrix

First, we will set aside our suspicion of heteroskedasticity and stick to the usual OLS covariance estimator s^2 (X^T X)^{-1}.

So, before anything else, we need the residuals, \hat{u}. There are many ways to calculate residuals, but using the complementary projection matrix is the most concise. The idea is that OLS drops a perpendicular from y onto the subspace spanned by the columns of X; the residual vector is what is left over. For more on this ‘geometry’ of OLS, consult chapter 2 of ETM.

iden = np.identity(100, dtype=float)    # n = 100 observations
M_x = iden - X @ inv(X.T @ X) @ X.T     # complementary projection matrix M_X
u_hat_un = M_x @ y                      # (unrestricted) residuals
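As a quick optional sanity check (assuming beta_hat from the estimation step is still in memory): M_X should annihilate X, and M_X y should match the residuals computed the usual way.

print(np.allclose(M_x @ X, 0))                   # True: M_X X = 0
print(np.allclose(u_hat_un, y - X @ beta_hat))   # True: same residuals either way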

Then, we can define some functions to calculate covariance matrices.

def fvar_ols(u_hat_DGP):
    # classic OLS covariance matrix: s^2 (X'X)^{-1}
    s2 = (u_hat_DGP.T @ u_hat_DGP) / (n - k)
    var_ols = s2 * inv(X.T @ X)
    return var_ols

def fHC0(u_hat_DGP):
    # White's HC0: Omega_hat has the squared residuals on its diagonal
    u_sq = np.square(u_hat_DGP)
    omega0 = np.diagflat(u_sq)
    HC0 = inv(X.T @ X) @ (X.T @ omega0 @ X) @ inv(X.T @ X)
    return HC0

P_x = X @ inv(X.T @ X) @ X.T   # orthogonal projection matrix
ht = np.diag(P_x)              # leverages h_i

def fHC2(u_hat_DGP):
    # HC2: squared residuals deflated by (1 - h_i)
    omega2_diag = [0 for i in range(n)]
    u_hat_DGP_sq = np.square(u_hat_DGP)
    for i in range(n):
        omega2_diag[i] = u_hat_DGP_sq[i] / (1 - ht[i])
    omega2 = np.diagflat(omega2_diag)
    HC2 = inv(X.T @ X) @ (X.T @ omega2 @ X) @ inv(X.T @ X)
    return HC2
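The p-value function in the next step expects the three covariance matrices to be saved as variables, so (using the residuals u_hat_un from above) I compute them along these lines:

var_ols = fvar_ols(u_hat_un)   # assumes homoskedasticity
HC0 = fHC0(u_hat_un)           # White's HC0
HC2 = fHC2(u_hat_un)           # HC2 with the leverage correction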

3. Wald Statistic and P values

Now we can finish our asymptotic test by calculating p-values. As said, the Wald statistic asymptotically follows a chi-squared distribution with 2 degrees of freedom.

from scipy.stats import chi2

R = np.array([[0, 0, 1, 0], [0, 0, 0, 1]])   # picks out beta2 and beta3

def coef(X, y):
    beta_hat = inv(X.T @ X) @ X.T @ y
    return beta_hat

def p(var, X, y):
    # Wald statistic for H0: R beta = 0, and its asymptotic chi-squared(2) p-value
    var_hat = var
    beta_hat = coef(X, y)
    Wald = (R @ beta_hat).T @ inv(R @ var_hat @ R.T) @ (R @ beta_hat)
    pvalue = 1 - chi2.cdf(Wald, 2)
    return np.array((Wald, pvalue)).reshape(2)
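Here is one way to display the three tests side by side (the actual numbers come from the assignment data, so I am not reproducing them here):

for name, var in [("OLS", var_ols), ("HC0", HC0), ("HC2", HC2)]:
    wald, pval = p(var, X, y)
    print(f"{name}: Wald = {wald:.3f}, p-value = {pval:.3f}")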

Results

V. Conclusion

Today, we have not talked much about the bootstrap itself. However, almost all of the functions we need have now been constructed. Also, remember that the Wald statistics we computed will be needed in the bootstrap (so you have to save the numbers as variables).

# the three asymptotic Wald statistics (OLS, HC0, HC2), saved for the bootstrap
test_hat = np.array((p(var_ols, X, y)[0], p(HC0, X, y)[0], p(HC2, X, y)[0]), dtype=float).reshape(3)

In the following story, I will talk about the ‘Golden Rules’ of the bootstrap and show how I did my coding.

Thank you for reading! Happy coding!
