# Resubmitted my endogeneity paper to Econometrica.

29/05/14 12:46 AM

Find the new version here. The intuitive idea is simple. Here is the structural equation:

\[Y=g(X,Z)+U,\]

and we want to know whether X is endogenous or not. We don’t care about Z, which is simply a vector of controls. We need the following assumption:

A1) g is continuous in X

Now, if X is exogenous, then E(U|X,Z)=E(U|Z), so

\[E(Y|X,Z)=g(X,Z)+E[U|Z]\]

will be continuous in X.

This means that, under A1, a discontinuity in E(Y|X,Z) as a function of X implies that X must be endogenous! We can therefore test the exogeneity of X by looking for a discontinuity in E(Y|X,Z).
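To spell the argument out, here is a short decomposition. The point x_0 is just a label I am introducing for a candidate discontinuity point, and I assume the limits below exist. Since E(Y|X,Z)=g(X,Z)+E(U|X,Z), the jump in E(Y|X,Z) at x_0 splits into two pieces:

\[E(Y|X=x_0,Z)-\lim_{x\rightarrow x_0}E(Y|X=x,Z)=\left[g(x_0,Z)-\lim_{x\rightarrow x_0}g(x,Z)\right]+\left[E(U|X=x_0,Z)-\lim_{x\rightarrow x_0}E(U|X=x,Z)\right]\]

The first bracket is zero under A1, and the second bracket is zero when X is exogenous, because E(U|X,Z)=E(U|Z) does not depend on X at all. So any jump that shows up in E(Y|X,Z) has to come from E(U|X,Z).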

Ok, but does this test have power? The answer is yes, in certain cases. When X is endogenous, E(U|X,Z) varies with X. For the test to have power we need more than that: when X is endogenous, we need

A2) E(U|X,Z) is discontinuous in X.

This is not a general phenomenon, but it does happen in some cases, especially if the variable X has bunching points. Check the paper to see plenty of examples.
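To get a feel for how bunching can generate A2, here is a minimal simulation sketch. The data-generating process is made up for illustration and is not taken from the paper: a latent variable `x_star` is censored at zero so that X bunches at 0, and the error U loads on `x_star` through a hypothetical parameter `rho`. There are no controls Z here, and `gap_at_zero` is only a crude difference in means, not the estimator discussed in this post.

```python
import numpy as np


def simulate_bunching(n, rho, seed=0):
    """Toy DGP (an assumption for illustration): X is a latent normal censored at 0."""
    rng = np.random.default_rng(seed)
    x_star = rng.standard_normal(n)            # latent variable, unobserved
    x = np.maximum(x_star, 0.0)                # observed X bunches at exactly 0
    u = rho * x_star + rng.standard_normal(n)  # rho != 0 makes X endogenous
    y = 1.0 + 2.0 * x + u                      # g(X) = 1 + 2X is continuous (A1)
    return x, y


def gap_at_zero(x, y, h=0.05):
    """Mean of Y at X = 0 minus mean of Y for X in (0, h)."""
    at_zero = y[x == 0].mean()
    near_zero = y[(x > 0) & (x < h)].mean()
    return at_zero - near_zero


if __name__ == "__main__":
    for rho in (0.0, 1.0):
        x, y = simulate_bunching(200_000, rho=rho)
        print(f"rho={rho}: gap at zero = {gap_at_zero(x, y):.3f}")
```

With `rho=0` the gap should be close to zero, up to a small slope effect from the window width `h`. With `rho=1` it should be clearly negative, because the observations bunched at X=0 all have a negative latent value while those just above zero have a latent value near zero.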

So, the test could be built in the following way: taking X=0 as the point where we look for a discontinuity, estimate the quantity

\[\Delta(Z)=E(Y|X=0,Z)-\lim_{x\rightarrow 0}E(Y|X=x,Z)\]

and test whether it is equal to zero. This is a sound strategy in a linear model, for example, where the quantities above can be estimated with simple regressions. However, if we want to use nonparametric regressions, two problems arise.

The first is practical. The second term is a boundary quantity, which should be estimated with a local linear regression. Unfortunately, those don’t run once Z has as few as two dimensions. I’ll add a post about that sometime.

The second problem is the curse of dimensionality: the variance of the estimator of either term above can be huge. The solution is to aggregate the discontinuities, and I choose to do it the following way:

\[\theta=\lim_{x\rightarrow 0} \int \left[ E(Y|X=0,Z)-E(Y|X=x,Z)\right]dF(Z|X=x)\]

which is the same as

\[\theta=\lim_{x\rightarrow 0} E(E(Y|X=0,Z)-Y|X=x)\]

because of the law of iterated expectations. This eliminates both the curse of dimensionality and the need to estimate the second term. To test the exogeneity of X, estimate θ and test whether it is equal to zero.
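Here is a minimal plug-in sketch of how θ could be estimated. It is an illustration under stated assumptions, not the estimator from the paper: the names `simulate_with_controls` and `estimate_theta`, the linear-in-Z fit for E(Y|X=0,Z), the triangular kernel, the bandwidth `h`, and the toy data-generating process are all my own choices here. It also only returns a point estimate; standard errors and the actual test are not covered.

```python
import numpy as np


def simulate_with_controls(n, rho, seed=0):
    """Toy DGP (an assumption for illustration): X is censored at 0, Z is one control."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    v = rng.standard_normal(n)
    x = np.maximum(0.5 * z + v, 0.0)      # X bunches at exactly 0
    u = rho * v + rng.standard_normal(n)  # rho != 0 makes X endogenous
    y = 1.0 + 2.0 * x + z + u
    return y, x, z


def estimate_theta(y, x, z, h=0.3):
    """Plug-in estimate of theta, with the bunching point at X = 0."""
    z = z.reshape(len(y), -1)
    bunched = x == 0

    # Step 1: fit E(Y | X=0, Z) on the bunched observations only.
    # The linear-in-Z specification is an illustrative assumption.
    design0 = np.column_stack([np.ones(bunched.sum()), z[bunched]])
    beta0, *_ = np.linalg.lstsq(design0, y[bunched], rcond=None)

    # Step 2: W = fitted E(Y | X=0, Z) minus Y, for observations with 0 < X < h.
    near = (x > 0) & (x < h)
    fitted = np.column_stack([np.ones(near.sum()), z[near]]) @ beta0
    w = fitted - y[near]

    # Step 3: local linear regression of W on the scalar X at the boundary x = 0,
    # with triangular kernel weights; the intercept estimates the limit.
    sqrt_k = np.sqrt(1.0 - x[near] / h)
    design = np.column_stack([np.ones(near.sum()), x[near]])
    coef, *_ = np.linalg.lstsq(design * sqrt_k[:, None], w * sqrt_k, rcond=None)
    return coef[0]


if __name__ == "__main__":
    for rho in (0.0, 1.0):
        y, x, z = simulate_with_controls(50_000, rho=rho, seed=1)
        print(f"rho={rho}: theta_hat = {estimate_theta(y, x, z):.3f}")
```

The point of the sketch is that the only regression on Z uses the bunched sample, while the boundary regression in step 3 is on the scalar X alone, whatever the dimension of Z. In this toy setup the estimate should be near zero when `rho=0` and clearly negative when `rho=1`.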
