Recent Posts

Pages: 1 ... 8 9 [10]
Material / Re: Questions about paper questions
« Last post by Scott Onestak on April 10, 2017, 08:09:28 PM »

I would say: no, a) a) and f) are not the same thing, though they are closely related.  Question a) a) asks for a model, while f) asks for a test of the given hypothesis.

Similar to what Thomas said before, beyond drawing that distinction, I cannot really discuss the questions much more without giving away the answers.
Exams and Grading / Paper question j
« Last post by lbolz on April 10, 2017, 06:49:30 PM »
Would it be reasonable to say that a customer characteristic is related to the treatment variable because customers with this characteristic will be affected by the worker's weight? Or would such a variable be superfluous, because customer characteristics wouldn't cause workers to be obese?

Thank you!!
Material / Re: Various Variances of Beta1-hat
« Last post by Youngmin Kim on April 10, 2017, 06:41:35 PM »
1. Let's start from var(beta1_hat) = E[(beta1_hat - E[beta1_hat])^2]: as you can see, this is just the variance of the estimator beta1_hat (by the definition of variance). No assumptions are required so far.

2. Adding assumptions 1 and 2 allows us to reach a crucial property of beta1_hat, unbiasedness: E[beta1_hat] = beta1. Furthermore, if we add assumption 3, the OLS estimator turns out to be the "best" (i.e., efficient; also referred to as "BLUE": the best linear unbiased estimator). This is guaranteed by the Gauss-Markov theorem.

Now, armed with assumptions 1, 2, and 3, the variance of beta1_hat (starting from the general form var(beta1_hat) = E[(beta1_hat - E[beta1_hat])^2]) can be written more specifically as follows:

var(beta1_hat) = (sigma^2/n)/(var(X1)*(1-R1^2))

where sigma^2 is "estimated" by 1/(n-k-1) * (Σ u_i^2).   (Be aware that (1) u_i is the "residual" obtained after the regression has been run, and (2) k is the number of regressors, i.e., the number of X variables in the model.)

Unlike the general formula above, this specific form gives us concrete takeaways: the variance of the estimator beta1_hat depends on (A) the variance of the unobservables (under assumption 3: homoskedasticity), (B) the sample size n, (C) the variance of X1, and (D) the degree of collinearity of X1 with the other observables (the other X's) we explicitly include in the regression model.
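To make the formula concrete, here is a minimal numerical sketch on simulated data (all variable names and the data-generating process are my own, not from the lecture notes). It checks that (sigma^2/n)/(var(X1)*(1-R1^2)) agrees with the textbook form sigma^2 / (SST_1 * (1-R1^2)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 2                          # sample size, number of regressors
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)     # x2 correlated with x1
u = rng.normal(size=n)                 # homoskedastic errors (assumption 3 holds)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k - 1)       # 1/(n-k-1) * sum(u_i^2)

# R1^2: R-squared from regressing x1 on the other regressors (here just x2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]   # partialled-out residual of x1
sst1 = ((x1 - x1.mean()) ** 2).sum()
R1_sq = 1 - (r1 @ r1) / sst1

var_x1 = x1.var()                      # 1/n convention, matching the formula
var_b1 = (sigma2_hat / n) / (var_x1 * (1 - R1_sq))

# cross-check: same as the textbook form sigma^2 / (SST_1 * (1 - R1^2))
assert np.isclose(var_b1, sigma2_hat / (sst1 * (1 - R1_sq)))
print(var_b1)
```

Note that sst1 * (1 - R1_sq) equals r1 @ r1, so the denominator really is the "pure" variation in X1 after partialling out the other regressors.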

Finally, what if we only have assumptions 1 and 2 (i.e., assumption 3 breaks down: heteroskedasticity)? It turns out that the variance of beta1_hat can then be written as:

var(beta1_hat) = ( 1/(n-k-1) * ( Σ[r1_i^2*u_i^2] / n) ) / ( 1/n * Σ[r1_i^2] )^2

As you can see, this explicitly takes into account the fact that sigma_i differs across individuals i (the breakdown of assumption 3): roughly speaking, when estimating the variance in this case, we use not only the residuals u_i, but also the additional individual-level information in r1_i, to back out heteroskedasticity-adjusted standard errors for the estimator beta1_hat.
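Here is a sketch of the adjusted formula on simulated heteroskedastic data (the data-generating process and names are my own illustration). With error variance rising in |x1|, the adjusted standard error comes out larger than the classical one, which understates the uncertainty here:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 2000, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
u = rng.normal(size=n) * (1 + np.abs(x1))   # heteroskedastic: sigma_i varies with x1
y = 1.0 + 2.0 * x1 + 0.5 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# r1_i: residual from regressing x1 on the other regressors
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# the formula from the post: ( 1/(n-k-1) * (sum r1^2 u^2)/n ) / ( (1/n) sum r1^2 )^2
var_b1_robust = ((1 / (n - k - 1)) * (r1**2 @ resid**2) / n) / ((r1 @ r1 / n) ** 2)

# classical formula, which is invalid here because assumption 3 fails
sigma2_hat = resid @ resid / (n - k - 1)
var_b1_classic = sigma2_hat / (r1 @ r1)

print(np.sqrt(var_b1_robust), np.sqrt(var_b1_classic))
```

Up to the small-sample factor n/(n-k-1), this is the usual White (heteroskedasticity-robust) variance estimator, sum(r1_i^2 u_i^2) / (sum r1_i^2)^2.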

Lastly, what happens to the "heteroskedasticity-adjusted" var(beta1_hat) if we put assumption 3 back in place (sigma_i = sigma truly holds for all individuals)?

Well, it requires some algebra, but let me provide the intuition (which should be enough):

(1) Denominator: 1/n * Σ[r1_i^2] = var(X1)*(1-R1^2), since r1 is the residual from regressing X1 on X2, ..., Xk (the pure variation in X1 after partialling out its potential correlation with the other controls). Note that the mean of r1_i is zero, so this maps directly onto the generic variance formula mentioned above.

(2) Numerator: if we truly have homoskedasticity, the numerator would converge to (you don't need to know exactly why, just bear with me for the intuition) ( 1/(n-k-1) * Σ[u_i^2] / n ) * ( 1/n * Σ[r1_i^2] ).

(3) But notice that 1/n * Σ[r1_i^2] appears squared in the denominator, so one factor of it cancels.

(4) Finally, we have: var(beta1_hat) = ( 1/(n-k-1) * (Σ u_i^2) / n )/(var(X1)*(1-R1^2))

Now, compare this with the formula for var(beta1_hat) under assumptions 1, 2, and 3: it is the same, but only when assumption 3 actually holds in our data.
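The equivalence in steps (1)-(4) can also be checked numerically. In this sketch (simulated data, my own names), the errors are homoskedastic, so the ratio of the two variance estimates should be close to 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5000, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
u = rng.normal(size=n)                       # homoskedastic this time
y = 1.0 + 2.0 * x1 + 0.5 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# heteroskedasticity-adjusted form vs. classical form
var_robust = ((1 / (n - k - 1)) * (r1**2 @ resid**2) / n) / ((r1 @ r1 / n) ** 2)
var_classic = (resid @ resid / (n - k - 1)) / (r1 @ r1)
print(var_robust / var_classic)   # close to 1 when assumption 3 holds
```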

What if we use the homoskedasticity formula for var(beta1_hat) even though assumption 3 no longer holds? The "estimated" variance of beta1_hat is then incorrect, which leads to problems for hypothesis testing, confidence intervals, etc.

I strongly encourage you to read lecture note 15 for related details on heteroskedasticity.

Hope this helps.

Material / Paper question part (i)
« Last post by bkane2 on April 10, 2017, 03:49:11 PM »

I'm a little confused about part (i) of the paper question. The first part talks about using "Self-esteem" as a proxy for omitted variables. The second part asks if this can account for "selection bias" in the study. If I remember correctly, selection bias refers to selecting a sample from the population that isn't truly random. Is that the correct interpretation here? If so, it seems like this and omitted variable bias are unrelated, and a proxy variable won't do anything to fix it. Or am I misunderstanding the meaning of "selection bias" in this context?

Administrative / 2/11 Office hours
« Last post by Laura Ackerman on April 10, 2017, 03:45:29 PM »
Hi everyone,

Thomas will be holding my office hours tomorrow instead. They will be from 10:30-12:30 in Lattimore 210.

Material / Re: 2 questions about paper question
« Last post by Shenxiong on April 10, 2017, 12:25:38 PM »
Hi Alexis,

  Thank you for your reply!!!
  The second part really makes sense.
  For the first part, according to note 10, it is clear that if I include Employment (which is Age - 18), it becomes a superfluous variable, because it can be predicted by another variable (since I have already included Age in the model). Thus it will not affect beta1, but it may affect the other slope coefficients... it also increases the variance of the slope coefficients, since we include more variables.
   Is my argument right?
Material / Re: Questions about paper questions
« Last post by Shenxiong on April 10, 2017, 12:21:56 PM »
Hi Alexis,

    Thank you for your reply.
    If I can assume that both sectors are grouped into only one, do I need to include both of them in my regression model, or only one of them?
    Moreover, do a) a) and f) imply the same thing?
Administrative / Extra Office Hours (Tuesday 4/11)
« Last post by Thomas VanDer Straaten on April 10, 2017, 11:37:50 AM »
All, in addition to my regular office hours, I will be holding an additional office hour in anticipation of Wednesday's midterm. It will be from 9:30-10:30 in Ruth Merrill.
Exams and Grading / Re: Material question a
« Last post by Thomas VanDer Straaten on April 10, 2017, 11:28:48 AM »
Hi, you can assume that both sectors are grouped in only one. 
Material / Re: Questions about paper questions
« Last post by Alexis Orellana on April 10, 2017, 01:56:54 AM »
Hi Shenxiong,

1. Thomas is right. You can take from Table 3 those variables you consider important/relevant and include them.

2. You can assume that both sectors are grouped into only one. In other words, the hypothesis we want to test is the existence of a wage penalty affecting women working in either the sales or the services sector.
