Discrepancy between graph and tau estimate #3

Open
samsunknight opened this issue Mar 7, 2023 · 1 comment

samsunknight commented Mar 7, 2023

Hi, thanks so much for this package! I'm really grateful to you for coding this up and building it. For one of my applications, however, I'm finding some apparent discrepancies between the graphed quantities and the reported $\tau$. For example, see the below screenshot:

[Image: Screen Shot 2023-03-07 at 6 04 53 PM]

The reported $\tau$ looks much larger than the visible difference between the graphed lines: the treated series goes from roughly 0.1 above control to roughly 0.2 below control, which suggests a $\tau$ closer to -0.3.
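For reference, the eyeballed figure above is just a difference-in-differences of the two gaps read off the plot. A minimal sketch, where both inputs are the approximate values quoted above rather than anything computed by the package:

```python
# Approximate gaps (treated minus control) read off the plot; not package output.
gap_pre = 0.1    # treated sits roughly 0.1 above control before treatment
gap_post = -0.2  # treated sits roughly 0.2 below control after treatment

# Difference-in-differences of the two gaps
tau_eyeball = gap_post - gap_pre
print(round(tau_eyeball, 2))  # -0.3
```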

Can you help me understand what might be causing this discrepancy? I tried to debug by inspecting the fixed effects, but I couldn't find where in the code they are stored, so I had trouble working out what went wrong. I'm not sure whether the plot is wrong, whether $\tau$ is wrong, or whether the relatively sparse $\lambda$ weighting is producing a mismatch between the estimate and the visual pattern. I'd be happy to provide replication data if that would help. Thanks regardless for your attention to this.

samsunknight commented Mar 8, 2023

Would it be correct to say that the current code takes a weighted difference between pre- and post-periods, without estimating a full two-way fixed effects regression? From what I can tell, the estimation of $\omega$, $\lambda$ and $\zeta$ looks great, but I believe that computing $\tau$ as the difference between the weighted averages does not implement the two-way fixed effects (diff-in-diff) step as described in the paper.
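One way to check this kind of question numerically is to compare a weighted TWFE regression against a weighted double difference on toy data. The sketch below is self-contained and does not call the package: `weighted_twfe_tau`, `weighted_did_tau`, the synthetic panel, and the weights are all made up for illustration. It assumes a block design (all treated units are treated in the same post periods) and weights of the product form $\omega_i \lambda_t$; under those assumptions the two numbers should coincide, so a mismatch in a real run would point at how the averages or weights are formed rather than at the regression step itself.

```python
import numpy as np

def weighted_twfe_tau(Y, W, omega, lam):
    """Coefficient on W from weighted least squares with unit and time dummies."""
    n, T = Y.shape
    w = np.outer(omega, lam).ravel()      # one product weight per (unit, period) cell
    unit = np.repeat(np.arange(n), T)     # unit index of each flattened cell
    time = np.tile(np.arange(T), n)       # period index of each flattened cell
    X = np.column_stack([
        np.eye(n)[unit],                  # unit fixed effects
        np.eye(T)[time][:, 1:],           # time fixed effects (first period dropped)
        W.ravel(),                        # treatment indicator
    ])
    sw = np.sqrt(w)                       # WLS via row scaling; zero-weight rows vanish
    beta, *_ = np.linalg.lstsq(X * sw[:, None], Y.ravel() * sw, rcond=None)
    return beta[-1]

def weighted_did_tau(Y, treated, post, omega, lam):
    """Double difference of the four weighted cell means."""
    def cell(units, periods):
        om = omega[units] / omega[units].sum()
        la = lam[periods] / lam[periods].sum()
        return om @ Y[np.ix_(units, periods)] @ la
    return ((cell(treated, post) - cell(treated, ~post))
            - (cell(~treated, post) - cell(~treated, ~post)))

# Synthetic panel: 4 control units, 2 treated units, 5 pre and 3 post periods.
rng = np.random.default_rng(0)
treated = np.array([False] * 4 + [True] * 2)
post = np.array([False] * 5 + [True] * 3)
W = np.outer(treated, post).astype(float)
Y = rng.normal(size=(6, 8)) + 0.5 * W

# Made-up sparse weights: uniform on treated units and post periods.
omega = np.where(treated, 1 / treated.sum(), 0.0)
omega[~treated] = [0.5, 0.3, 0.2, 0.0]
lam = np.where(post, 1 / post.sum(), 0.0)
lam[~post] = [0.0, 0.0, 0.1, 0.3, 0.6]

tau_twfe = weighted_twfe_tau(Y, W, omega, lam)
tau_did = weighted_did_tau(Y, treated, post, omega, lam)
print(tau_twfe, tau_did)
```

The dummy-variable regression is deliberately written out by hand so that it is an independent check on the double difference, rather than a second call into the same code path.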

I've adjusted my local version of the module with the following functions for estimating $\tau$:

  def regression_df(self):
    """Long-format panel with a treatment indicator and SDID / SC regression weights."""

    date_var = self.df.index.name
    melted_df = self.df.reset_index().melt(id_vars=date_var)
    id_var = melted_df.columns[1]  # column holding the unit identifiers after melt

    # treatment = 1 only for treated units in post-treatment periods
    melted_df['treatment'] = (melted_df[id_var].isin(self.treatment)
                              & melted_df[date_var].isin(self.Y_post_t.index)).astype(int)

    # sdid: regression weight is omega (unit) times lambda (time)
    omega_weights, lambda_weights = self.estimated_params()
    melted_df = melted_df.merge(
        omega_weights.iloc[:-1].rename(columns={'features': id_var, 'sdid_weight': 'omega_weight'}),
        on=id_var, how='outer')
    melted_df = melted_df.merge(
        lambda_weights.rename(columns={'time': date_var, 'sdid_weight': 'lambda_weight'}),
        on=date_var, how='outer')

    # units / periods without an estimated weight (treated units, post periods) get uniform weights
    melted_df['sdid_weight'] = (melted_df['omega_weight'].fillna(1 / len(self.treatment))
                                * melted_df['lambda_weight'].fillna(1 / len(self.Y_post_t)))

    # sc: unit weights only
    omega_weights_ADH = self.estimated_params('sc')
    melted_df = melted_df.merge(omega_weights_ADH.rename(columns={'features': id_var}),
                                on=id_var, how='outer')
    melted_df['sc_weight'] = melted_df['sc_weight'].fillna(1 / len(self.treatment))

    # (unit, time) MultiIndex, as PanelOLS expects
    melted_df = melted_df.set_index([id_var, date_var])

    return melted_df

...

  # requires `from linearmodels.panel import PanelOLS` at module level
  def hat_tau(self, model="sdid"):
    """
    Adjusted from the GitHub version to estimate tau by weighted TWFE regression.
    """

    regression_df = self.regression_df()

    if model == "sdid":

        # PanelOLS requires strictly positive weights; zero-weight rows
        # contribute nothing to the fit, so they can be dropped
        regression_df_noZeroWeight = regression_df.loc[regression_df['sdid_weight'] > 0]

        FE = PanelOLS(regression_df_noZeroWeight['value'],
                      regression_df_noZeroWeight['treatment'],
                      entity_effects=True,
                      time_effects=True,
                      weights=regression_df_noZeroWeight['sdid_weight'])

        result = FE.fit()

        tau_est = result.params.iloc[0]  # coefficient on 'treatment'

    elif model == "sc":

        # same zero-weight filter as above, using the SC unit weights
        regression_df_noZeroWeight = regression_df.loc[regression_df['sc_weight'] > 0]

        # time effects only: no unit fixed effects in the SC regression
        FE = PanelOLS(regression_df_noZeroWeight['value'],
                      regression_df_noZeroWeight['treatment'],
                      time_effects=True,
                      weights=regression_df_noZeroWeight['sc_weight'])

        result = FE.fit()

        tau_est = result.params.iloc[0]  # coefficient on 'treatment'

    else:
        raise ValueError(f"unknown model: {model!r}")

    return tau_est

I've also made a few smaller adjustments: the object is plotted differently, and the variance code now uses these functions directly by making a deepcopy() of the given class instance and then re-estimating its attributes. The main changes, though, are the ones above. I'd be interested to hear whether you think this is correct. Thanks again for building this code!
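To make the deepcopy() refactor concrete, here is a minimal, self-contained sketch of the pattern. Everything in it is hypothetical: `ToyEstimator` is a stand-in class with a plain difference-in-differences `hat_tau`, not the package's class, and `placebo_variance` is a simplified placebo-style variance (reassign treatment among control units, re-estimate, take the variance of the resulting estimates). The point is only that each replication works on its own copy, so the original instance is left untouched.

```python
import copy
import random
import statistics

class ToyEstimator:
    """Hypothetical stand-in for the estimator class; not the package's API."""

    def __init__(self, outcomes, treatment):
        self.outcomes = outcomes      # {unit: (pre_mean, post_mean)}
        self.treatment = treatment    # list of treated unit names
        self.fit()

    def fit(self):
        # Cache derived attributes, as a fitted estimator would.
        self.controls = [u for u in self.outcomes if u not in self.treatment]

    def hat_tau(self):
        def mean_change(units):
            return statistics.mean(self.outcomes[u][1] - self.outcomes[u][0] for u in units)
        return mean_change(self.treatment) - mean_change(self.controls)

def placebo_variance(est, n_reps=200, seed=0):
    rng = random.Random(seed)
    n_treated = len(est.treatment)
    taus = []
    for _ in range(n_reps):
        clone = copy.deepcopy(est)            # never mutate the caller's instance
        for u in est.treatment:               # drop the truly treated units
            del clone.outcomes[u]
        clone.treatment = rng.sample(sorted(clone.outcomes), n_treated)
        clone.fit()                           # re-estimate the cached attributes
        taus.append(clone.hat_tau())
    return statistics.pvariance(taus)

# Made-up data: five control units and one treated unit ("f").
outcomes = {"a": (1.0, 1.2), "b": (0.9, 1.0), "c": (1.1, 1.4),
            "d": (1.0, 1.1), "e": (0.8, 1.1), "f": (1.0, 0.7)}
est = ToyEstimator(outcomes, treatment=["f"])
var = placebo_variance(est)
print(est.hat_tau(), var)
```

Copying the whole instance costs some memory per replication, but it means the re-estimation reuses exactly the same methods as the point estimate, with no separate code path to keep in sync.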

Here's the new version of the above graph with this new estimation method, which looks closer to expected:

[Image: Screen Shot 2023-03-07 at 9 44 24 PM]
