"Unbalanced Panel" when groups are different sizes #71

Open
alecmcclean opened this issue Apr 22, 2020 · 13 comments
@alecmcclean

Hi Guys,

First - thanks a bunch for translating this package to R, I really appreciate it. I just wanted to flag a small issue I've found when using the bacon() function.

It seems that bacon() does not currently allow groups to be different sizes. I've appended code to generate a minimal example. In the dataset I create, there are 3 groups (group_id == 1, 2, 3): groups 1 and 3 contain one individual each, and group 2 contains two individuals (ind_id is the individual ID).

If I run bacon(..., id_var = "group_id", ...), the function throws an "Unbalanced Panel" error, because group 2 has twice as many observations as group 1 (there are two individuals in group 2).

But I don't think that should be treated as an error; otherwise, you cannot demonstrate 2x2 weighting heterogeneity arising from the size of the groups. And, from what I understand, this is one of the key takeaways of the Bacon decomposition: larger groups receive higher weights in the 2x2 comparisons.

Alternatively, if you do want to call that an unbalanced panel, I don't think you need the code calculating "n_k, n_u, n_ku", because n_k = n_u by definition and n_ku = 0.5.

Thanks again,
Alec

library(dplyr)

df <- 
  expand.grid(
    group_id = c(1, 2, 3), # Group ID (treatment level ID)
    t  = c(0, 1, 2)  # Time
  ) %>%
  mutate(
    # Treatment status
    a = case_when(
      group_id == 2 & t > 0 ~ 1, # 1 time period untreated 2 periods treated
      group_id == 3 & t > 1 ~ 1, # 2 untreated 1 treated
      T ~ 0 # id == 1 never treated
    )
  )

# Expand dataset with "individual" level observations 
df <- df %>% left_join(
  expand.grid(
    group_id = c(1, 2, 3), 
    ind_id = seq(1, 2)
    ) %>%
    filter(group_id == 2 | ind_id < 2) ## Leave only group id == 2 with two individuals
  ) %>%
  select(group_id, ind_id, everything()) %>%
  arrange(group_id, ind_id, t)
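
For reference (this check is not in the original post), counting rows per group in the simulated data confirms the imbalance described above: group 2 has twice as many group-by-time observations as groups 1 and 3.

# Not part of the original post: count rows per group.
# Group 2 should have 6 rows (two individuals x three periods);
# groups 1 and 3 should have 3 rows each.
df %>% count(group_id)
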
@EdJeeOnGitHub
Collaborator

Hi Alec,

Sorry for the delay in replying.

We'll get to the bottom of this - it looks like we went a bit over the top sanitising user inputs.

@EdJeeOnGitHub
Collaborator

EdJeeOnGitHub commented May 1, 2020

This should have been fixed in the latest PR #72 @evanjflack

library(dplyr)
library(bacondecomp) # for bacon(); needed to run the example below
set.seed(938)

df <- 
  expand.grid(
    group_id = c(1, 2, 3), # Group ID (treatment level ID)
    t  = c(0, 1, 2)  # Time
  ) %>%
  mutate(
    # Treatment status
    a = case_when(
      group_id == 2 & t > 0 ~ 1, # 1 time period untreated 2 periods treated
      group_id == 3 & t > 1 ~ 1, # 2 untreated 1 treated
      T ~ 0 # id == 1 never treated
    )
  )

# Expand dataset with "individual" level observations 
df <- df %>% left_join(
  expand.grid(
    group_id = c(1, 2, 3), 
    ind_id = seq(1, 2)
  ) %>%
    filter(group_id == 2 | ind_id < 2) ## Leave only group id == 2 with two individuals
) %>%
  select(group_id, ind_id, everything()) %>%
  arrange(group_id, ind_id, t) %>% 
  mutate(y = rnorm(nrow(.)))



bacon_res <- df %>% 
  bacon(formula = y ~ a,
        id_var = "group_id",
        time_var = "t")


bacon_res

with results:


                      type weight  avg_est
1 Earlier vs Later Treated    0.2  0.30438
2 Later vs Earlier Treated    0.2  0.12943
3     Treated vs Untreated    0.6 -0.44938

  treated untreated   estimate weight                     type
2       1     99999 -0.8366176    0.4     Treated vs Untreated
3       2     99999  0.3250949    0.2     Treated vs Untreated
6       2         1  0.1294280    0.2 Later vs Earlier Treated
8       1         2  0.3043799    0.2 Earlier vs Later Treated
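
As a rough sanity check (not part of the original comment), the weighted sum of these 2x2 estimates can be compared against the coefficient on a from a two-way fixed effects regression, which the Goodman-Bacon decomposition is built to reproduce in the balanced case. Whether the identity holds exactly here, with duplicate group-time rows, is what this issue is about, so treat this as a diagnostic rather than a guarantee.

# A hedged check, not from the original thread: compare the TWFE coefficient
# on `a` (group and time fixed effects) with the weighted sum of the 2x2
# estimates returned by bacon().
twfe <- lm(y ~ a + factor(group_id) + factor(t), data = df)
coef(twfe)["a"]
sum(bacon_res$estimate * bacon_res$weight)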

@alecmcclean
Author

Great, thank you!

@hyeunjung

Thank you for this package! I tested the example code above, but my code doesn't run. I made sure that I have the most up-to-date version of the bacondecomp package, but I still get an unbalanced panel error. Could you please check whether the fix for the unbalanced panel is reflected in the updated version of the bacondecomp package in R?

Thank you so much for your help!

@EdJeeOnGitHub
Collaborator

Hi,

Did you use the latest version from GitHub or CRAN?

I believe this is fixed on GitHub but looking back at the logs I'm not sure if @evanjflack pushed the patch to CRAN.

If it's broken on GitHub too I'll have another look.

Thanks,
Ed

@EdJeeOnGitHub reopened this Nov 9, 2020
@hyeunjung

hyeunjung commented Nov 10, 2020 via email

@PromiseKamanga

Following this thread, I got the impression that the "unbalanced panel" error had already been fixed. However, I downloaded the package from GitHub today and I still get the same error when I try to use it. The data I am using involves bilateral trade values for multiple countries, so I have duplicate country-year combinations because I observe a country's trade with all of its partners in a given year. Could that explain the error? Do you have a suggestion on how I should proceed?

@kylebutts
Collaborator

Hi @PromiseKamanga, could you open a new issue and post the code you're trying to run that fails? I'll be happy to help.

@ridwandse

Hi @EdJeeOnGitHub, can you generate the same simulated dataset in Stata and post the code here, or share the data generated in R? I just want to see whether Stata's ddtiming gives me the same diff-in-diff estimate, with the same DD comparisons and weights. Just curious to learn.
Thanks

@EdJeeOnGitHub
Collaborator

Hi @ridwandse,

The code here will provide the exact same dataset since the seed has been set.

Something like write.csv(df, "my-df.csv") will save the data frame as a CSV file for loading into Stata.
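
A minimal version of that export, assuming the df from the snippets above (row.names = FALSE just keeps an extra index column out of the CSV when it is read into Stata):

# Assumes df from the simulation above; dropping row names keeps Stata's
# import from picking up an extra index column.
write.csv(df, "my-df.csv", row.names = FALSE)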

@ridwandse

Thanks, @EdJeeOnGitHub, will follow up on the same.
Actually, I have unbalanced data, and Stata's bacondecomp Y D, ddetail does not work with unbalanced data; it requires the data to be strongly balanced. However, another way of obtaining the Bacon decomposition is to use ddtiming, i.e. ddtiming Y D, i(id) t(year), which works in the unbalanced case. I am not sure whether to proceed with bacondecomp on a balanced panel or ddtiming on the unbalanced data. If you have any leads on that, please guide me.
Thanks

@kylebutts
Collaborator

@ridwandse I think this is incorrect. Just because something "runs" and spits out numbers does not mean it "works": the weights it reports are not correct. The Bacon decomposition holds only in the strongly balanced case (it's an algebraic relationship between the TWFE OLS coefficient and a bunch of different averages).

In the unbalanced case, you can calculate the weights by hand (it's basically a bunch of n's), which is what ddtiming does. The weights do not mean anything, though.
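
For readers who want to follow the "bunch of n's" remark, here is a rough sketch (not the package's internal code, and assuming df and dplyr from the snippets above) of the ingredients the balanced-panel weights are built from: the group shares n_k and treatment shares that Alec and @ridwandse refer to, with n_ku = n_k / (n_k + n_u) for a treated group k compared against an untreated group u.

# A sketch, not bacondecomp's internals: the building blocks of the
# balanced-panel 2x2 weights are group shares (n_k) and the share of
# periods each group spends treated (D_bar).
shares <- df %>%
  group_by(group_id) %>%
  summarise(n_k   = n() / nrow(df),  # group's share of all observations
            D_bar = mean(a))         # share of the group's rows treated
shares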

@ridwandse

Thank you @kylebutts, this was very useful. Yes, you are right. I have also calculated all the DD comparison weights by hand, as a combination of group sizes and treatment indicator (D) averages over i and t, and I get the same results as ddtiming.
Thanks
