Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

derive_vars_transposed: extra variables are treated as by_vars #2325

Closed
kaz462 opened this issue Feb 5, 2024 · 11 comments · Fixed by #2443
Closed

derive_vars_transposed: extra variables are treated as by_vars #2325

kaz462 opened this issue Feb 5, 2024 · 11 comments · Fixed by #2443
Assignees
Labels
enhancement New feature or request

Comments

@kaz462
Copy link
Collaborator

kaz462 commented Feb 5, 2024

Background Information

In derive_vars_transposed, if dataset_merge contains extra variables that not specified in by_vars/key_var/value_var , they will be treated as grouping variables.
Example: test is a redundant variable that differs within USUBJID and FAREFID

cm <- tribble(
  ~USUBJID, ~CMGRPID, ~CMREFID, ~CMDECOD,
  "BP40257-1001", "14", "1192056", "PARACETAMOL"
)
facm <- tribble(
  ~USUBJID,     ~test, ~FAREFID, ~FATESTCD, ~FASTRESC,
  "BP40257-1001", "1", "1192056", "CMATC1CD", "N",
  "BP40257-1001", "3", "1192056", "CMATC2CD", "N02",
  "BP40257-1001", "3", "1192056", "CMATC3CD", "N02B",
  "BP40257-1001", "4", "1192056", "CMATC4CD", "N02BE"
)

cm %>%
  derive_vars_transposed(
    facm,
    by_vars = exprs(USUBJID, CMREFID = FAREFID),
    key_var = FATESTCD,
    value_var = FASTRESC
  ) 
# A tibble: 3 × 9
  USUBJID      CMGRPID CMREFID CMDECOD     test  CMATC1CD CMATC2CD CMATC3CD CMATC4CD
  <chr>        <chr>   <chr>   <chr>       <chr> <chr>    <chr>    <chr>    <chr>   
1 BP40257-1001 14      1192056 PARACETAMOL 1     N        NA       NA       NA      
2 BP40257-1001 14      1192056 PARACETAMOL 3     NA       N02      N02B     NA      
3 BP40257-1001 14      1192056 PARACETAMOL 4     NA       NA       NA       N02BE   

Expected results (need pre-processing to remove the unused variables):

cm %>%
  derive_vars_transposed(
    facm %>% select(-test),
    by_vars = exprs(USUBJID, CMREFID = FAREFID),
    key_var = FATESTCD,
    value_var = FASTRESC
  )
# A tibble: 1 × 8
  USUBJID      CMGRPID CMREFID CMDECOD     CMATC1CD CMATC2CD CMATC3CD CMATC4CD
  <chr>        <chr>   <chr>   <chr>       <chr>    <chr>    <chr>    <chr>   
1 BP40257-1001 14      1192056 PARACETAMOL N        N02      N02B     N02BE  

Definition of Done

add id_cols to derive_vars_transposed

@kaz462 kaz462 added bug Something isn't working good first issue Good for newcomers labels Feb 6, 2024
@gg106046 gg106046 self-assigned this Feb 8, 2024
@gg106046
Copy link
Collaborator

gg106046 commented Apr 9, 2024

Hi @kaz462! I came across the example for derive_vars_transposed() in which one of the grouping variables which is not mentioned in by_vars/key_var/value_var is required for transposing the dataset the right way:

cm <- tribble(
  ~USUBJID, ~CMGRPID, ~CMREFID, ~CMDECOD,
  "BP40257-1001", "14", "1192056", "PARACETAMOL",
  "BP40257-1001", "18", "2007001", "SOLUMEDROL",
  "BP40257-1002", "19", "2791596", "SPIRONOLACTONE"
)
facm <- tribble(
  ~USUBJID, ~FAGRPID, ~FAREFID, ~FATESTCD, ~FASTRESC,
  "BP40257-1001", "1", "1192056", "CMATC1CD", "N",
  "BP40257-1001", "1", "1192056", "CMATC2CD", "N02",
  "BP40257-1001", "1", "1192056", "CMATC3CD", "N02B",
  "BP40257-1001", "1", "1192056", "CMATC4CD", "N02BE",
  "BP40257-1001", "1", "2007001", "CMATC1CD", "D",
  "BP40257-1001", "1", "2007001", "CMATC2CD", "D10",
  "BP40257-1001", "1", "2007001", "CMATC3CD", "D10A",
  "BP40257-1001", "1", "2007001", "CMATC4CD", "D10AA",
  "BP40257-1001", "2", "2007001", "CMATC1CD", "D",
  "BP40257-1001", "2", "2007001", "CMATC2CD", "D07",
  "BP40257-1001", "2", "2007001", "CMATC3CD", "D07A",
  "BP40257-1001", "2", "2007001", "CMATC4CD", "D07AA",
  "BP40257-1001", "3", "2007001", "CMATC1CD", "H",
  "BP40257-1001", "3", "2007001", "CMATC2CD", "H02",
  "BP40257-1001", "3", "2007001", "CMATC3CD", "H02A",
  "BP40257-1001", "3", "2007001", "CMATC4CD", "H02AB",
  "BP40257-1002", "1", "2791596", "CMATC1CD", "C",
  "BP40257-1002", "1", "2791596", "CMATC2CD", "C03",
  "BP40257-1002", "1", "2791596", "CMATC3CD", "C03D",
  "BP40257-1002", "1", "2791596", "CMATC4CD", "C03DA"
)

cm %>%
  derive_vars_transposed(
    facm,
    by_vars = exprs(USUBJID, CMREFID = FAREFID),
    key_var = FATESTCD,
    value_var = FASTRESC
  ) %>%
  select(USUBJID, CMDECOD, starts_with("CMATC"))


# A tibble: 5 x 6
  USUBJID      CMDECOD        CMATC1CD CMATC2CD CMATC3CD CMATC4CD
  <chr>        <chr>          <chr>    <chr>    <chr>    <chr>   
1 BP40257-1001 PARACETAMOL    N        N02      N02B     N02BE   
2 BP40257-1001 SOLUMEDROL     D        D10      D10A     D10AA   
3 BP40257-1001 SOLUMEDROL     D        D07      D07A     D07AA   
4 BP40257-1001 SOLUMEDROL     H        H02      H02A     H02AB   
5 BP40257-1002 SPIRONOLACTONE C        C03      C03D     C03DA

Here, the variable FAGRPID is important for grouping and transposing it the right way. And adding just the value of by_vars in id_cols would cause an issue with duplicates:

# A tibble: 3 x 6
  USUBJID      CMDECOD        CMATC1CD  CMATC2CD  CMATC3CD  CMATC4CD 
  <chr>        <chr>          <list>    <list>    <list>    <list>   
1 BP40257-1001 PARACETAMOL    <chr [1]> <chr [1]> <chr [1]> <chr [1]>
2 BP40257-1001 SOLUMEDROL     <chr [3]> <chr [3]> <chr [3]> <chr [3]>
3 BP40257-1002 SPIRONOLACTONE <chr [1]> <chr [1]> <chr [1]> <chr [1]>
Warning message:
Values from `FASTRESC` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = {summary_fun}` to summarise duplicates.
* Use the following dplyr code to identify duplicates.
  {data} |>
  dplyr::summarise(n = dplyr::n(), .by = c(USUBJID, FAREFID, FATESTCD)) |>
  dplyr::filter(n > 1L) 

So do you think it's a good idea to add an additional argument (id_vars) to the function? That way, the user can decide whether to keep any additional grouping variables apart from the ones mentioned in by_vars, or not (like in the example you mentioned, where the test variable is not required.)

@kaz462 kaz462 added enhancement New feature or request and removed bug Something isn't working good first issue Good for newcomers labels Apr 10, 2024
@kaz462
Copy link
Collaborator Author

kaz462 commented Apr 10, 2024

Hi @gg106046 , thanks for your example! I like this idea of adding id_vars to derive_vars_transposed, @pharmaverse/admiral any objections?

@ProfessorP-beep
Copy link
Collaborator

Hey,

If I'm able I'd love to contribute to this issue.

@bms63
Copy link
Collaborator

bms63 commented Apr 12, 2024

Hi @ProfessorP-beep!

@gg106046 is working on this, but it would be great for you to help review the PR and try out the update.

Would you like to be added to the admiral community team - periodic tags for good issues to work on.

Also have you completed the #1839 ? Great way to get familiar with our processes

@ProfessorP-beep
Copy link
Collaborator

@bms63 Absolutely! and I haven't completed the dummy example yet. I can work on it today / tomorrow!

Thanks

@ProfessorP-beep
Copy link
Collaborator

Just finished the dummy example. Just need permissions to push to dummy branch
whenever you're able. Thanks!

@bms63
Copy link
Collaborator

bms63 commented Apr 12, 2024

@ProfessorP-beep added you to repo and to the community team

@kaz462
Copy link
Collaborator Author

kaz462 commented May 15, 2024

Hi @gg106046 , do you have bandwidth to implement id_vars before the June 3rd release? I can take over if it doesn't work for you

@gg106046
Copy link
Collaborator

Hi @gg106046 , do you have bandwidth to implement id_vars before the June 3rd release? I can take over if it doesn't work for you

Hi @kaz462 , I can do it before end of next week.

@manciniedoardo
Copy link
Collaborator

@gg106046 still on track for EOW?

@gg106046
Copy link
Collaborator

@gg106046 still on track for EOW?

Yes!

gg106046 pushed a commit that referenced this issue May 24, 2024
gg106046 pushed a commit that referenced this issue May 24, 2024
bms63 added a commit that referenced this issue May 29, 2024
…e_vars_atc()` (#2443)

* #2325 Added `id_vars` to `derive_vars_transposed()` and `derive_vars_atc()`

* #2325 Updated NEWS.md

* Update NEWS.md

Co-authored-by: Ben Straub <ben.x.straub@gsk.com>

* #2325 Updated the numbering convention in the test file

* Update R/derive_vars_transposed.R

Co-authored-by: Kangjie Zhang <47867131+kaz462@users.noreply.github.com>

---------

Co-authored-by: Jerry Johnson <jerry@gmail.com>
Co-authored-by: Ben Straub <ben.x.straub@gsk.com>
Co-authored-by: Kangjie Zhang <47867131+kaz462@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging a pull request may close this issue.

5 participants