Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Use dplyr::left_join(relationship) argument #2247

Closed
ddsjoberg opened this issue Nov 21, 2023 · 15 comments · Fixed by #2433
Closed

Feature Request: Use dplyr::left_join(relationship) argument #2247

ddsjoberg opened this issue Nov 21, 2023 · 15 comments · Fixed by #2433
Assignees
Labels
enhancement New feature or request programming

Comments

@ddsjoberg
Copy link
Collaborator

Feature Idea

Our merging functions are (very) fancy joins with many useful options. In dplyr 1.1.1, the dplyr::left_join(relationship) argument that allows users to specify the expected type of merge that will be performed.

image

For example, if you specify relationship = "one-to-one" the function will error if there is more than one match on the by variables.

I think it can be useful to our users to add this argument to our functions.

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

No response

@ddsjoberg ddsjoberg added enhancement New feature or request programming labels Nov 21, 2023
@bundfussr
Copy link
Collaborator

Our functions expect "many-to-one" but do not use the relationship argument. Our own check function is used to ensure "many-to-one". It provides a more helpful error message than left_join().

@ddsjoberg
Copy link
Collaborator Author

Our functions expect "many-to-one" but do not use the relationship argument. Our own check function is used to ensure "many-to-one". It provides a more helpful error message than left_join().

But not every join users do will be many to one. This would allow them to specify, for example, one-to-one when that is the situation they are in.

@bundfussr
Copy link
Collaborator

Our functions expect "many-to-one" but do not use the relationship argument. Our own check function is used to ensure "many-to-one". It provides a more helpful error message than left_join().

But not every join users do will be many to one. This would allow them to specify, for example, one-to-one when that is the situation they are in.

Yes, we could add an expect_one_to_one argument and call assert_one_to_one() if it is true. The assertion also provides the records which are violating the condition.
What should be the default?

@ddsjoberg
Copy link
Collaborator Author

I think the default should be NULL allowing users to specify it or not (like dplyr::left_join() has done)

@ddsjoberg
Copy link
Collaborator Author

The argument should be relationship = "many-to-one" or relationship = "one-to-one". This makes it very clear what is being done for any user that is already familiar with dplyr.

@manciniedoardo manciniedoardo changed the title Feature Request: Use dplyr::left_join(relationship) argumnet Feature Request: Use dplyr::left_join(relationship) argument Nov 23, 2023
@manciniedoardo manciniedoardo added the future Issue to be implemented after release label Nov 23, 2023
@bms63 bms63 removed the future Issue to be implemented after release label Mar 14, 2024
@sophie-gem
Copy link
Collaborator

sophie-gem commented Mar 26, 2024

Functions which use dplyr::*_join():

  • create_single_dose_dataset.R; row 578; create_single_dose_dataset()
  • derived_join.R; row 811; get_joined_data()
  • derive_locf_records.R; row 189; derive_locf_records()
  • derive_merged.R; row 423; derive_vars_merged()
  • derive_param_computed.R; row 320; derive_param_computed()
  • derive_param_computed.R; row 529; get_hori_data()
  • derive_param_tte.R; row 433; derive_param_tte() - NOT NEEDED
  • derive_param_wbc_abs.R; row 142; derive_param_wbc_abs() - NOT NEEDED
  • derive_vars_query.R; row 192; derive_vars_query() - NOT NEEDED
  • derive_vars_transposed.R; row 99; derive_vars_transposed()

I believe these are all the functions that use one of dplyr::left_join() or dplyr::inner_join() or dplyr::full_join(). These are the ones I will look at updating the function call for. If there are any others, or if there are any that I don't need to do - please shout. :)

@bms63
Copy link
Collaborator

bms63 commented Mar 26, 2024

Rock on @sophie-gem !!

image

@sophie-gem
Copy link
Collaborator

Hi @bms63, @ddsjoberg - if you get the chance, would you be able to review this change I've made in the above commit to the first function in the list? (Thinking if I ensure this is ok before I continue it will save some review time at the end...). Also, two questions:

  1. Do we need to add an example to each function showing this in use?
  2. Should a test be provided for this new argument?

@bundfussr
Copy link
Collaborator

Hi @bms63, @ddsjoberg - if you get the chance, would you be able to review this change I've made in the above commit to the first function in the list? (Thinking if I ensure this is ok before I continue it will save some review time at the end...). Also, two questions:

  1. Do we need to add an example to each function showing this in use?
  2. Should a test be provided for this new argument?

I think we don't need the relationship argument for create_single_dose_dataset() because the only valid value is "many-to-one". In the left_join() call we could set relationship = "many-to-one". However, this would result in an error message which is hard to understand for users. Thus either we would need to catch the error and provide a more user-friendly error message or we should check before if there are duplicates in lookup_table.

At the moment duplicates produce incorrect results. Consider for example:

> custom_lookup <- tribble(
+     ~Value,   ~DOSE_COUNT, ~DOSE_WINDOW, ~CONVERSION_FACTOR,
+     "Q30MIN", (1 / 30),    "MINUTE",                      1,
+     "Q30MIN", (1 / 30),    "MINUTE",                      1,
+     "Q90MIN", (1 / 90),    "MINUTE",                      1
+ )
> 
> data <- tribble(
+     ~USUBJID, ~EXDOSFRQ, ~ASTDT, ~ASTDTM, ~AENDT, ~AENDTM,
+     "P01", "Q30MIN", ymd("2021-01-01"), ymd_hms("2021-01-01T06:00:00"),
+     ymd("2021-01-01"), ymd_hms("2021-01-01T07:00:00"),
+     "P02", "Q90MIN", ymd("2021-01-01"), ymd_hms("2021-01-01T06:00:00"),
+     ymd("2021-01-01"), ymd_hms("2021-01-01T09:00:00")
+ )
> 
> create_single_dose_dataset(data,
+                            lookup_table = custom_lookup,
+                            lookup_column = Value,
+                            start_datetime = ASTDTM,
+                            end_datetime = AENDTM
+ )
# A tibble: 9 × 6
  USUBJID EXDOSFRQ ASTDT      ASTDTM              AENDT      AENDTM             
  <chr>   <chr>    <date>     <dttm>              <date>     <dttm>             
1 P01     ONCE     2021-01-01 2021-01-01 06:00:00 2021-01-01 2021-01-01 06:00:00
2 P01     ONCE     2021-01-01 2021-01-01 06:30:00 2021-01-01 2021-01-01 06:30:00
3 P01     ONCE     2021-01-01 2021-01-01 07:00:00 2021-01-01 2021-01-01 07:00:00
4 P01     ONCE     2021-01-01 2021-01-01 06:00:00 2021-01-01 2021-01-01 06:00:00
5 P01     ONCE     2021-01-01 2021-01-01 06:30:00 2021-01-01 2021-01-01 06:30:00
6 P01     ONCE     2021-01-01 2021-01-01 07:00:00 2021-01-01 2021-01-01 07:00:00
7 P02     ONCE     2021-01-01 2021-01-01 06:00:00 2021-01-01 2021-01-01 06:00:00
8 P02     ONCE     2021-01-01 2021-01-01 07:30:00 2021-01-01 2021-01-01 07:30:00
9 P02     ONCE     2021-01-01 2021-01-01 09:00:00 2021-01-01 2021-01-01 09:00:00

sophie-gem added a commit that referenced this issue May 8, 2024
@sophie-gem
Copy link
Collaborator

Note to self: look at how to 'catch' error for create_single_dose_dataset()

@bms63
Copy link
Collaborator

bms63 commented May 16, 2024

Hi @sophie-gem - we have a release on June 3rd. Do you think you can complete updates before then?

@sophie-gem
Copy link
Collaborator

I am planning on it! Will aim to have it complete by end of next week - is that ok?

@bms63
Copy link
Collaborator

bms63 commented May 16, 2024

yes!

image

sophie-gem added a commit that referenced this issue May 19, 2024
sophie-gem added a commit that referenced this issue May 19, 2024
…uplicates in the `lookup_table` and create associated test.
@sophie-gem
Copy link
Collaborator

sophie-gem commented May 19, 2024

Hi all,

In the function documentation for get_hori_data() should analysis_value instead say set_values_to? Otherwise I'm not quite sure what analysis_value is referring to...

image

@sophie-gem sophie-gem linked a pull request May 19, 2024 that will close this issue
15 tasks
sophie-gem added a commit that referenced this issue May 20, 2024
@bundfussr
Copy link
Collaborator

Hi all,

In the function documentation for get_hori_data() should analysis_value instead say set_values_to? Otherwise I'm not quite sure what analysis_value is referring to...

image

Yes, that's a left-over from replacing analysis_value with set_values_to.

sophie-gem added a commit that referenced this issue May 26, 2024
…to only allow one-to-one and many-to-one relationship values according to feedback.
sophie-gem added a commit that referenced this issue May 26, 2024
Merge branch 'main' into 2247_left_join_relationship

# Conflicts:
#	NEWS.md
sophie-gem added a commit that referenced this issue May 27, 2024
…hip error in `derive_merged.R`, `derive_vars_transposed.R`.
sophie-gem added a commit that referenced this issue May 31, 2024
sophie-gem added a commit that referenced this issue May 31, 2024
bms63 added a commit that referenced this issue Jun 3, 2024
bms63 added a commit that referenced this issue Jun 3, 2024
bms63 added a commit that referenced this issue Jun 3, 2024
bms63 added a commit that referenced this issue Jun 3, 2024
* #2247 - added `relationship` argument to `create_single_dose_dataset()`.

* #2247 - Update `derive_vars_merged()` function with relationship argument.

* #2247 - Update `create_single_dose_dataset()` according to feedback. Set `relationship = many-to-one`.

* #2247 - add in assertion to check new `relationship` argument contains only the options allowed.

* #2247 - Update `create_single_dose_dataset()` to error if there are duplicates in the `lookup_table` and create associated test.

* #2247 - Update `derive_vars_transposed()` with relationship argument.

* #2247 - Update NEWS.md, running `styler::style_pkg()`, `lintr::lint_package()`

* #2247 - Update documentation to ensure URLs are inserted as links as opposed to plain text.

* #2247 - Update to use `signal_duplicate_records()` as recommended by feedback.

* #2247 - Revert unintended changes to `test-derive_var_atoxgr.R`

* #2247 - Update `derive_vars_merged()` and `derive_vars_transposed()` to only allow one-to-one and many-to-one relationship values according to feedback.

* #2247 - Update according to review comment. Catch the dplyr relationship error in `derive_merged.R`, `derive_vars_transposed.R`.

* #2247 - Update documentation for `get_hori_data()` due to misaligned text.

* #2247 - Running final devtools checks.

* Update R/derive_merged.R

Apply requested change.

Co-authored-by: Stefan Bundfuss <80953585+bundfussr@users.noreply.github.com>

* #2247 - Update functions to take dplyr error and replace argument names with parent function argument names.

* #2274 - Forgot to remove some dummy testing code! Have now removed.

* #2247 - Re-run devtools checks.

* docs: #2247 clarify arguments; tests: snapshot of my life

* chore: #2247 lintr

* tests: #2247 new snapshot

* chore: #2247 that lint life

---------

Co-authored-by: Ben Straub <ben.x.straub@gsk.com>
Co-authored-by: Stefan Bundfuss <80953585+bundfussr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request programming
Development

Successfully merging a pull request may close this issue.

5 participants