Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Reexport non-deprecated dplyr functions #163

Merged
merged 14 commits into from
May 20, 2024
Merged

feat: Reexport non-deprecated dplyr functions #163

merged 14 commits into from
May 20, 2024

Conversation

krlmlr
Copy link
Collaborator

@krlmlr krlmlr commented May 7, 2024

Closes #144.

@krlmlr
Copy link
Collaborator Author

krlmlr commented May 7, 2024

@DavisVaughan: Can you please take a look? Anything else we want to reexport, any deprecated functions I've included by accident?

@krlmlr
Copy link
Collaborator Author

krlmlr commented May 7, 2024

Please review only the first comment, I'll need to add reexport statements once we have agreed on the set of reexports.

@krlmlr krlmlr marked this pull request as draft May 8, 2024 02:45
Copy link
Contributor

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my list of what I think we can re-export reasonably (and I do think we should target the exact package they came from, not dplyr in all cases). The idea is to try and support most interactive usage of dplyr, and to remove anything superseded, deprecated, defunct, etc, which could make it harder on dplyr in the future if we exported them here. I think for programmatic tools, users can pull from rlang or dplyr directly as needed.

This list is just based on the "new" exports that are in this PR, but you could do a once over of all the other existing exports and try and apply these principles too if you wanted to.

From rlang:

  • .data

From tibble:

  • add_row
  • as_tibble
  • tribble

From dplyr:

  • between
  • bind_cols
  • bind_rows
  • case_match
  • coalesce
  • consecutive_id
  • cumall
  • cumany
  • cume_dist
  • cummean
  • cur_column
  • cur_group
  • cur_group_id
  • cur_group_rows
  • dense_rank
  • group_cols
  • group_indices
  • group_nest
  • group_split
  • group_trim
  • groups
  • if_else
  • min_rank
  • n_groups
  • na_if
  • near
  • ntile
  • order_by
  • percent_rank
  • pick
  • top_n
  • with_order

From tidyselect:

  • all_of
  • any_of
  • contains
  • everything
  • last_col
  • matches
  • num_range
  • one_of
  • starts_with
  • where

Remove (mostly superseded or deprecated):

  • add_rownames
  • all_vars
  • any_vars
  • arrange_all
  • arrange_at
  • arrange_if
  • as_data_frame
  • as_label (we'd like to remove from dplyr, comes from rlang)
  • check_dbplyr (should not be relevant)
  • combine
  • common_by (programmatic tool, pull from dplyr as needed)
  • copy_to (should not be relevant)
  • cur_data
  • cur_data_all
  • current_vars
  • distinct_all
  • distinct_at
  • distinct_if
  • distinct_prepare (programmatic tool)
  • ensym (we'd like to remove from dplyr, comes from rlang)
  • ensyms (we'd like to remove from dplyr, comes from rlang)
  • expr (we'd like to remove from dplyr, comes from rlang)
  • enquo (we'd like to remove from dplyr, comes from rlang)
  • enquos (we'd like to remove from dplyr, comes from rlang)
  • filter_all
  • filter_at
  • filter_if
  • group_by_all
  • group_by_at
  • group_by_if
  • group_by_drop_default (programmatic tool)
  • group_by_prepare (programmatic tool)
  • grouped_df (programmatic tool)
  • ident (sql)
  • is.tbl (sql)
  • last_dplyr_warnings (probably not relevant? not sure, but kind of experimental)
  • lst (trying to remove from dplyr)
  • make_tbl (trying to remove from dplyr)
  • mutate_at
  • mutate_if
  • mutate_all (this is currently exported but id remove it)
  • new_grouped_df (programmatic tool)
  • new_rowwise_df (programmatic tool)
  • progress_estimated
  • quo
  • quo_name
  • quos
  • recode
  • recode_factor
  • show_query
  • everything sql_*
  • everything src_*
  • summaris/ze_if
  • summaris/ze_all
  • summaris/ze_all
  • sym
  • syms
  • tbl
  • tbl_df
  • tbl_nongroup_vars
  • tbl_ptype
  • top_frac
  • transmute_all
  • transmute_at
  • transmute_if
  • type_sum
  • validate_grouped_df
  • validate_rowwise_df
  • wrap_dbplyr_obj

@krlmlr
Copy link
Collaborator Author

krlmlr commented May 16, 2024

Thanks, Davis, for the extensive review.

I went through the existing reexports and identified the following obvious candidates for removal:

  • auto_copy()
  • do()
  • dplyr_col_modify()
  • dplyr_reconstruct()
  • dplyr_row_slice()
  • funs()
  • same_src()
  • vars()

The following are experimental, not sure:

  • group_map()
  • group_modify()
  • group_walk()

In addition, there is a slew of functions related to grouping. At this stage, duckplyr does not support group_by() . Should it? Removal of the reexports might depend on if we think this should be implemented. Is there a way to do rowwise operations with the new .by argument? Should this be mentioned in ?dplyr::dplyr_by ?

  • group_by()
  • group_data()
  • group_indices()
  • group_keys()
  • group_rows()
  • group_size()
  • group_vars()
  • groups()
  • is_grouped_df()
  • n_groups()
  • rowwise()

@krlmlr krlmlr requested a review from DavisVaughan May 16, 2024 03:55
@krlmlr krlmlr marked this pull request as ready for review May 16, 2024 06:16
@DavisVaughan
Copy link
Contributor

DavisVaughan commented May 16, 2024

Removing those "obvious candidates" and anything related to groups_*() seems fine to me if duckplyr supports and prefers .by. Otherwise it is kind of confusing that group_by() "falls back" but .by doesn't.

For rowwise(), we don't yet support that through .by, but it would likely be a separate mutually exclusive argument like .by_row = FALSE/TRUE so we can keep .by "pure" tidyselect, see tidyverse/dplyr#6660

export(across)
export(add_count)
export(add_row)
export(add_tally)
export(all_equal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all_equal deprecated

Comment on lines +15 to +17
#' @importFrom dplyr auto_copy
#' @importFrom dplyr do
#' @importFrom dplyr dplyr_col_modify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need all these here for testing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for dplyr's tests converted to duckplyr.

@krlmlr krlmlr merged commit d819c43 into main May 20, 2024
9 checks passed
@krlmlr krlmlr deleted the f-144-reexport branch May 20, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider (re)exporting all {dplyr} functions (or depending on it)
2 participants