Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combined table output #144

Open
wants to merge 30 commits into
base: develop
Choose a base branch
from
Open

Combined table output #144

wants to merge 30 commits into from

Conversation

ehwenk
Copy link
Collaborator

@ehwenk ehwenk commented Dec 3, 2023

A function to generate a combined table output - trait data are still in long format, with all information from relational tables merged as additional, condensed columns.

Requires review and plan for how this will be integrated into workflow. For now I'm assuming it will be a function that can be sourced and run to generate outputs for releases.

@yangsophieee @dfalster It would be great if both of you could build the combined table and look at syntax, packing "rules".

ehwenk and others added 24 commits November 10, 2023 13:47
rename context-related columns to `treatment_context_properties`, not `treatment_contexts` to align with how information is coded into the ontology
Realised we need to maintain identifiers for output to still match our OBOE-adherent data model. They can vanish for individual join functions, but not for one of the official traits.build outputs
- add in contributors
- ensure the symbols we are using for collapsing data aren't in the data itself
@ehwenk ehwenk requested review from dfalster and yangsophieee and removed request for dfalster December 3, 2023 22:06
Now I see where `austraits` is being used - presumably this means the function `join_contexts` also needs to be moved to `traits.build`? @dfalster
@ehwenk
Copy link
Collaborator Author

ehwenk commented Dec 3, 2023

This is failing because I've using an austraits package function (join_contexts()) in the function. Still please check the function, but it can't get merged in until we move this function to traits.build.

Copy link
Collaborator

@yangsophieee yangsophieee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still having a look at the final output, but submitting some comments for now.

#'
#' @param database A traits.build database
#'
#' @return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs documentation

dplyr::mutate(location_properties = paste0(location_property, "=", value)) %>%
dplyr::select(dplyr::all_of(c("dataset_id", "location_id", "location_name", "location_properties"))) %>%
dplyr::group_by(dataset_id, location_id, location_name) %>%
dplyr::mutate(location_properties = paste0(location_properties, collapse = "; ")) %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would dplyr::summarise here work the same and replace the need for distinct() later? Same with later tables.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly but, I think distinct() is actually clearer in this case - otherwise you'd just have to list all the column individually and decide how to treat each one (lots of "take first value")

@@ -1176,6 +1176,20 @@ dataset_test_worker <-
info = paste0(red(dataset_id), "\t`db_traits_pivot_longer` threw a warning")
)
}

expect_no_error(
combined_table <- database_create_combined_table(dataset),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests creating a combined table for a single dataset, which may very well be intended.

Will we also add database_create_combined_table to build.R or to the GitHub actions list for AusTraits?

database$traits %>%
dplyr::left_join(location_latlon, by = c("dataset_id", "location_id")) %>%
dplyr::left_join(location_properties, by = c("dataset_id", "location_id", "location_name")) %>%
austraits::join_contexts(contexts_tmp) %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you define a function called join_contexts above? Why do we use the austraits version here?

data_collectors = ifelse(
is.na(ORCID),
paste0(data_collectors, " <affiliation:", affiliation),
paste0(data_collectors, ";affiliation:", affiliation)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add spaces after the semicolons and use equal (=) signs instead of colons, consistent with location_properties? It might be more readable.

R/output_combined_table.R Outdated Show resolved Hide resolved
description = stringr::str_replace_all(description, "=", "-"),
value = ifelse(
is.na(description),
paste0(context_property, ":", value),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here I wonder if "=" would be more readable than ":"

out <- contexts_table %>%
dplyr::filter(link_id == context_id) %>%
dplyr::select(-link_id) %>%
dplyr::distinct(dataset_id, link_vals, .keep_all = TRUE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line necessary? Shouldn't it be distinct anyway, otherwise there's something wrong with the contexts table that we want to be picked up by dataset_test?

)
}

combined_table <-
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the ID columns that help join to other relational tables could be removed from the combined table?

context_property = stringr::str_replace_all(context_property, "=", "-"),
value = stringr::str_replace_all(value, "=", "-"),
description = stringr::str_replace_all(description, "=", "-"),
context_property = stringr::str_replace_all(context_property, ";", ","),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the characters "<" and ">" also need to be replaced in case they ever make it into the context_property, value and description fields?

@yangsophieee
Copy link
Collaborator

I've had a look at the final table. I think it looks good overall!

@yangsophieee
Copy link
Collaborator

@ehwenk Also need to update documentation (export the new function) with devtools::document or devtools::check

@dfalster
Copy link
Member

dfalster commented Dec 8, 2023

I will review the table next week. In the mean time, I agree we should move a copy of Join_contexts into traits.build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants