Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase robustness of calculate_record_counts #161

Open
LukasWallrich opened this issue Jun 16, 2023 · 8 comments
Open

Increase robustness of calculate_record_counts #161

LukasWallrich opened this issue Jun 16, 2023 · 8 comments

Comments

@LukasWallrich
Copy link
Collaborator

Does this function need to rely on a search label? I would have expected it to work without any labels. If it needs the label, that should be documented, and a warning be issued when there is no search label.

@LukasWallrich
Copy link
Collaborator Author

LukasWallrich commented Jun 16, 2023

Also, I don't think it (the record summary table) currently works correctly to compare labels or strings rather than sources - can that be? In my test, I get NAs in the last two columns.

@TNRiley
Copy link
Collaborator

TNRiley commented Jun 16, 2023

Also, I don't think it (the record summary table) currently works correctly to compare labels or strings rather than sources - can that be? In my test, I get NAs in the last two columns.

I'm trying to think of a use case outside of comparing the sources. I don't see any need for it to compare strings or labels.

Does this function need to rely on a search label? I would have expected it to work without any labels. If it needs the label, that should be documented, and a warning be issued when there is no search label.

Edited it does rely on the label being "search" which could potentially be removed without other changes. I'll take a look at this in a bit more depth.

@TNRiley
Copy link
Collaborator

TNRiley commented Jun 21, 2023

Does this function need to rely on a search label? I would have expected it to work without any labels. If it needs the label, that should be documented, and a warning be issued when there is no search label.

Looked at this again and remembered why I had set this to filter by the search label. The table is used to show the impact of sources/methods. If we remove the need for the label, it will show any screened or final data as well, which I currently don't see a use case for and believe it would just clutter things up. I think there are many ways in which we could change this to ensure it is usable for other use cases, but I'm not able to envision those at this time.

I can add the following warning to the function and details to the documentation.

if (!"search" %in% n_unique$cite_label) {
warning("Source's must be tagged as 'search' in the 'cite_label' field.")

#' @details
#' The function works on three main inputs: unique_citations, citations, and n_unique. Each of these dataframes contains
#' a column corresponding to the database source.
#'
#' Additionally, n_unique dataframe contains two specific columns: 'cite_source' and 'cite_label', provided by the user.
#' 'cite_source' column represents the source of the citation and 'cite_label' represents another variable assigned to the
#' citation (normally search, screened, final).
#'
#' The function groups by 'cite_source' and filters by 'cite_label' == "search". These steps are essential for correctly
#' calculating unique records count.

@LukasWallrich
Copy link
Collaborator Author

LukasWallrich commented Jun 21, 2023 via email

@TNRiley
Copy link
Collaborator

TNRiley commented Jun 21, 2023

If the data is set up correctly, final and screened should not be sources but labels, and always be a subset of search - so that filtering for search should not do anything? For that reason, I would find it convenient if you could just upload data to shiny without having to label everything (but 2) as search. Would it make sense to use blank labels instead of search if search does not exist?

The labels would be set as search, screened, or final. We only want to see this information broken down by source, so the function is filtering for only the search records. If we didn't filter by search the table would pull the screened and final data too, which we don't want. Does this make sense, I'm not sure if I'm misunderstanding your point ;)

@LukasWallrich
Copy link
Collaborator Author

I forgot that the new calculate_record_counts depends on the raw citations, not just the deduplicated citations. However, would filtered and screened records not just have a blank source there, which we could filter out?

Typing in "search" many times just does not seem to be very user-friendly on Shiny - but if we can document clearly that this should be done (best on the page where users upload data so that this is correct pre-deduplication) and throw a warning when it is not present then that's good enough for now.

@TNRiley
Copy link
Collaborator

TNRiley commented Jun 23, 2023

I'm not sure I can think of a way around it currently. On the current web shiny there is an option to add a label before you click upload file. If we could add this function to the new shiny where someone could select multiple files and add a label to all of them at the same time, that would make it easy.

We'd still need to add the information on the upload page that lets users know that they should use "search" for source files, I think that would work.

captures_chrome-capture-2023-5-22

@TNRiley
Copy link
Collaborator

TNRiley commented May 30, 2024

@LukasWallrich just circling back to this one and I wanted to see if you still wanted to make adjustments here. Maybe we can run through some use case examples in the shiny once it is complete and see if this requires changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants