cancerprof: API Client for extracting data from State Cancer Profiles #637

Open
13 of 29 tasks
realbp opened this issue Apr 3, 2024 · 19 comments
@realbp

realbp commented Apr 3, 2024

Submitting Author Name: Brian Park
Submitting Author Github Handle: @realbp
Repository: https://github.com/getwilds/cancerprof
Version submitted: 0.1.0
Submission type: Standard
Editor: @ldecicco-USGS
Reviewers: @jromanowska

Due date for @jromanowska: 2024-05-20

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: cancerprof
Title: API Client for State Cancer Profiles
Version: 0.1.0
Authors@R: 
    person("Brian", "Park", , "joon.brianpark@gmail.com", role = c("aut", "cre"),
           comment = c(ORCID = "0009-0008-8274-3057"))
Description: An interface for retrieving data from the NIH NCI State Cancer Profiles API <https://statecancerprofiles.cancer.gov/>. State Cancer Profiles provides information about data topics including demographics, screening and risk factors, cancer incidence, and mortality for US states, counties, and health service areas.
License: MIT + file LICENSE
URL: https://github.com/getwilds/cancerprof, https://getwilds.org/cancerprof/
BugReports: https://github.com/getwilds/cancerprof/issues
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Imports: 
    cdlTools,
    cli,
    dplyr,
    httr2,
    magrittr,
    rlang,
    stringr,
    utils
Suggests: 
    knitr,
    rmarkdown,
    testthat
Config/testthat/edition: 3
VignetteBuilder: knitr
Depends: 
    R (>= 2.10)

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

cancerprof allows users to retrieve data from State Cancer Profiles for programmatic analysis. It makes accessing the undocumented State Cancer Profiles API intuitive and easy.

  • Who is the target audience and what are scientific applications of this package?

The target audience for cancerprof is anyone who wants to access data from State Cancer Profiles for programmatic analysis without having to navigate its complex GUI. Specifically, cancer researchers could use cancerprof to conduct reproducible analyses of cancer data cross-referenced with a variety of topics found within the State Cancer Profiles data.

Currently, no other software or package extracts the publicly available data from State Cancer Profiles.

cancerprof does not breach any data privacy laws and complies with the ethics policies of rOpenSci.

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

#635

  • Explain reasons for any pkgcheck items which your package is unable to pass.

Cancerprof passes all pkgcheck items

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@ropensci-review-bot
Collaborator

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for cancerprof (v0.1.0)

git hash: 36706151

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 97.9%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

| type | package | ncalls |
|---|---|---|
| internal | base | 232 |
| internal | cancerprof | 66 |
| internal | stats | 28 |
| internal | graphics | 8 |
| imports | magrittr | 122 |
| imports | dplyr | 64 |
| imports | cli | 55 |
| imports | rlang | 25 |
| imports | httr2 | 8 |
| imports | utils | 2 |
| imports | cdlTools | 1 |
| imports | stringr | 1 |
| suggests | knitr | NA |
| suggests | rmarkdown | NA |
| suggests | testthat | NA |
| linking_to | NA | NA |

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (76), c (56), structure (28), body (20), class (12), url (9), options (8), paste0 (8), new.env (5), as.raw (4), date (4), which (2)

magrittr

%>% (122)

cancerprof

create_request (20), demo_crowding (1), demo_education (1), demo_food (1), demo_income (1), demo_insurance (1), demo_language (1), demo_mobility (1), demo_population (1), demo_poverty (1), demo_svi (1), demo_workforce (1), dput_resp_demo (1), dput_resp_incd (1), dput_resp_mortality (1), dput_resp_risk (1), fips_scp (1), get_area (1), handle_age (1), handle_alcohol (1), handle_cancer (1), handle_crowding (1), handle_datatype (1), handle_diet_exercise (1), handle_education (1), handle_food (1), handle_income (1), handle_insurance (1), handle_mobility (1), handle_non_english (1), handle_population (1), handle_poverty (1), handle_race (1), handle_screening (1), handle_sex (1), handle_smoking (1), handle_stage (1), handle_svi (1), handle_vaccine (1), handle_women_health (1), handle_workforce (1), handle_year (1), incidence_cancer (1), mortality_cancer (1), process_resp (1), risk_alcohol (1), risk_colorectal_screening (1)

dplyr

across (26), mutate (26), mutate_all (4), all_of (3), na_if (3), filter (2)

cli

cli_abort (55)

stats

setNames (28)

rlang

is_na (23), sym (2)

graphics

frame (8)

httr2

request (8)

utils

data (1), read.csv (1)

cdlTools

fips (1)

stringr

str_pad (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 54 files) and
  • 1 authors
  • 4 vignettes
  • no internal data file
  • 8 imported packages
  • 19 exported functions (median 26 lines of code)
  • 83 non-exported functions in R (median 22 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 54 | 96.4 | |
| files_vignettes | 4 | 95.3 | |
| files_tests | 45 | 99.0 | |
| loc_R | 2611 | 88.6 | |
| loc_vignettes | 492 | 77.7 | |
| loc_tests | 1533 | 91.7 | |
| num_vignettes | 4 | 96.6 | TRUE |
| n_fns_r | 102 | 77.0 | |
| n_fns_r_exported | 19 | 65.9 | |
| n_fns_r_not_exported | 83 | 80.0 | |
| n_fns_per_file_r | 1 | 0.2 | TRUE |
| num_params_per_fn | 4 | 54.6 | |
| loc_per_fn_r | 22 | 65.5 | |
| loc_per_fn_r_exp | 26 | 57.4 | |
| loc_per_fn_r_not_exp | 22 | 66.9 | |
| rel_whitespace_R | 9 | 75.4 | |
| rel_whitespace_vignettes | 39 | 83.5 | |
| rel_whitespace_tests | 12 | 84.2 | |
| doclines_per_fn_exp | 56 | 69.3 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 124 | 82.6 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml

GitHub Workflow Results

| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 8546148980 | pages build and deployment | success | 13c6cf | 6 | 2024-04-03 |
| 8546127063 | pkgdown | success | 367061 | 17 | 2024-04-03 |
| 8546127064 | R-CMD-check | success | 367061 | 181 | 2024-04-03 |
| 8546127060 | test-coverage | success | 367061 | 19 | 2024-04-03 |

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fail:

  1. cyclocomp

Test coverage with covr

Package coverage: 97.93

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

| function | cyclocomplexity |
|---|---|
| risk_smoking | 106 |
| demo_population | 70 |
| incidence_cancer | 34 |
| mortality_cancer | 29 |
| demo_insurance | 27 |
| demo_poverty | 27 |
| risk_colorectal_screening | 21 |
| risk_women_health | 19 |
| demo_education | 18 |

Static code analyses with lintr

lintr found the following 102 potential issues:

| message | number of times |
|---|---|
| Avoid using sapply, consider vapply instead, that's type safe | 24 |
| Lines should not be more than 80 characters. | 78 |
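
To illustrate the first lintr message: `sapply()` picks its return type from the data, while `vapply()` makes the caller declare it. The snippet below is a generic sketch of that difference and does not reproduce any code from cancerprof; the variable names are made up for illustration.

```r
# Illustrative data only (not taken from cancerprof).
values <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# sapply() simplifies when it can; with empty input it returns list()
# rather than a numeric vector, which can surprise downstream code.
empty_sapply <- sapply(list(), mean)

# vapply() declares the type and length of each result, so the return
# type is stable and a mismatch raises an error instead of passing silently.
means <- vapply(values, mean, numeric(1))
empty_vapply <- vapply(list(), mean, numeric(1))
```

With empty input, `empty_sapply` is a list while `empty_vapply` is a zero-length numeric vector, which is the type-safety point the linter is making.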


Package Versions

| package | version |
|---|---|
| pkgstats | 0.1.3.11 |
| pkgcheck | 0.1.2.21 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

@ldecicco-USGS

@ropensci-review-bot assign @ldecicco-USGS as editor

@ropensci-review-bot
Collaborator

Assigned! @ldecicco-USGS is now the editor

@ldecicco-USGS

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
    • Is the case for the package well made?
    • Is the reference index page clear (grouped by topic if necessary)?
    • Are vignettes readable, sufficiently detailed and not just perfunctory?
  • Fit: The package meets criteria for fit and overlap.
  • Installation instructions: Are installation instructions clear enough for human users?
  • Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
  • Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
  • License: The package has a CRAN or OSI accepted license.
  • Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

There could be more information added to the README, although the bare minimum to meet our criteria is there.

In the examples I tried, my first thought was it might be nice to convert some of the text. For instance:

x <- demo_income(
  area = "usa",
  areatype = "state",
  income = "median family income",
  race = "all races (includes hispanic)"
)
head(x$Rank)
[1] "52 of 52" "51 of 52" "50 of 52" "49 of 52" "48 of 52" "47 of 52"

Seems like c(52, 51, 50, etc) would be a more useful output to an R user. You'd probably want/need another column or something to give the user the " of 52". Not mandatory, could be handy though (maybe a simple function to offer users outside of the function? or a simple example within the examples for how to extract the rank number).
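
The rank extraction suggested above could be sketched in base R as follows. This is a minimal illustration, not part of cancerprof: the helper name `split_rank()` and the column name `out_of` are hypothetical, and the code assumes every value follows the "N of M" pattern seen in the output above.

```r
# Hypothetical helper (not part of cancerprof): split strings such as
# "52 of 52" into an integer rank and the total number of ranked areas.
split_rank <- function(rank) {
  parts <- strsplit(rank, " of ", fixed = TRUE)
  data.frame(
    rank = vapply(parts, function(p) as.integer(p[1]), integer(1)),
    out_of = vapply(parts, function(p) as.integer(p[2]), integer(1))
  )
}

# Sample values copied from the head(x$Rank) output above.
ranks <- c("52 of 52", "51 of 52", "50 of 52")
rank_table <- split_rank(ranks)
rank_table$rank
```

Keeping the total in its own column preserves the " of 52" information while letting users sort or filter on the numeric rank.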


@ldecicco-USGS

@ropensci-review-bot seeking reviewers

@ropensci-review-bot
Collaborator

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/637_status.svg)](https://github.com/ropensci/software-review/issues/637)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

@realbp
Author

realbp commented Apr 17, 2024

Thank you for the feedback! I will make those changes in the upcoming version of cancerprof. I have added the ropensci badge and created a NEWS.md file.

What are the next steps in the review process?

@ldecicco-USGS

I'm asking around to find 2 reviewers. Hopefully that shouldn't take too long!

@realbp
Author

realbp commented Apr 17, 2024

Great, thank you for a speedy response!

@ldecicco-USGS

@ropensci-review-bot assign @jromanowska as reviewer

@ropensci-review-bot
Collaborator

@jromanowska added to the reviewers list. Review due date is 2024-05-20. Thanks @jromanowska for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@ropensci-review-bot
Collaborator

@jromanowska: If you haven't done so, please fill this form for us to update our reviewers records.

@jromanowska

Hi! Just for your information: I'll start with the review soon. There are many free days in May here, in Norway, but I hope I will not need any extension of the review deadline. 🤞

@jromanowska

@ldecicco-USGS, I just wanted to let you know that the {goodpractice} package, which is a dependency of {pkgcheck}, has been archived by CRAN (https://cran.r-project.org/web//packages/goodpractice/index.html), so I couldn't install {pkgcheck} and had to install the GitHub version of {goodpractice} by hand.

@jromanowska

Hi, I'm having problems installing the package:

pak::pak("getwilds/cancerprof")
#> Error: ! error in pak subprocess
#> Caused by error: 
#> ! Could not solve package dependencies:
#> * getwilds/cancerprof: ! pkgdepends resolution error for getwilds/cancerprof.
#> Caused by error: 
#> ! Bad GitHub credentials, make sure that your GitHub token is valid.
#> Caused by error in `stop(http_error(resp))`:
#> ! Unauthorized (HTTP 401).
devtools::install_github("getwilds/cancerprof")
#> Downloading GitHub repo getwilds/cancerprof@HEAD
#> Installing 3 packages: terra, raster, cdlTools
#> Installing packages into ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3’
#> (as ‘lib’ is unspecified)
#> trying URL 'https://cloud.r-project.org/src/contrib/terra_1.7-71.tar.gz'
#> Content type 'application/x-gzip' length 836573 bytes (816 KB)
#> ==================================================
#> downloaded 816 KB
#> 
#> trying URL 'https://cloud.r-project.org/src/contrib/raster_3.6-26.tar.gz'
#> Content type 'application/x-gzip' length 576421 bytes (562 KB)
#> ==================================================
#> downloaded 562 KB
#> 
#> trying URL 'https://cloud.r-project.org/src/contrib/cdlTools_1.13.tar.gz'
#> Content type 'application/x-gzip' length 43089 bytes (42 KB)
#> ==================================================
#> downloaded 42 KB
#> 
#> * installing *source* package ‘terra’ ...
#> ** package ‘terra’ successfully unpacked and MD5 sums checked
#> ** using staged installation
#> configure: CC: gcc
#> configure: CXX: g++ -std=gnu++17
#> checking for gdal-config... no
#> no
#> configure: error: gdal-config not found or not executable.
#> ERROR: configuration failed for package ‘terra’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/terra’
#> ERROR: dependency ‘terra’ is not available for package ‘raster’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/raster’
#> ERROR: dependencies ‘raster’, ‘terra’ are not available for package ‘cdlTools’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/cdlTools’
#> 
#> The downloaded source packages are in
#> 	‘/tmp/RtmpyrNDzF/downloaded_packages’
#> ── R CMD build ─────────────────────────────────────────────────────────
#> ✔  checking for file ‘/tmp/RtmpyrNDzF/remotes24d8468ec6bf/getwilds-cancerprof-23dbd98/DESCRIPTION’ ...
#> ─  preparing ‘cancerprof’:
#> ✔  checking DESCRIPTION meta-information
#> ─  checking for LF line-endings in source and make files and shell scripts
#> ─  checking for empty or unneeded directories
#> ─  looking to see if a ‘data/datalist’ file should be added
#> ─  building ‘cancerprof_0.1.0.tar.gz’
#>    Warning: invalid uid value replaced by that for user 'nobody'
#>    
#> Installing package into ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3’
#> (as ‘lib’ is unspecified)
#> ERROR: dependency ‘cdlTools’ is not available for package ‘cancerprof’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/cancerprof’
#> Warning messages:
#> 1: In i.p(...) : installation of package ‘terra’ had non-zero exit status
#> 2: In i.p(...) : installation of package ‘raster’ had non-zero exit status
#> 3: In i.p(...) :
#>   installation of package ‘cdlTools’ had non-zero exit status
#> 4: In i.p(...) :
#>   installation of package ‘/tmp/RtmpyrNDzF/file24d821d2b458/cancerprof_0.1.0.tar.gz’ had non-zero exit status
#> 
#> 

Created on 2024-04-30 with reprex v2.1.0

@ldecicco-USGS

Do you get the same error if you install terra and raster independently?

install.packages(c("terra", "raster"))

@jromanowska

Today I tried on another computer (also Linux) and the pak command worked; it actually showed me which system libraries were missing 🤔
After installing those, I could re-run pak without problems. I will write some comments about this installation process and its issues in my review, so that other users are aware.
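
For readers who hit the same `gdal-config not found` error, a pre-flight check along these lines may help. It is only a sketch: the helper name `has_system_tool()` is hypothetical, and checking for `gdal-config` is an assumption based on the terra log above rather than a complete list of terra's system requirements.

```r
# Hypothetical helper: report whether command-line tools are on the PATH.
has_system_tool <- function(tools) {
  found <- nzchar(Sys.which(tools))
  names(found) <- tools
  found
}

# terra's configure script looked for gdal-config in the log above.
status <- has_system_tool("gdal-config")
if (!status[["gdal-config"]]) {
  message("gdal-config not found: install the GDAL development library first.")
}
```

Running a check like this before installing would surface the missing system library up front instead of partway through a source build.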
