Refactoring the README #24

Closed
wlandau opened this issue Jul 1, 2018 · 6 comments

Comments

@wlandau

wlandau commented Jul 1, 2018

Currently, the README and vignette have almost identical content. I recommend diverging them because

  1. Maintaining 2 copies of the same writeup is hard, and
  2. These documents have different purposes. As I understand it, the README should be a high-level first exposure for new users, and the vignettes should go into usage details.

Suggested section outline

  • Why DataPackageR?
  • Installation
  • Minimal example
  • Preprint and publication
  • Code of conduct

Suggested motivation section

I think it would be helpful to take a couple steps back here. Maybe it could flow like this:

Suggested minimal example

library(DataPackageR)
library(data.tree)

# Let's reproducibly package up
# the rows of the built-in cars dataset
# with speed > 20 (mtcars has no speed column).
# Our dataset will be called cars_over_20.

# Get the code file that turns the raw data
# to our packaged and processed analysis-ready dataset.
processing_code <- system.file(
  "extdata", "tests", "subsetCars.Rmd", package = "DataPackageR"
)

# Create the package framework.
DataPackageR::datapackage.skeleton(
  "mtcars20",
  force = TRUE,
  code_files = processing_code,
  r_object_names = "cars_over_20"
)

# Run the preprocessing code to build cars_over_20
# and reproducibly enclose it in a package.
DataPackageR::package_build("mtcars20")

# Let's use the package we just created.
install.packages("mtcars20_1.0.tar.gz", type = "source", repos = NULL)
library(mtcars20)
data("cars_over_20") # load the data
cars_over_20  # Now we can use it.
?cars_over_20 # See the documentation you wrote in data-raw/documentation.R.

# We have our dataset!
# Since we preprocessed it,
# it is clean and under the 5 MB limit for data in packages.
cars_over_20

# We can easily check the version of the data
DataPackageR::dataVersion("mtcars20")

If you include the output of this code chunk, I would encourage suppressing the most verbose messages. For example, maybe set quiet to TRUE in knit() or render().
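
A minimal sketch of what that could look like, assuming knitr is installed. The temporary file and its contents are made up for illustration and are not part of DataPackageR. The chunk option message = FALSE is an extra assumption on my part: quiet only controls console progress output, while message = FALSE keeps messages out of the knitted document.

```r
# Build a throwaway R Markdown file (hypothetical content, for illustration).
fence <- strrep("`", 3)
rmd <- tempfile(fileext = ".Rmd")
writeLines(
  c(
    paste0(fence, "{r, message = FALSE}"),
    "message('a verbose build message')",
    "x <- 1 + 1",
    "x",
    fence
  ),
  rmd
)

# quiet = TRUE silences knitr's progress output in the console;
# message = FALSE in the chunk header keeps the message out of the result.
md <- knitr::knit(rmd, output = tempfile(fileext = ".md"), quiet = TRUE)

# rmarkdown::render() takes the same argument, e.g.
# rmarkdown::render("README.Rmd", quiet = TRUE)
```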

Consider README.Rmd

It is becoming standard practice to generate the README.md from a README.Rmd (example here). It is much easier this way to keep code chunk output synchronized with everything else. usethis::use_readme_rmd() makes it easy.
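
As a rough sketch of that workflow: render_readme() below is a hypothetical helper, not a function from usethis or DataPackageR, and it assumes rmarkdown is installed and that README.Rmd sits in the package root.

```r
# Hypothetical helper: re-render README.md from README.Rmd if the
# source file is present; otherwise do nothing and return NULL.
render_readme <- function(pkg_dir = ".") {
  rmd <- file.path(pkg_dir, "README.Rmd")
  if (!file.exists(rmd)) {
    return(invisible(NULL))  # nothing to render yet
  }
  rmarkdown::render(rmd, quiet = TRUE)
}

# One-time setup, run from the package root, creates the template:
# usethis::use_readme_rmd()
# Afterwards, refresh README.md whenever README.Rmd changes:
# render_readme()
```

The point of the helper is only that README.md is never edited by hand; every change goes through README.Rmd and a re-render.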

@wlandau
Author

wlandau commented Jul 2, 2018

Also, I think this code chunk could point to data-raw/documentation.R. I updated the example code chunk above.

# Let's use the package we just created.
install.packages("mtcars20_1.0.tar.gz", type = "source", repos = NULL)
library(mtcars20)
data("cars_over_20") # load the data
cars_over_20  # Now we can use it.
?cars_over_20 # See the documentation you wrote in data-raw/documentation.R.

@wlandau
Author

wlandau commented Jul 2, 2018

To be clear, the suggested structure in #24 (comment) is just meant to give you ideas, not force you into writing a certain way. You may also find it helpful to check out the official rOpenSci guidance on READMEs and this thread on vignette standards.

@wlandau
Author

wlandau commented Jul 2, 2018

Ref: ropensci/software-review#230

@wlandau
Author

wlandau commented Jul 6, 2018

You did a ton of work on the new README, and I really like it. It is neatly outlined and concise, and it operates at the right level of detail. Right under "Why package data sets?", would you consider adding an explicit definition of a data package? Maybe something like "A data package is a formal R package whose sole purpose is to contain, access, and/or document datasets."

@gfinak
Member

gfinak commented Jul 8, 2018

Glad to. Will be back in front of a computer by Monday, will deal with it then.
Thanks again.

gfinak added a commit that referenced this issue Jul 9, 2018
- move YAML_CONFIG to vignettes.
Issue #24
- Add definition of data package to README.
Issue #25
- Move "R CMD build" to section after package_build is introduced.
- Extend the "Purpose" section a bit.
- Extended "Next Steps" and made it a sub-section.
- Referenced "Happy Git and Github for the useR" and Hadley's book on R packages.
- Fix typo mtcars2 to mtcars20
@wlandau
Author

wlandau commented Jul 10, 2018

Very helpful, thank you.

@wlandau wlandau closed this as completed Jul 10, 2018