% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DataPackageR-package.R
\docType{package}
\name{DataPackageR-package}
\alias{DataPackageR}
\alias{DataPackageR-package}
\title{DataPackageR}
\description{
A framework to automate the processing, tidying and packaging of raw data into analysis-ready
data sets as R packages.
}
\details{
DataPackageR automates running data processing code,
storing tidied data sets in an R package, producing
data documentation stubs, tracking data object fingerprints (MD5 hashes),
and incrementing a "DataVersion" string
in the DESCRIPTION file of the package when raw data or data
objects change.

Code to perform the data processing is passed to DataPackageR by the user.
The user also specifies the names of the tidy data objects to be stored,
documented and tracked in the final package. Raw data should be read from
"inst/extdata", but large raw data files can be read from sources external
to the package source tree.

Configuration is controlled via the \code{config.yml} file created at the package root.
Its properties include a list of the R and Rmd files that are rendered or sourced;
these files read the raw data and do the actual processing.
It also includes a list of the R object names created by those files. These objects
are stored in the final package and are accessible via the \code{data()} API.
The documentation for these objects is accessible via "?object-name", and MD5
fingerprints of these objects are created and tracked.

The Rmd and R files used to process the objects are transformed into vignettes
accessible in the final package so that the processing is fully documented.

A DATADIGEST file in the package source keeps track of the data object fingerprints.
A DataVersion string is added to the package DESCRIPTION file and updated when these
objects are updated or changed on subsequent builds.

Once the package is built and installed, the data objects created in the package are accessible via
the \code{data()} API.

Calling \code{datapackage_skeleton()} and passing in R / Rmd file names and R object names
constructs a skeleton data package source tree and an associated \code{config.yml} file.
Calling \code{package_build()} sets the build process in motion.
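
As a rough illustration of the configuration described above, a \code{config.yml}
for a package that processes one file and stores one object might look something like
the sketch below. The exact layout is an assumption and may differ between versions;
the names \code{foo.Rmd} and \code{tbl} are taken from the example in this help page,
and the file written by \code{datapackage_skeleton()} may contain additional fields.
\preformatted{
configuration:
  files:
    foo.Rmd:
      enabled: yes
  objects:
  - tbl
}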
}
\examples{
# A simple Rmd file that creates one data object
# named "tbl".
if(rmarkdown::pandoc_available()){
f <- tempdir()
f <- file.path(f,"foo.Rmd")
con <- file(f)
writeLines("```{r}\n tbl = data.frame(1:10) \n```\n",con=con)
close(con)
# construct a data package skeleton with a temporary name (pname) and pass
# in the Rmd file name with full path, and the name of the object(s) it
# creates.
pname <- basename(tempfile())
datapackage_skeleton(name=pname,
path=tempdir(),
force = TRUE,
r_object_names = "tbl",
code_files = f)
# call package_build to run the "foo.Rmd" processing and
# build a data package.
package_build(file.path(tempdir(), pname), install = FALSE)
# load the data package from source (stands in for installing it)
devtools::load_all(file.path(tempdir(), pname))
# read the data version
data_version(pname)
# list the data sets in the package.
data(package = pname)
# The data objects are in the package source under "/data"
list.files(pattern="rda", path = file.path(tempdir(),pname,"data"), full.names = TRUE)
# The documentation that needs to be edited is in "/R"
list.files(pattern="R", path = file.path(tempdir(), pname,"R"), full.names = TRUE)
readLines(list.files(pattern="R", path = file.path(tempdir(),pname,"R"), full.names = TRUE))
# view the documentation with
?tbl
}
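# Illustrative sketch only, not DataPackageR's internal implementation:
# how an MD5 fingerprint can reveal that a data object has changed.
# fingerprint() is a hypothetical helper built on base R; it hashes the
# serialized bytes of an object rather than using the package's own code.
fingerprint <- function(obj) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  # write the object uncompressed so the bytes depend only on the object
  saveRDS(obj, path, compress = FALSE)
  unname(tools::md5sum(path))
}
tbl_v1 <- data.frame(x = 1:10)
tbl_v2 <- data.frame(x = 1:11)
# an unchanged object keeps its fingerprint
identical(fingerprint(tbl_v1), fingerprint(tbl_v1))
# a changed object gets a different fingerprint, the kind of change
# that would call for a new DataVersion
identical(fingerprint(tbl_v1), fingerprint(tbl_v2))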
}
\seealso{
Useful links:
\itemize{
\item \url{https://github.com/ropensci/DataPackageR/}
\item \url{https://docs.ropensci.org/DataPackageR/}
\item Report bugs at \url{https://github.com/ropensci/DataPackageR/issues}
}
}
\author{
\strong{Maintainer}: Dave Slager \email{dslager@scharp.org} (\href{https://orcid.org/0000-0003-2525-2039}{ORCID}) [contributor]
Authors:
\itemize{
\item Greg Finak \email{greg.finak@gmail.com} (Original author and creator of DataPackageR) [copyright holder]
}
Other contributors:
\itemize{
\item Paul Obrecht [contributor]
\item Ellis Hughes \email{ellishughes@live.com} (\href{https://orcid.org/0000-0003-0637-4436}{ORCID}) [contributor]
\item Jimmy Fulp \email{williamjfulp@gmail.com} [contributor]
\item Marie Vendettuoli (\href{https://orcid.org/0000-0001-9321-1410}{ORCID}) [contributor]
\item Jason Taylor \email{jmtaylor@fredhutch.org} [contributor]
\item Kara Woo (Kara reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/230>) [reviewer]
\item William Landau (William reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/230>) [reviewer]
}
}
\keyword{internal}