Commit

merging changes from @kuriwaki
ref #74

Merge branch 'dev' of https://github.com/IQSS/dataverse-client-r into dev

# Conflicts:
#	R/get_dataframe.R
#	man/get_dataframe.Rd
wibeasley committed Jan 18, 2021
2 parents ce36291 + 1c5d6e8 commit e2012c7
Showing 7 changed files with 87 additions and 65 deletions.
18 changes: 9 additions & 9 deletions DESCRIPTION
@@ -14,18 +14,18 @@ Authors@R: c(
email = "thosjleeper@gmail.com",
comment = c(ORCID = "0000-0003-4097-6326")
),
person(
"Philip", "Durbin",
role = c("aut"),
email = "philipdurbin@gmail.com",
comment = c(ORCID = "0000-0002-9528-9470")
),
person(
"Shiro", "Kuriwaki",
role = c("aut"),
email = "shirokuriwaki@gmail.com",
comment = c(ORCID = "0000-0002-5687-2647")
),
person(
"Philip", "Durbin",
role = c("aut"),
email = "philipdurbin@gmail.com",
comment = c(ORCID = "0000-0002-9528-9470")
),
person(
"Sebastian", "Karcher",
role=c("aut"),
@@ -53,9 +53,9 @@ Suggests:
testthat,
UNF,
yaml
Description: Provides access to Dataverse version 4 APIs <https://dataverse.org/>,
enabling data search, retrieval, and deposit. For Dataverse versions <= 4.0,
use the deprecated 'dvn' package <https://cran.r-project.org/package=dvn>.
Description: Provides access to Dataverse APIs <https://dataverse.org/> (versions 4-5),
enabling data search, retrieval, and deposit. For Dataverse versions <= 3.0,
use the archived 'dvn' package <https://cran.r-project.org/package=dvn>.
License: GPL-2
LazyData: true
URL: https://github.com/iqss/dataverse-client-r
33 changes: 22 additions & 11 deletions R/get_dataframe.R
@@ -1,17 +1,19 @@
#' Get file from dataverse and convert it into a dataframe or tibble
#' Download dataverse file as a dataframe
#'
#' `get_dataframe_by_id`, if you know the numeric ID of the dataset, or instead
#' `get_dataframe_by_name` if you know the filename and doi. The dataset
#' Use `get_dataframe_by_name` if you know the name of the datafile and the DOI
#' of the dataset. Use `get_dataframe_by_doi` if you know the DOI of the datafile
#' itself. Use `get_dataframe_by_id` if you know the numeric ID of the
#' datafile.
#'
#' @rdname get_dataframe
#'
#' @param filename The name of the file of interest, with file extension, for example
#' `"roster-bulls-1996.tab"`.
#' @param .f The function to used for reading in the raw dataset. This user
#' must choose the appropriate function: for example if the target is a .rds
#' file, then `.f` should be `readRDS` or `readr::read_`rds`.
#' file, then `.f` should be `readRDS` or `readr::read_rds`.
#' @param original A logical, defaulting to TRUE. Whether to read the ingested,
#' archival version of the dataset if one exists. The archival versions are tab-delimited
#' archival version of the datafile if one exists. The archival versions are tab-delimited
#' `.tab` files so if `original = FALSE`, `.f` is set to `readr::read_tsv`.
#' If functions to read the original version is available, then `original = TRUE`
#' with a specified `.f` is better.
@@ -20,15 +22,15 @@
#'
#' @examples
#' # Retrieve data.frame from dataverse DOI and file name
#' df_from_rds_ingested <-
#' df_tab <-
#' get_dataframe_by_name(
#' filename = "roster-bulls-1996.tab",
#' dataset = "doi:10.70122/FK2/HXJVJU",
#' server = "demo.dataverse.org"
#' )
#'
#' # Retrieve the same data.frame from dataverse + file DOI
#' df_from_rds_ingested_by_doi <-
#' # Retrieve the same file from file DOI
#' df_tab <-
#' get_dataframe_by_doi(
#' filedoi = "10.70122/FK2/HXJVJU/SA3Z2V",
#' server = "demo.dataverse.org"
@@ -45,11 +47,10 @@
#' server = "demo.dataverse.org"
#' )
#'
#'
#' # To use the original file version, or for non-ingested data,
#' # please specify `original = TRUE` and specify a function in .f.
#'
#' # A data.frame is still returned, but the
# Rds files are not ingested so original = TRUE and .f is required.
#' if (requireNamespace("readr", quietly = TRUE)) {
#' df_from_rds_original <-
#' get_dataframe_by_name(
@@ -61,8 +62,9 @@
#' )
#' }
#'
#' # Get Stata file as original
#' if (requireNamespace("haven", quietly = TRUE)) {
#' df_from_stata_original <-
#' df_stata_original <-
#' get_dataframe_by_name(
#' filename = "nlsw88.tab",
#' dataset = "doi:10.70122/FK2/PPIAXE",
@@ -71,6 +73,15 @@
#' .f = haven::read_dta
#' )
#' }
#'
#' # Stata file as ingested file (less information than original)
#' df_stata_ingested <-
#' get_dataframe_by_name(
#' filename = "nlsw88.tab",
#' dataset = "doi:10.70122/FK2/PPIAXE",
#' server = "demo.dataverse.org"
#' )
#'
#' }
#'
#' @export
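
The roxygen text above ties the choice of reader to `original`: the ingested version is tab-delimited, while the original version needs a user-supplied `.f`. A minimal offline sketch of that rule follows. It assumes a hypothetical helper, `choose_reader`, which is not part of the dataverse package, and uses base R's `utils::read.delim` as a stand-in for `readr::read_tsv`; the package's actual internals may differ.

```r
# Hypothetical helper (not part of the dataverse package): pick the reader
# that matches the requested version of a file.
choose_reader <- function(original, .f = NULL) {
  stopifnot(is.logical(original), length(original) == 1L, !is.na(original))
  if (!original) {
    # Ingested files are tab-delimited; base R stands in for readr::read_tsv.
    return(function(path) utils::read.delim(path, stringsAsFactors = FALSE))
  }
  if (!is.function(.f)) {
    stop("Supply a reader in `.f` when `original = TRUE`.")
  }
  .f
}

# Simulate an ingested .tab file and an original .rds file on disk.
tab_path <- tempfile(fileext = ".tab")
writeLines(c("player\tnumber", "Jordan\t23", "Pippen\t33"), tab_path)
rds_path <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), rds_path)

df_ingested <- choose_reader(original = FALSE)(tab_path)
df_original <- choose_reader(original = TRUE, .f = readRDS)(rds_path)
```

Both calls return a data.frame; asking for the original version without a reader stops with an error instead of guessing a format.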
19 changes: 10 additions & 9 deletions R/get_file.R
@@ -1,20 +1,21 @@
#' @rdname files
#'
#' @title Download File
#' @title Download dataverse file as a raw binary
#'
#' @description Download Dataverse File(s). `get_file` is a general wrapper,
#' and can take either dataverse objects, file IDs, or a filename and dataverse.
#' @description Download Dataverse File(s). `get_file_*`
#' functions return a raw binary file, which cannot be readily analyzed in R.
#' To use the objects as dataframes, see the `get_dataset_*` functions at
#' \link{get_dataset} instead.
#'
#' @details This function provides access to data files from a Dataverse entry.
#' `get_file` is a general wrapper,
#' and can take either dataverse objects, file IDs, or a filename and dataverse.
#' Internally, all functions download each file by `get_file_by_id`.
#' `get_file_by_name` is a shorthand for running `get_file` by
#' specifying a file name (`filename`) and dataset (`dataset`).
#' `get_file_by_doi` obtains a file by its file DOI, bypassing the
#' `dataset` argument.
#'
#' Internally, all functions download each file by `get_file_by_id`. `get_file_*`
#' functions return a raw binary file, which cannot be readily analyzed in R.
#' To use the objects as dataframes, see the `get_dataset_*` functions at \link{get_dataset}
#'
#' @details This function provides access to data files from a Dataverse entry.
#'
#' @param file An integer specifying a file identifier; or a vector of integers
#' specifying file identifiers; or, if used with the prefix \code{"doi:"}, a
#' character with the file-specific DOI; or, if used without the prefix, a
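
The description in this hunk notes that `get_file_*` functions return a raw binary file that cannot be readily analyzed in R. As a hedged illustration of what that means in practice, the sketch below builds a raw vector locally (an assumed stand-in for a download, so that it runs without a server), writes the bytes to disk unchanged, and then parses them. The variable names are illustrative only.

```r
# Stand-in for a downloaded file: in practice the raw vector would come from
# a `get_file_*` call; here it is built locally so the sketch runs offline.
csv_text <- "id,score\n1,90\n2,85\n"
raw_file <- charToRaw(csv_text)

# The object is a raw vector of bytes, not a data frame.
is_raw <- is.raw(raw_file)

# Write the bytes to disk unchanged, then read them with a suitable parser.
local_path <- tempfile(fileext = ".csv")
writeBin(raw_file, local_path)
scores <- utils::read.csv(local_path)
```

The parser has to match the file format; `utils::read.csv` is used here only because the simulated bytes are CSV text.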
6 changes: 3 additions & 3 deletions README.Rmd
@@ -1,5 +1,5 @@
---
title: "R Client for Dataverse 4 Repositories"
title: "R Client for Dataverse Repositories"
output: github_document
---

@@ -13,7 +13,7 @@ Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

[![Dataverse Project logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org)

The **dataverse** package provides access to [Dataverse 4](https://dataverse.org/) APIs, enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting.
The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4-5), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting.

### Getting Started

@@ -32,7 +32,7 @@ library("dataverse")

#### Keys

Some features of the Dataverse 4 API are public and require no authentication. This means in many cases you can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. But, other features require a Dataverse account for the specific server installation of the Dataverse software, and an API key linked to that account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using:
Some features of the Dataverse API are public and require no authentication. This means in many cases you can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. But, other features require a Dataverse account for the specific server installation of the Dataverse software, and an API key linked to that account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using:

``` r
Sys.setenv("DATAVERSE_KEY" = "examplekey12345")
25 changes: 12 additions & 13 deletions README.md
@@ -1,4 +1,4 @@
R Client for Dataverse 4 Repositories
R Client for Dataverse Repositories
================

[![CRAN
@@ -12,10 +12,10 @@ Status](https://travis-ci.org/IQSS/dataverse-client-r.png?branch=master)](https:
logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org)

The **dataverse** package provides access to
[Dataverse 4](https://dataverse.org/) APIs, enabling data search,
retrieval, and deposit, thus allowing R users to integrate public data
sharing into the reproducible research workflow. **dataverse** is the
next-generation iteration of [the **dvn**
[Dataverse](https://dataverse.org/) APIs (versions 4-5), enabling data
search, retrieval, and deposit, thus allowing R users to integrate
public data sharing into the reproducible research workflow.
**dataverse** is the next-generation iteration of [the **dvn**
package](https://cran.r-project.org/package=dvn), which works with
Dataverse 3 (“Dataverse Network”) applications. **dataverse** includes
numerous improvements for data search, retrieval, and deposit, including
@@ -34,7 +34,7 @@ library("dataverse")

#### Keys

Some features of the Dataverse 4 API are public and require no
Some features of the Dataverse API are public and require no
authentication. This means in many cases you can search for and retrieve
data without a Dataverse account for that a specific Dataverse
installation. But, other features require a Dataverse account for the
@@ -257,13 +257,12 @@ subsequent pages, specify `start`.

### Data Archiving

Dataverse provides two - basically unrelated - workflows for managing
(adding, documenting, and publishing) datasets. The first is built on
[SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a
new dataset listing, you will have to first initialize a dataset entry with
some metadata, add one or more files to the dataset, and then publish
it. This looks something like the following:

Dataverse provides two - basically unrelated - workflows for managing
(adding, documenting, and publishing) datasets. The first is built on
[SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a
new dataset listing, you will have to first initialize a dataset entry
with some metadata, add one or more files to the dataset, and then
publish it. This looks something like the following:

``` r
# retrieve your service document
19 changes: 10 additions & 9 deletions man/files.Rd


32 changes: 21 additions & 11 deletions man/get_dataframe.Rd

