Dataverse Uploader

Python equivalent to the DVUploader written in Java. Complements other libraries written in Python and facilitates the upload of files to a Dataverse instance via Direct Upload.

Features

  • Parallel direct upload to a Dataverse backend storage
  • Files are streamed directly instead of being buffered in memory
  • Supports multipart uploads and chunks data accordingly
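
To illustrate what streaming in chunks means in practice, here is a minimal sketch using only the standard library. It is not DVUploader's actual implementation; the helper names `iter_chunks` and `checksum_streamed` and the 1 MiB default chunk size are assumptions made for this example.

```python
import hashlib
from typing import BinaryIO, Iterator


def iter_chunks(handle: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes from an open file."""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            # An empty read means the end of the file was reached.
            break
        yield chunk


def checksum_streamed(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file chunk by chunk so only one chunk is held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter_chunks(handle, chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

The same generator pattern applies to an upload: each chunk would be handed to the HTTP client as it is read, so memory use stays bounded by the chunk size rather than the file size.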

Getting started

To get started with DVUploader, install it from PyPI:

python3 -m pip install dvuploader

or from source:

git clone https://github.com/gdcc/python-dvuploader.git
cd python-dvuploader
python3 -m pip install .

Quickstart

Programmatic usage

To perform a direct upload, you need a running Dataverse instance backed by a cloud storage provider that is configured for direct upload. The following example shows how to upload files to such an instance: provide the files of interest and call the upload method of a DVUploader instance.

import dvuploader as dv


# Add files individually
files = [
    dv.File(filepath="./small.txt"),
    dv.File(directory_label="some/dir", filepath="./medium.txt"),
    dv.File(directory_label="some/dir", filepath="./big.txt"),
    *dv.add_directory("./data"), # Add an entire directory
]

DV_URL = "https://demo.dataverse.org/"
API_TOKEN = "XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
PID = "doi:10.70122/XXX/XXXXX"

dvuploader = dv.DVUploader(files=files)
dvuploader.upload(
    api_token=API_TOKEN,
    dataverse_url=DV_URL,
    persistent_id=PID,
    n_parallel_uploads=2, # Whatever your instance can handle
)
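
To keep the API token out of source files, the arguments for `upload` can be collected from the environment. The following is a minimal sketch, not part of DVUploader: the helper `upload_kwargs_from_env` and the variable names `DATAVERSE_URL` and `DATAVERSE_API_TOKEN` are hypothetical and chosen only for this example. The keyword names match the `upload` call shown above.

```python
import os


def upload_kwargs_from_env(persistent_id: str, n_parallel_uploads: int = 2) -> dict:
    """Collect keyword arguments for DVUploader.upload from the environment.

    DATAVERSE_URL and DATAVERSE_API_TOKEN are hypothetical variable names
    used in this sketch; dvuploader itself does not define them.
    """
    try:
        return {
            "api_token": os.environ["DATAVERSE_API_TOKEN"],
            "dataverse_url": os.environ["DATAVERSE_URL"],
            "persistent_id": persistent_id,
            "n_parallel_uploads": n_parallel_uploads,
        }
    except KeyError as missing:
        # Report which variable is absent instead of a bare KeyError.
        raise RuntimeError(
            f"Environment variable {missing.args[0]} is not set"
        ) from None
```

With both variables exported, the upload call above could then be written as `dvuploader.upload(**upload_kwargs_from_env(PID))`.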

Command Line Interface

DVUploader ships with a CLI for use outside of scripts. To upload files to a Dataverse instance, provide the files of interest, the persistent identifier of the dataset, and your credentials.

Using arguments

dvuploader my_file.txt my_other_file.txt \
           --pid doi:10.70122/XXX/XXXXX \
           --api-token XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX \
           --dataverse-url https://demo.dataverse.org/

Using a config file

Alternatively, you can supply a config file that contains all the information the uploader needs. The config file is a JSON or YAML file with the following keys:

  • persistent_id: Persistent identifier of the dataset to upload to.
  • dataverse_url: URL of the Dataverse instance.
  • api_token: API token of the Dataverse instance.
  • files: List of files to upload. Each file is a dictionary with the following keys:
    • filepath: Path to the file to upload.
    • directory_label: Optional directory label to upload the file to.
    • description: Optional description of the file.
    • mimetype: Mimetype of the file.
    • categories: Optional list of categories to assign to the file.
    • restrict: Boolean to indicate that this is a restricted file. Defaults to False.

In the following example, we upload three files to a Dataverse instance. The first file is uploaded to the root directory of the dataset, while the other two files are uploaded to the directory some/dir.

# config.yml
persistent_id: doi:10.70122/XXX/XXXXX
dataverse_url: https://demo.dataverse.org/
api_token: XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
files:
    - filepath: ./small.txt
    - filepath: ./medium.txt
      directory_label: some/dir
    - filepath: ./big.txt
      directory_label: some/dir

The config file can then be used as follows:

dvuploader --config-path config.yml
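
Because the config file may also be JSON, it can be generated from a script with the standard library alone. The sketch below writes a JSON file with the same keys as the YAML example above. The helper names `build_config` and `write_config` are hypothetical, and the assumption that the JSON layout mirrors the YAML one key for key follows from the key list above rather than from a tested run.

```python
import json
from pathlib import Path


def build_config(persistent_id: str, dataverse_url: str, api_token: str, files: list) -> dict:
    """Assemble a config dictionary using the keys described above."""
    return {
        "persistent_id": persistent_id,
        "dataverse_url": dataverse_url,
        "api_token": api_token,
        "files": files,
    }


def write_config(config: dict, path: str) -> None:
    """Serialize the config dictionary to a JSON file."""
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")


if __name__ == "__main__":
    config = build_config(
        persistent_id="doi:10.70122/XXX/XXXXX",
        dataverse_url="https://demo.dataverse.org/",
        api_token="XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        files=[
            {"filepath": "./small.txt"},
            {"filepath": "./medium.txt", "directory_label": "some/dir"},
            {"filepath": "./big.txt", "directory_label": "some/dir"},
        ],
    )
    write_config(config, "config.json")
```

The resulting file would then be passed to the CLI in the same way, as `dvuploader --config-path config.json`.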

Development

To install the development dependencies, run the following command:

pip install poetry
poetry install --with test

Running tests locally

In order to test the DVUploader, you need to have a Dataverse instance running. You can start a local Dataverse instance by following these steps:

1. Start the Dataverse instance

docker compose \
    -f ./docker/docker-compose-base.yml \
    --env-file local-test.env \
    up -d

2. Set up the environment variables

export BASE_URL=http://localhost:8080
export $(grep "API_TOKEN" "dv/bootstrap.exposed.env")
export DVUPLOADER_TESTING=true

3. Run the test(s) with pytest

python -m pytest -v
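
If the tests fail right away, a common cause is a variable from step 2 that was never exported. A small check like the one below can confirm this before running pytest. It is a sketch and not part of the repository: the helper name `missing_test_env` is hypothetical, while the variable names are the ones exported in step 2.

```python
import os
from typing import List, Mapping, Optional

# Variables exported in step 2 above.
REQUIRED_VARS = ("BASE_URL", "API_TOKEN", "DVUPLOADER_TESTING")


def missing_test_env(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_test_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Test environment looks complete.")
```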

Linting

This repository uses ruff to lint the code and codespell to check for spelling mistakes. You can run the linters with the following command:

python -m ruff check
python -m codespell --check-filenames