User_Dataset Management

Marc-Alexandre Côté edited this page Dec 6, 2018 · 9 revisions

My datasets

When you create a challenge, all datasets (and code) are stored in "My Datasets" automatically. But you can also upload data manually and reference it in your competition.

Quick start

Once you are logged into CodaLab, go to My Competitions > My Datasets > Create Dataset and fill out the form.

You can upload different types of "data" (or code):

  • Public data: Will be made visible for download.
  • Input data: Will be made available to code submissions (but will not be downloadable).
  • Reference data: Will be made available to the scoring program (but visible neither to participants nor to their code).
  • Ingestion program: Organizer-provided code that will be run when code submissions are made.
  • Scoring program: Organizer-provided code rating the predictions against the solutions.
  • Starting kit: Organizer-provided kit, which may include sample code and sample submissions.

Upload your file DataName.zip. After the upload completes, you should see your new dataset in the data table. The KEY shown there can be used to refer to the dataset from the YAML file, e.g.

public_data: dac49905-dda0-4857-922a-02ca957ec8fd
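Since a mistyped KEY is easy to miss, a quick format check before pasting it into the YAML file can help. The following is a minimal sketch, assuming the KEY is a canonical hyphenated UUID like the one above; the helper name `is_dataset_key` is hypothetical and not part of CodaLab. It checks only the format, not whether the dataset actually exists in your account.

```python
import uuid


def is_dataset_key(value):
    """Return True if value is a canonical hyphenated UUID string.

    Hypothetical helper: it validates the format only and does not
    contact CodaLab to confirm the dataset exists.
    """
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces or missing hyphens, so compare
    # against the canonical form to reject those variants.
    return str(parsed) == value.lower()


if __name__ == "__main__":
    print(is_dataset_key("dac49905-dda0-4857-922a-02ca957ec8fd"))
    print(is_dataset_key("DataName.zip"))
```

A file name such as DataName.zip fails the check, which makes it easy to spot fields that still point at a bundled file rather than an uploaded dataset.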

You can also use the editor. In Competitions I’m running, find your competition and click “Edit”.

Find the menus for Input Data, Reference Data, Public Data, Ingestion Program, Scoring Program, and Starting Kit. Select the right dataset for each, and don’t forget to SAVE YOUR CHANGES.

Creating datasets

  1. Log in
  2. Click on "My Datasets" in the top right near your username
  3. Click the "Create dataset" button in the top left of the content area
  4. Fill in all of the relevant information
  5. Click the "Upload" button
  6. If everything was successful, the dataset should appear in the list of datasets
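The upload step expects a zip archive. The sketch below shows one way to build it from a local folder with Python's standard library. It assumes you want the folder's files at the top level of the archive rather than nested under the folder name; that layout is an assumption here and is not stated on this page. The function name `zip_dataset` and the paths are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_dataset(source_dir: str, archive_path: str) -> List[str]:
    """Zip the contents of source_dir into archive_path.

    Files are stored relative to source_dir, so the folder itself does
    not appear inside the archive (an assumed layout, see above).
    Returns the archive member names in sorted order.
    """
    root = Path(source_dir)
    target = Path(archive_path).resolve()
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # is being written inside source_dir.
            if not path.is_file() or path.resolve() == target:
                continue
            arcname = path.relative_to(root).as_posix()
            archive.write(path, arcname)
            names.append(arcname)
    return names
```

For example, `zip_dataset("DataName", "DataName.zip")` would produce the file to select in the upload form.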

Removing datasets

  1. Log in
  2. Click on "My Datasets" in the top right near your username
  3. Find the dataset you want to delete and click the "DEL" button on the right
  4. Confirm that you want to delete it. If the dataset is already in use in a competition, you should see a warning.

Downloading datasets

  1. Log in
  2. Click on "My Datasets" in the top right near your username
  3. Click the "Download" button to the right of the dataset you want to download

Using datasets in YAML file

For public_data, input_data, reference_data, ingestion_program, or scoring_program, replace the file name with the UUID (the KEY) of the dataset. For example:

phases:
  0:
    phasenumber: 0
    reference_data: af5e8c26-73b0-485a-a8b9-a572dd88d828
    scoring_program: 21a8f881-2e53-4c71-b841-47e78e0b4040
    input_data: few2f881-2rp3-4221-b121-j4kj3934jt42
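If you build the phase configuration programmatically, the same substitution can be sketched in Python. This is a minimal illustration, assuming the phase has already been loaded into a plain dict (for instance with a YAML parser of your choice); the function name `use_dataset_keys` and the `.zip` file names are hypothetical.

```python
# The dataset fields named on this page.
DATASET_FIELDS = (
    "public_data",
    "input_data",
    "reference_data",
    "ingestion_program",
    "scoring_program",
)


def use_dataset_keys(phase: dict, keys: dict) -> dict:
    """Return a copy of phase whose dataset fields hold dataset UUIDs.

    `keys` maps a dataset field name to the KEY shown in "My Datasets".
    Any field name outside DATASET_FIELDS is rejected so a typo does
    not silently add a new entry to the phase.
    """
    unknown = set(keys) - set(DATASET_FIELDS)
    if unknown:
        raise KeyError("not a dataset field: %s" % sorted(unknown))
    updated = dict(phase)  # leave the caller's dict untouched
    updated.update(keys)
    return updated


if __name__ == "__main__":
    phase = {
        "phasenumber": 0,
        "reference_data": "reference_data.zip",    # hypothetical file name
        "scoring_program": "scoring_program.zip",  # hypothetical file name
    }
    print(use_dataset_keys(phase, {
        "reference_data": "af5e8c26-73b0-485a-a8b9-a572dd88d828",
        "scoring_program": "21a8f881-2e53-4c71-b841-47e78e0b4040",
    }))
```

Returning a copy rather than editing in place keeps the original file-name version available, which is convenient if you maintain both a bundled and a dataset-referencing variant of the same competition.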

Warning: the datasets block below is the OLD WAY of representing public data. Use public_data instead of this:

    datasets:
      1:
        name: Training Data
        key: 21a8f881-2e53-4c71-b841-47e78e0b4040
        description: Training data

Switching datasets with the editor

  1. Log in
  2. Go to "My Codalab"
  3. Go to the "Competitions I'm running" tab
  4. Find the competition you want to edit and click "Edit"
  5. Scroll to the phase you want to modify and you can select a new public_data, input_data, reference_data, ingestion_program, or scoring_program from datasets you have uploaded.
  6. Click "Save" to complete

NOTE: You cannot change dataset entries by uploading a new competition.yaml. This is a limitation of the current Competition Edit Form.
