Instructions for uploading new datasets

Renxuanwang edited this page Feb 26, 2019 · 37 revisions

To upload a new dataset into Zenvisage, click on the "+" button on the left-hand "Dataset" panel.

Data file requirements

The data file must be in comma-separated values (CSV) format. The first row gives the name of each attribute. Every subsequent row corresponds to one measurement or observation and holds one value per attribute. In our example, we have a weather dataset that records temperature as a function of time for each city, and each row in the data file contains one measurement for a city. The data file must not contain missing values, since these cause Zenvisage to return a server error. Dates should be split into separate numeric year, month, and day-of-year columns as shown below, because the x-axis does not currently support non-numeric values.

location,month,dayofyear,year,temperature
SANFRAN,1,1,1995,46.7
SANFRAN,1,2,1995,47.3
SANFRAN,1,3,1995,49.6
...
CHICAGO,12,351,1995,28.6
CHICAGO,12,352,1995,34.5
CHICAGO,12,353,1995,32.7
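
The two requirements above (numeric date columns and no missing values) can be checked before uploading. The following is a minimal sketch using only the Python standard library; the helper names `date_columns` and `find_missing_values` are hypothetical and not part of Zenvisage.

```python
import csv
import io
from datetime import date


def date_columns(d):
    """Split a date into numeric month, day-of-year, and year values."""
    return {"month": d.month, "dayofyear": d.timetuple().tm_yday, "year": d.year}


def find_missing_values(csv_text):
    """Return (line_number, column_name) pairs for empty or absent cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            # A short row yields None; an empty cell yields "".
            if value is None or value.strip() == "":
                problems.append((reader.line_num, name))
    return problems
```

For instance, `date_columns(date(1995, 12, 17))` gives month 12, day-of-year 351, and year 1995, matching the first CHICAGO row above. An empty list from `find_missing_values` suggests the file has no blank cells, although it does not check anything else about the file.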

Select CSV file

Click the "Choose dataset file" button to select a CSV file. Next, type in a name for the dataset and click "Upload". The dataset name cannot contain special characters (e.g. "-" or ".") that would make it an invalid Postgres name.
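
A dataset name can be checked locally before uploading. The sketch below assumes that a safe name is one that PostgreSQL accepts as an unquoted identifier, restricted here to ASCII: a letter or underscore first, then letters, digits, or underscores, up to 63 characters. Zenvisage's own validation may be stricter or looser, and the function name `is_safe_dataset_name` is hypothetical.

```python
import re

# Assumption: ASCII subset of PostgreSQL's unquoted-identifier rules
# (first character a letter or underscore, 63 characters at most).
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def is_safe_dataset_name(name):
    """Return True if the whole name matches the conservative pattern."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

This check does not reject SQL reserved words such as `select`, so a name that passes may still be refused by the server.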

Select data attribute properties

Once the CSV file is selected, Zenvisage automatically parses it to detect axis and data type information for each attribute in the header.

The first configuration parameter is the axis. The X values are independent variables, such as time. The Y values are the dependent or measure variables, such as sales or temperature. The Z values are categorical identifiers of a data row, such as product or city, for which a user may want to explore various visualizations.

By default, Zenvisage auto-selects the X, Y, and Z axes by assigning "int" and "float" attributes to the X and Y axes and "string" attributes to the Z axis. The "Select All" checkbox for each axis selects all attributes for inclusion in that axis. At any point, the user may select individual attributes for a particular axis, so the user has full control over axis assignment. For example, to select only one attribute for an axis, the easiest approach is to toggle "Select All" twice to clear the checkboxes, then find the attribute and select it manually.
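
The default assignment described above can be pictured with a short sketch. It assumes that every numeric attribute is offered on both the X and Y axes, which is one reading of the description; the function `default_axes` is hypothetical and is not Zenvisage's actual code.

```python
def default_axes(column_types):
    """Map {attribute: type} to default X, Y, and Z attribute lists."""
    numeric = [name for name, t in column_types.items() if t in ("int", "float")]
    categorical = [name for name, t in column_types.items() if t == "string"]
    # Assumption: numeric attributes are candidates for both X and Y.
    return {"x": list(numeric), "y": list(numeric), "z": categorical}
```

For the weather example, `location` would land on the Z axis, while `month`, `dayofyear`, `year`, and `temperature` would be preselected for X and Y until the user narrows the selection.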

The second configuration parameter is the data type. For each attribute, the dropdown menu offers "int", "float", or "string", which specifies how Zenvisage should parse the values of that attribute. Each dropdown is initialized to the data type that Zenvisage detected.
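
To see why a column might be detected as one type rather than another, the sketch below tries the narrowest type first and falls back to wider ones. This is an illustrative assumption about how such detection can work, not a description of Zenvisage's detector, and `infer_type` is a hypothetical name.

```python
def infer_type(values):
    """Return "int", "float", or "string" for a list of cell strings."""

    def parses(cast):
        # True only if every value converts without error.
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if parses(int):
        return "int"
    if parses(float):
        return "float"
    return "string"
```

Under this scheme a single non-numeric cell is enough to turn a whole column into "string", which is a common reason to override the detected type in the dropdown.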

Click "Submit" when finished.

View the uploaded dataset

A popup appears once the dataset has been loaded successfully. To explore the new dataset, select it from the dropdown menu in the "Dataset" panel.

Programmatic dataset upload

To integrate an existing data storage system with Zenvisage, you can also upload a dataset programmatically from a custom script, provided the file meets the data file requirements above. To do so, send a POST request to zv/datasetUpload on a running Zenvisage instance. An example Python script is shown below:

import requests

url = 'http://localhost:8080/zv/datasetUpload'
# Open the CSV in binary mode; the with-block closes it after the upload.
with open('real_estate_tutorial.csv', 'rb') as f:
    r = requests.post(url, files={'file': f},
                      data={'datasetName': 'xxx', 'overwrite': 'false'})
# Print the server's response body.
print(r.text)

You can specify whether to overwrite an existing table that has the same dataset name. The default setting is overwrite = true. If overwrite is set to false, the data in the file is appended to the existing table.
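
The overwrite option can be wrapped in a small helper so that callers pass a boolean instead of a string. This is a sketch under a few assumptions: the helper names `build_form` and `upload_dataset` are hypothetical, the string 'true' is assumed to be the counterpart of the 'false' used in the example above, and the server is assumed to signal failures with an HTTP error status.

```python
def build_form(dataset_name, overwrite=True):
    """Build the form fields sent alongside the CSV file."""
    # The example above sends the flag as a lowercase string.
    return {"datasetName": dataset_name,
            "overwrite": "true" if overwrite else "false"}


def upload_dataset(csv_path, dataset_name, overwrite=True,
                   url="http://localhost:8080/zv/datasetUpload"):
    """POST a CSV file to a running Zenvisage instance and return the reply."""
    import requests  # third-party; imported here so build_form works without it

    with open(csv_path, "rb") as f:
        response = requests.post(url, files={"file": f},
                                 data=build_form(dataset_name, overwrite))
    # Assumption: failed uploads come back with a 4xx or 5xx status.
    response.raise_for_status()
    return response.text
```

Calling `upload_dataset("real_estate_tutorial.csv", "xxx", overwrite=False)` would then append the file's rows to the existing table, mirroring the script above.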

After refreshing the webpage, you will see the newly uploaded dataset in the "Dataset" dropdown menu.