This repository has been archived by the owner on May 4, 2019. It is now read-only.

Docs: Loading Data

Adil edited this page Jul 4, 2016 · 2 revisions

Data Sources

You can load data into Keshif from:

  • Google Sheets
  • Text files, hosted:
      • On Google Drive
      • On Dropbox
      • On your web server

Text File Types

Keshif can be used with the following data file types:

  • CSV / TSV
  • JSON
  • XML
  • Any other file type that you can load and parse in JavaScript. See Custom Data Loading

Hint: The dataset explorer on the front page indexes demos by file type and resource. Filter by data source to find example source code for a specific file loading approach.

Note: You cannot currently use multiple data sources or file types in a single browser unless you use a custom data loading function.

Usage Options

To load data into keshif, you can:

  1. Use the API to create a keshif browser, or
  2. Use the experimental browser authoring page.

Using the API

Each browser specification object must have a "source" key that describes the data for the keshif browser.

Examples

// Using Google Sheets, single table (sheet)
source: {
  gdocId: '0Ai6LdDWgaqgNdFlZRk83NmpDLVc2cllCRjhpdkNYOWc',
  tables: "Demos"
}

// Using Google Sheets, single table (sheet), with a custom link source URL
source: {
  url: "http://www.bloomberg.com/infographics/2014-08-21/top-data-breaches.html",
  gdocId: '14vd0RHPy-JyetjppxJ4R5UywaeszV0HR599MX91KkjI',
  tables: "Breaches"
}

// Using Google Sheets, multiple tables, focusing on Publications
source: {
  gdocId: '0Ai6LdDWgaqgNdEp1aHBzSTg0T0RJVURqWVNGOGNkNXc',
  sheets: [ "Publications", "Venues", "Authors", "Keywords", "VenueTypes", "AuthorTypes" ]
}

// Using custom callback
source: { callback: function(browser){ ... } },

// Using Google Sheets, table with customized id column
source: {
  gdocId: '1zmtJuAfh2foJD1Ha4Ppiuisq4Wx5DDUt61zwiCjf500',
  tables: {name:"Companies", id:'Stock'}
}

// Using locally hosted JSON file, with custom URL for attribution on interface
source: {
  url: "http://www.chromestatus.com/features",
  dirPath: "./data/",
  fileType: 'json',
  tables: "chromefeatures"
}

// Using Google Drive hosted csv file
source: {
  url: 'http://www.consumerfinance.gov/complaintdatabase',
  dirPath: 'https://ca480fa8cd553f048c65766cc0d0f07f93f6fe2f.googledrive.com/host/0By6LdDWgaqgNfmpDajZMdHMtU3FWTEkzZW9LTndWdFg0Qk9MNzd0ZW9mcjA4aUJlV0p1Zk0/',
  fileType: 'csv',
  tables: "Consumer_Complaints_2015"
}

// Using Dropbox hosted json file
source: {
  url: "http://ccnmtl.columbia.edu/portfolio/exhibit_view.html",
  dirPath: 'https://dl.dropboxusercontent.com/u/1951639/',
  fileType: 'json',
  tables: "cnmtl_portfolio"
}

Specifying data tables

Data tables are described using the tables key in your source object. Its value can be a single table description or an array of descriptions.

Note: This key was previously named sheets. Both keys are currently supported, but support for sheets may be removed in the future.

If a description is just a string, it is used as the table's name parameter.

If an array of descriptions is used, the first description is expected to be the table that holds the primary entity (the records shown and filtered in the list). The remaining tables can be used to look up or link to related information.

# name

String - Table name. For a Google Sheet, it is the sheet name shown on the tab at the bottom. For text files, it is the file name without the extension.

# id

String - The column name that holds the unique id for each record in the table. Default is 'id'. If your table has a unique descriptor under a different column name, specify the column name here.
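
The rules above (a string stands for the table name, id defaults to 'id', and the first table is the primary one) can be sketched as a small helper. This is only an illustration under those stated rules: normalizeTables is a hypothetical name, not a keshif function, and keshif's own handling may differ.

```javascript
// Hypothetical helper, not part of keshif: expands a `tables` value
// into a uniform list of { name, id } descriptions.
function normalizeTables(tables) {
  // A single description (string or object) is treated as a one-item list.
  var list = Array.isArray(tables) ? tables : [tables];
  return list.map(function (t) {
    // A bare string is used as the table name.
    var desc = (typeof t === 'string') ? { name: t } : t;
    return {
      name: desc.name,
      // 'id' is the documented default for the unique-id column.
      id: desc.id || 'id'
    };
  });
}

var tables = normalizeTables(['Publications', { name: 'Companies', id: 'Stock' }]);
var primary = tables[0]; // the first description is the primary table
console.log(primary.name, primary.id, tables[1].id);
```

Normalizing up front like this means later code only has to deal with one shape, whichever form the page author used.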

Using Google Sheets

If you are using a Google Sheet as data source, include the following script in your html page:

<script type="text/javascript" src="http://www.google.com/jsapi"></script>

API Parameters:

# gdocId

String - ID of your Google document.

# query

String - Google Sheets query, as documented in the Google Sheets query language reference. Example: "select A,B,D".
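
As a rough sketch, the two parameters might be combined in one source object as shown here. It assumes that query sits next to gdocId inside source, which this page does not state explicitly; the gdocId and table name are reused from the first example above.

```javascript
// Assumption: `query` is placed alongside `gdocId` in the source object.
var source = {
  gdocId: '0Ai6LdDWgaqgNdFlZRk83NmpDLVc2cllCRjhpdkNYOWc', // from the first example
  tables: 'Demos',
  query: 'select A,B,D' // request only columns A, B and D of the sheet
};
```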

Access control: Set the sharing settings of your document to control who can read your data on the web page. You can make the spreadsheet public, or share it only with a specific group of people.

Developer info

File Loading

Each data table must be in a separate file. The file url for each table is generated using dirPath + table name + "." + fileType.

# dirPath

String - The directory path where the table files are stored.

# fileType

String - File extension/type. Currently supported data types are 'csv' (comma separated file), 'tsv' (tab separated file), and 'json'.
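
The URL rule above (dirPath + table name + "." + fileType) can be written out as a one-line function. buildTableUrl is a hypothetical name used for illustration only; the sample values come from the locally hosted JSON example earlier on this page.

```javascript
// Hypothetical helper illustrating the documented rule:
//   file URL = dirPath + table name + "." + fileType
function buildTableUrl(source, tableName) {
  return source.dirPath + tableName + '.' + source.fileType;
}

var source = { dirPath: './data/', fileType: 'json', tables: 'chromefeatures' };
console.log(buildTableUrl(source, source.tables));
```

Because the parts are joined as-is, dirPath presumably needs its trailing slash, as in all the examples above.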

Loading CSV / TSV files

The first row (line) must only include column names / headers.

Include the PapaParse JavaScript library in your HTML page. For example:

<script type="text/javascript" src="../js/papaparse.min.js" charset="utf-8"></script>

PapaParse is included under the keshif/js/ directory.
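
To see why the first row must contain only column names, here is a deliberately naive sketch of how a header row turns the remaining rows into records. It is not how PapaParse or keshif is implemented: rowsToRecords is a hypothetical function that ignores quoted fields, escaped commas and type conversion.

```javascript
// Naive illustration only: splits on newlines and commas, with no
// support for quoted fields, escaped commas, or type conversion.
function rowsToRecords(csvText) {
  var lines = csvText.trim().split(/\r?\n/);
  var headers = lines[0].split(','); // first row: column names only
  return lines.slice(1).map(function (line) {
    var cells = line.split(',');
    var record = {};
    // Each cell is stored under the column name at the same position.
    headers.forEach(function (h, i) { record[h] = cells[i]; });
    return record;
  });
}

var records = rowsToRecords('id,name,age\n1,Joe,23\n2,Mary,25');
console.log(records[0].name, records[1].age);
```

PapaParse offers a header: true option with a similar effect. Whether keshif relies on it internally is not stated on this page.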

Loading JSON files

If you want to use automated JSON loading, the source file should be an array at the top level, with a list of objects, each describing a record in your database. For example:

[ {"name":"Joe", "age":23} , {"name":"Mary", "age":25} , {"name":"Nick", "age":27} , (...) ]
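
Before pointing keshif at a JSON file, it can help to confirm that the file has this shape. The check here is a minimal sketch, assuming you already have the file contents as a string; parseRecords is a hypothetical helper, not a keshif function. Note that JSON.parse accepts only strict JSON, with double-quoted keys and strings.

```javascript
// Hypothetical check, not part of keshif: verifies that the text is a
// top-level array whose items are all record objects.
function parseRecords(jsonText) {
  var data = JSON.parse(jsonText); // throws a SyntaxError on invalid JSON
  if (!Array.isArray(data)) {
    throw new Error('Expected a top-level array of records');
  }
  data.forEach(function (rec, i) {
    if (rec === null || typeof rec !== 'object' || Array.isArray(rec)) {
      throw new Error('Record ' + i + ' is not an object');
    }
  });
  return data;
}

var people = parseRecords('[{"name":"Joe","age":23},{"name":"Mary","age":25}]');
console.log(people.length, people[0].name);
```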

Loading files from the cloud (Google Drive / Dropbox)

Loading a file from the cloud only requires setting the dirPath parameter to the URL made available by the cloud file service.

For Google Drive, ... (more info)...

For Dropbox, you need to copy your Public folder URL. See here for more info.

Custom Data Loading

To define your own function to load and parse your data, use the callback key.

# callback

Function - Callback function for the data source. Its first parameter is a reference to the browser object.

You can use ajax to load data files, and parse them into keshif tables.

Through this function, you can load custom JSON / XML files. For examples, please see existing demos.
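
One possible way to structure such a callback is sketched here. Everything except the callback key and its browser parameter is an assumption: makeJsonCallback, loadText and onRecords are hypothetical names, './data/people.json' is a made-up path, and the step that hands the parsed records to keshif is left open because this page does not document it.

```javascript
// Hypothetical factory, not part of keshif. It returns a function
// suitable for the `callback` key.
//   loadText(url)               -> promise (or thenable) of the file text
//   onRecords(browser, records) -> where you pass the records on to keshif
function makeJsonCallback(url, loadText, onRecords) {
  return function (browser) {
    return loadText(url).then(function (text) {
      var records = JSON.parse(text);
      if (!Array.isArray(records)) {
        throw new Error('Expected a top-level array of records');
      }
      onRecords(browser, records);
      return records;
    });
  };
}

var source = {
  callback: makeJsonCallback(
    './data/people.json', // hypothetical file
    function (url) {
      // Only runs when the callback is invoked in a browser.
      return fetch(url).then(function (res) { return res.text(); });
    },
    function (browser, records) {
      // Hand the records to keshif here; see the existing demos for how.
    }
  )
};
```

Keeping the loader as a parameter lets the same callback work with a different ajax helper, or with a custom parser for XML, without changing the surrounding structure.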