Skip to content

Latest commit

 

History

History
83 lines (62 loc) · 2.61 KB

DESIGN.md

File metadata and controls

83 lines (62 loc) · 2.61 KB

Design Notes

Roadmap

General comment: let's do "README" (docs) driven development here.

  • [show] Local functionality for Frictionless datasets with CSV #528
  • [show] Uber Epic covering all functionality See below
    • [show] README only + data datasets (don’t have to be frictionless)
      • (?) Graphs direct in README with say visdown …
    • [show] SQL interface to the data (alasql or sql.js … https://github.com/agershun/alasql/wiki/Performance-Tests)
    • file/resource subpages ... (for datasets with lots of resources)
  • Docs 80% analysed #
  • Create portal components and library i.e. have a Table, Graph, Dataset component
    • publish to @datopian/portal
    • Examples
  • Catalog functionality 20% analysed

[uber][epic] Show functionality for single datasets

Features

  • Elegant
  • Description (README/Description)
  • Data preview and exploration (for tablular)
    • Basic: some sample data shown
    • Data exploration v1: filterable
    • Data Exploration v2: can do sql etc ...
  • Graphs / visualization
  • Validation: this row does not match schema in column X
  • Summarization e.g. this columns has this range of values, this average value, this number of nulls

Dataset structure support (in rough order of priority / like implementation)

  • Frictionless
  • Plain README (with frontmatter)
  • README (no frontmatter) and LICENSE file (?)

Data has roughly two dimensions that are relevant

  • Format

    • CSV
    • xlsx
    • JSON
    • ...
  • Size

    • Small: < 5mb (can just load inline ...)
    • Medium < 100mb
    • Large < 5Gb
    • xlarge > 5Gb
  • TODO: How does show/build work with remote files e.g. a resource ...

    path: abc.csv
    remote_storage_url: s3://.../.../.../
    

    Options:

    • We clone the data into path locally ...
      • Possible problem if data is big ...
    • Load data direct from remote_storage_url (as long as supports CORs)

Architecture

Portal.js is a React and NextJS based framework for building dataset/resources pages and catalogs. It consists of:

  • React components for data portal functionality e.g. data tables, graphs, dataset pages etc
  • Tooling to load data (based on Frictionless)
  • Template sites you can reuse using create-next-app
    • Single dataset micro-site
    • Github backed catalog
    • CKAN backed catalog
    • ...
  • Local development environment
  • Deployment integration with DataHub.io

In summary, technically PortalJS is: NextJS + data specific react components + data loading glue (mostly using frictionless-js).