Try DuckDB #1828

Draft
wants to merge 11 commits into
base: main

keller-mark (Member) commented Mar 22, 2024

Towards #1824

Background

Change List

TODO: re-implement on a fresh branch, with only the changes needed to support a useSql hook (e.g. in plugin views)

TODO: For proof of concept

  • Create table to store obsEmbedding X/Y coordinates plus obs IDs in database
  • Insert obsEmbedding data into database the first time it is requested from data-hooks.
  • Query DB for obsEmbedding table any time it is requested from data-hooks.
  • Pass Arrow table directly into DeckGL without modification - see https://github.com/geoarrow/deck.gl-layers
  • Benchmark initial load time and network request differences?
  • Use lazy database instantiation with https://www.npmjs.com/package/lazy-duckdb-react
  • Reduce bundle size below 50 MB to ensure compatibility with vitessce-python and unpkg
    • Use Observable approach of hosting DuckDB WASM files on CDN/object store that we have control over
  • For tables which contain Arrow dictionaries, insert only the "codes" part and keep the "categories" part in JS
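
The first three items above (create the table, insert on first request, query on every request) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: `getObsEmbedding`, `loadArrowTable`, the `insertPromises` cache, and the column names `obs_id`, `x`, `y` are hypothetical, and `conn` is assumed to behave like a DuckDB-WASM `AsyncDuckDBConnection` (which provides `insertArrowTable` and `query`).

```javascript
// Hypothetical cache: table name -> promise that resolves once the table is inserted.
// Storing the promise (not a boolean) avoids double inserts from concurrent requests.
const insertPromises = new Map();

// conn: assumed to expose insertArrowTable(table, { name }) and query(sql),
//       as DuckDB-WASM's AsyncDuckDBConnection does.
// tableName: assumed to be a trusted identifier (it is interpolated into SQL).
// loadArrowTable: hypothetical async function returning an Arrow table
//       with columns obs_id, x, y (assumed names).
async function getObsEmbedding(conn, tableName, loadArrowTable) {
  if (!insertPromises.has(tableName)) {
    insertPromises.set(tableName, (async () => {
      const arrowTable = await loadArrowTable();
      await conn.insertArrowTable(arrowTable, { name: tableName });
    })());
  }
  // Wait for the (possibly already finished) insert, then always query the DB.
  await insertPromises.get(tableName);
  return conn.query(`SELECT obs_id, x, y FROM "${tableName}"`);
}
```

Passing `conn` and `loadArrowTable` in as arguments keeps the sketch independent of how the database is instantiated (eagerly or lazily).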

Notes

  • Had to manually change the MIME type of the .wasm files pushed to S3 by ./scripts/push-demos.sh from binary/octet-stream to application/wasm. TODO: do this via the AWS CLI within the push-demos script
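
The MIME type matters because the WebAssembly streaming APIs (`WebAssembly.compileStreaming` and `WebAssembly.instantiateStreaming`) reject responses whose `Content-Type` is not exactly `application/wasm`. A check along these lines could be run against the deployed files after a push. It is only a sketch: `isWasmContentType` is a hypothetical helper, and whether DuckDB-WASM takes the streaming path in this setup is an assumption.

```javascript
// Returns true only if the Content-Type header value is exactly application/wasm
// (case-insensitive, surrounding whitespace ignored, no extra parameters).
function isWasmContentType(contentType) {
  if (typeof contentType !== 'string') return false;
  return contentType.trim().toLowerCase() === 'application/wasm';
}

// Possible usage against a deployed file (wasmUrl is a placeholder):
//   const res = await fetch(wasmUrl, { method: 'HEAD' });
//   if (!isWasmContentType(res.headers.get('content-type'))) {
//     console.warn('Unexpected MIME type for', wasmUrl);
//   }
```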

Checklist

  • Ensure PR works with all demos on the dev.vitessce.io homepage
  • Open (draft) PRs into vitessce-python and vitessce-r if this is a release PR
  • Documentation added or updated

github-actions bot (Contributor) commented Mar 22, 2024

Size Change: +122 MB (+936%) 🆘

Total Size: 135 MB

| Filename | Size | Change |
| --- | --- | --- |
| ./packages/main/prod/dist/index-********.js | 129 MB | +122 MB (+1685%) 🆘 |

Unchanged:

| Filename | Size |
| --- | --- |
| ./packages/main/prod/dist/blosc-********.js | 612 kB |
| ./packages/main/prod/dist/browser-********.js | 16.2 kB |
| ./packages/main/prod/dist/chunk-INHXZS53-********.js | 558 B |
| ./packages/main/prod/dist/deflate-********.js | 243 B |
| ./packages/main/prod/dist/gzip-********.js | 693 B |
| ./packages/main/prod/dist/hglib-********.js | 4.33 MB |
| ./packages/main/prod/dist/index.min.js | 902 B |
| ./packages/main/prod/dist/jpeg-********.js | 15.3 kB |
| ./packages/main/prod/dist/lerc-********.js | 47.2 kB |
| ./packages/main/prod/dist/lz4-********.js | 43.9 kB |
| ./packages/main/prod/dist/lzw-********.js | 2.1 kB |
| ./packages/main/prod/dist/packbits-********.js | 576 B |
| ./packages/main/prod/dist/pako.esm-********.js | 68.6 kB |
| ./packages/main/prod/dist/raw-********.js | 168 B |
| ./packages/main/prod/dist/webimage-********.js | 836 B |
| ./packages/main/prod/dist/zlib-********.js | 695 B |
| ./packages/main/prod/dist/zstd-********.js | 643 kB |

compressed-size-action

keller-mark (Member, Author) commented

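Notes on Apache Arrow:

How to (inefficiently) set a single string value to repeat for every row:

import * as arrow from 'apache-arrow';

// Build a dictionary-encoded string vector holding `val` repeated `numRows` times.
// Inefficient: a plain JS array of length `numRows` is created first.
export function repeatString(val, numRows) {
  return arrow.vectorFromArray(
    Array.from({ length: numRows }).fill(val),
    new arrow.Dictionary(new arrow.Utf8(), new arrow.Int32()),
  );
}

// Usage, given an existing arrow.Table `arrowTable` and a string `valueToRepeat`:
arrowTable = arrowTable.assign(arrow.makeTable({
  myColumnName: repeatString(valueToRepeat, arrowTable.numRows),
}));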
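
For intuition about what a dictionary-encoded column stores, and about the TODO item on inserting only the "codes" part while keeping the "categories" part in JS, here is a dependency-free sketch. `dictionaryEncode`, `dictionaryDecode`, and `repeatStringEncoded` are hypothetical helpers, not Apache Arrow APIs; Arrow's dictionary vectors hold the same two pieces (integer indices plus a dictionary of unique values).

```javascript
// Split an array of strings into integer codes plus the list of unique categories.
function dictionaryEncode(values) {
  const categories = [];
  const codeByValue = new Map();
  const codes = new Int32Array(values.length);
  values.forEach((value, i) => {
    let code = codeByValue.get(value);
    if (code === undefined) {
      // First time this value is seen: assign it the next code.
      code = categories.length;
      categories.push(value);
      codeByValue.set(value, code);
    }
    codes[i] = code;
  });
  return { codes, categories };
}

// Map codes (e.g. as returned from a database query) back to their strings.
function dictionaryDecode(codes, categories) {
  return Array.from(codes, (code) => categories[code]);
}

// A constant column needs only one category and an all-zero codes array,
// so no array of repeated strings has to be built.
function repeatStringEncoded(val, numRows) {
  return { codes: new Int32Array(numRows), categories: [val] };
}
```

With this split, only the integer `codes` would need to go into the database, and the strings could be restored on the JS side with `dictionaryDecode`.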

keller-mark marked this pull request as draft March 30, 2024 15:29