Features/requirements for Be The Match collaboration #9

Open · 10 tasks

heuermh opened this issue May 8, 2017 · 3 comments
@heuermh (Member) commented May 8, 2017

For lack of a better place for this, our collaboration with Be The Match will require:

  • Download BAM files from s3, transform to ADAM Avro+Parquet, and upload to s3 (transform_alignments)
  • Download ADAM Avro+Parquet alignments for multiple samples from s3, update record groups to prevent collision, merge into a single multi-sample ADAM Avro+Parquet alignments data set, and upload to s3 (merge_alignments)
  • Report BAM file sizes, single sample ADAM Avro+Parquet alignments file sizes, and merged ADAM Avro+Parquet alignments file size
  • Download VCF files from s3, transform to ADAM Avro+Parquet variants and genotypes, and upload to s3 (transform_variants, transform_genotypes)
  • Download ADAM Avro+Parquet variants for multiple samples, merge into a single sites-only ADAM Avro+Parquet variants data set, and upload to s3 (merge_variants)
  • Download ADAM Avro+Parquet genotypes for multiple samples, merge into a single multi-sample ADAM Avro+Parquet genotypes data set, and upload to s3 (merge_genotypes)
  • Report VCF file sizes, single sample ADAM Avro+Parquet variants and genotypes file sizes, and merged ADAM Avro+Parquet variants and genotypes file sizes
  • Notebook with queries to compare native file via s3 vs. transformed via s3 access performance
  • Documentation on how to run this stuff
  • Short manuscript on transformation process, storage requirements, and access performance

There hasn't been an ask for realigning reads, recalling variants, annotating variants with SnpEff, or joint genotyping yet, but there could be in the near future.
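The reporting tasks above (comparing BAM and VCF sizes against their transformed ADAM Avro+Parquet counterparts) could start from a small stdlib sketch like the following. The function names and paths are illustrative, not part of any existing script; the directory walk reflects the fact that ADAM writes Avro+Parquet output as a directory of part files.

```python
import os


def total_size(path):
    """Total size in bytes of a file, or of all files under a directory
    (ADAM Avro+Parquet output is a directory of part files)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(path)
        for name in names
    )


def report(label, source, transformed):
    """Print source size, transformed size, and their ratio for one sample."""
    src = total_size(source)
    dst = total_size(transformed)
    ratio = f"{dst / src:.2f}" if src else "-"
    print(f"{label}\t{src}\t{dst}\t{ratio}")
```

Run against the local staging copies after download, e.g. `report("sample1", "sample1.bam", "sample1.alignments.adam")`.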

@fnothaft (Member) commented May 9, 2017

This SGTM. WRT merge_genotypes, do they want to "square off" with no-calls?

@heuermh (Member, Author) commented May 9, 2017

> want to "square off" with no-calls?

Not sure, will ask when I get to that step.

> Notebook with queries ...

In a meeting this afternoon, they decided to use Apache Zeppelin on Amazon EMR for this use case.

With some clicking around we got ADAM installed in Zeppelin via its Maven Central coordinates. We still need to do a bit more digging to figure out where to set the Kryo Spark configuration parameters, and to create a separate EMR step for Conductor (we used s3-dist-cp).
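For reference, the Spark settings ADAM documents are the Kryo serializer and its registrator; on EMR these would presumably go in `spark-defaults.conf` or the Zeppelin Spark interpreter settings (where exactly is the open question here):

```properties
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator  org.bdgenomics.adam.serialization.ADAMKryoRegistrator
```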

@heuermh (Member, Author) commented Sep 1, 2017

As an update:

All the transformations to ADAM Avro+Parquet have been run on EMR clusters, downloading from s3 to HDFS using conductor and uploading from HDFS to s3 using s3-dist-cp, driven by the bash scripts at https://github.com/heuermh/hook.
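Sketched as a per-sample command sequence (bucket names and paths are placeholders; the conductor invocation and flags are assumptions based on the description above, and the ADAM subcommand name has varied across releases, so check the hook scripts for the exact forms used):

```
# 1. stage the BAM from s3 into HDFS with conductor (invocation is illustrative)
conductor --src s3://bucket/sample.bam --dst hdfs:///staging/sample.bam

# 2. transform to ADAM Avro+Parquet alignments on the EMR cluster
adam-submit transformAlignments \
  hdfs:///staging/sample.bam \
  hdfs:///staging/sample.alignments.adam

# 3. push the Parquet output directory back to s3 with s3-dist-cp
s3-dist-cp --src hdfs:///staging/sample.alignments.adam \
  --dest s3://bucket/sample.alignments.adam
```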

Notebooks have been implemented in Zeppelin and RStudio on EMR.

The conversation about merging samples into larger data sets has not happened yet.

fnothaft pushed a commit to fnothaft/workflows that referenced this issue Sep 7, 2017