Sprint 9 Task List #421

Closed
26 of 35 tasks
akotlar opened this issue Mar 8, 2024 · 3 comments
Labels
.task list A checklist of smaller tasks

akotlar commented Mar 8, 2024

Webapp

  • Fix S3 upload - March 13th for PR @akotlar
  • Fix search cards to show all data - March 13th for PR @akotlar
  • Hide settings wheel in queued file list (it is not currently enabled) - March 11th @akotlar
  • Autocomplete field names - March 15th @akotlar
  • Fix search phase exception with long query: @akotlar noticed that an aggregation (Heterozygotes per variant) was cancelled after it ran for a certain amount of time. If this cannot be solved, remove the offending aggregation scripts. https://github.com/bystrogenomics/bystro-web/issues/388 - @akotlar - March 28th
  • Finish making AMIs for bystro-web, bystro, and opensearch servers - hard deadline is March 29th.
  • (stretch) Add ability to rename genetic submissions in front end, before uploading them, by clicking on the name - @akotlar
  • (stretch) Add ability to stitch together datasets.
  • (stretch) Add Jupyter notebook launcher
  • (stretch) support (refSeq:brca1), which would expand to (refseq.\*:brca1)
  • (stretch) Update numerical facets to allow selecting on ranges using slider
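
A minimal sketch of the `(refSeq:brca1)` shorthand expansion from the stretch item above. The function name `expand_shorthand` and the `PARENT_TRACKS` set are hypothetical, chosen only for illustration; the sketch assumes the shorthand is a bare parent-track name followed by a colon, and that already-qualified fields (such as `refSeq.name2`) should be left untouched.

```python
import re

# Hypothetical: parent track names whose sub-fields should all be searched.
PARENT_TRACKS = {"refseq"}

# A bare field name followed by ':' that is not already part of a dotted
# or wildcarded field path.
_BARE_FIELD = re.compile(r"(?<![\w.*\\])([A-Za-z_]\w*):")


def expand_shorthand(query: str) -> str:
    """Rewrite `parent:term` into `parent.\\*:term` for known parent tracks."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name.lower() in PARENT_TRACKS:
            # Lowercase the track name and append an escaped field wildcard.
            return name.lower() + r".\*:"
        return match.group(0)  # leave every other field as written

    return _BARE_FIELD.sub(repl, query)
```

Calling `expand_shorthand("(refSeq:brca1)")` under these assumptions yields the expanded form given in the task, while a query that already names a sub-field passes through unchanged.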

Annotation

  • Test hg19_v8 database by comparing results, row by row, on trio_trim.vcf.gz between hg19_v8 and Dave's instance b10 database - March 15th @akotlar
  • Create hg38 database - March 11th @akotlar
  • Test hg38 database by comparing results, row by row, on trio_trim.vcf.gz between hg19_v8 and Dave's instance b10 database - March 18th @akotlar
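
The row-by-row comparison described in the testing tasks above could be sketched as follows. This assumes both annotation outputs are tab-delimited text with rows in the same order; the function name `diff_rows` is hypothetical and is not part of Bystro.

```python
import csv
from itertools import zip_longest


def diff_rows(left, right, delimiter="\t"):
    """Yield (row_number, left_row, right_row) for every row that differs.

    `left` and `right` are open text handles. A missing row on either side
    (files of different length) is reported as None.
    """
    left_rows = csv.reader(left, delimiter=delimiter)
    right_rows = csv.reader(right, delimiter=delimiter)
    for number, (a, b) in enumerate(zip_longest(left_rows, right_rows), start=1):
        if a != b:
            yield number, a, b
```

Because it streams both files, this approach keeps memory use flat, but it only works if the two databases emit rows in the same order; otherwise the outputs would need to be keyed on position and allele first.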

Ancestry

  • Retrain ancestry model (hg38) with 76k gnomad loadings - @cristinaetrv - March 13th

  • Liftover array set - @cristinaetrv - March 13th

  • Liftover gnomad loadings - @cristinaetrv - March 14th

  • Add assembly to AncestryData, update bystro-web to submit assembly, and use assembly when choosing model (Send assembly version with ancestry job requests #419) - @akotlar - March 13th

  • Create ancestry docker container - @akotlar - March 14th

  • Add support for choosing best covariate set for ancestry - @akotlar - March 15th

  • Give Thomas new ancestry code to test on healthy aging study - March 20th @cristinaetrv

  • Add Ancestry table version that just has IDs and top hit superpops that can be expanded to larger table @akotlar

  • Before expanding, top hit for superpops is seen, after expanding format remains as before

  • Possible format: Sample ID | Top hit Superpop | Prob(Top hit superpop) --> Expand to see: Number Variants retained| Prob of other Superpops (5 columns) | Prob of Population 1,2,3 ...

  • For populations: Instead of adding all 26 populations as something you have to open for each one, add the columns as 'Prob(CEU), Prob(CDX)' and so on for each individual

  • Switch missingness to 'variants retained'
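
A minimal sketch of the collapsed table row proposed above (Sample ID | Top hit Superpop | Prob(Top hit superpop)). The function name `summarize_sample`, the dictionary keys, and the shape of the input are assumptions for illustration, not the actual AncestryData schema.

```python
def summarize_sample(sample_id, superpop_probs):
    """Build the collapsed row: sample ID, top-hit superpopulation, its probability.

    `superpop_probs` maps superpopulation label -> probability (hypothetical shape).
    """
    if not superpop_probs:
        raise ValueError("no superpopulation probabilities for " + sample_id)
    top = max(superpop_probs, key=superpop_probs.get)
    return {
        "sample_id": sample_id,
        "top_hit_superpop": top,
        "top_hit_prob": superpop_probs[top],
    }
```

The expanded view would then show the remaining superpopulation and population probabilities alongside the number of variants retained, keeping the format it has today.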

Proteomics Data Handling / API

  • Finish download of proteomics data - March 12th @dlin30
  • Fix frontend upload of proteomic data - March 13th @dlin30
  • Add support for somascan upload in bystro webapp and api - March 14th @akotlar

Proteomics Statistics

  • Jupyter notebook demonstrating adjusting for batch effects on Adversarial PPCA on simulation data + a real dataset (@akotlar) - March 14th @austinTalbot7241993

Infrastructure

  • Make sure EBS scratch disk is very well provisioned, or use instance with 4TB SSD
  • Update bystro webapp documentation on how to use bystro, write API documentation, write library documentation - @akotlar - March 28th

PRS

  • Add AD GWAS summary statistics suggested by Thomas for C+T PRS - March 15th @cristinaetrv
  • Add readme for AD GWAS sum stats - March 18th @cristinaetrv -> Move to sprint 10
  • Add batch processing for PRS C+T workflow - March 25th -> Move to sprint 10
  • Finish PRS-CS standard way without Langevin Dynamics - @austinTalbot7241993 - March 28th.
  • Add support for covariates into PRS - @cristinaetrv - March 25th -> Backlogged for now, move to sprint 11/12

POE

  • Mike Epstein was positive on Austin's POE method. Address Mike's simulation suggestions - March 28th
akotlar added this to the Sprint 9 milestone Mar 8, 2024
akotlar changed the title from "Sprint 9 IBDGC Task List" to "Sprint 9 Task List" Mar 8, 2024
akotlar added a commit to akotlar/bystro that referenced this issue Mar 11, 2024
akotlar added a commit that referenced this issue Mar 12, 2024
…le (#427)

* Dynamically load model based on the assembly passed in the job request
* Cache up to 2 models to improve startup time

Stacked on
159bdcf

The commit for this PR:
1e64548

Also addresses #422
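
One way to picture the "load by assembly, cache up to 2 models" behaviour from the commit message above is a small LRU cache. This is only a sketch: `get_model`, `_load_from_storage`, `SUPPORTED_ASSEMBLIES`, and `LOAD_LOG` are hypothetical names, and the placeholder loader stands in for whatever the real code does to fetch and deserialize a model.

```python
from functools import lru_cache

SUPPORTED_ASSEMBLIES = ("hg19", "hg38")  # assumption: the two assemblies in this sprint
LOAD_LOG = []  # records each expensive load, for illustration only


def _load_from_storage(assembly):
    """Stand-in for the expensive step of fetching and deserializing a model."""
    if assembly not in SUPPORTED_ASSEMBLIES:
        raise ValueError("unsupported assembly: " + assembly)
    LOAD_LOG.append(assembly)
    return {"assembly": assembly}  # placeholder model object


@lru_cache(maxsize=2)
def get_model(assembly):
    """Return the model for the assembly named in the job request, loading on first use."""
    return _load_from_storage(assembly)
```

With a cache size of 2, alternating hg19 and hg38 jobs never trigger a reload; a third assembly would evict the least recently used model.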
akotlar added a commit that referenced this issue Mar 12, 2024
* Don't lowercase HGVS clinvarVCF.CLNHGVS, because no one lowercases
HGVS
akotlar added a commit to akotlar/bystro that referenced this issue Mar 13, 2024
akotlar added a commit to akotlar/bystro that referenced this issue Mar 13, 2024
cristinaetrv added a commit that referenced this issue Mar 13, 2024
…nd hg38 data (#431)

* Updates hg38.mapping.yaml and hg19.mapping.yaml to support our new
hg19 and hg38 databases
* Updates hg38.clean.yml and hg19.clean.yml to match the bystro
annotator definitions of our new databases

hg38.mapping.yml is now a separate definition, rather than a symlink to
hg19.mapping.yml. It is identical to hg19.mapping.yml besides the gnomad
sections, which have to be different since gnomad v4 was used in the
hg38 database.
akotlar added a commit to akotlar/bystro that referenced this issue Mar 14, 2024
akotlar added a commit to akotlar/bystro that referenced this issue Mar 14, 2024
akotlar added a commit to akotlar/bystro that referenced this issue Mar 14, 2024
akotlar added a commit to akotlar/bystro that referenced this issue Mar 15, 2024
akotlar added a commit to akotlar/bystro that referenced this issue Mar 15, 2024
akotlar added a commit to akotlar/bystro that referenced this issue Mar 18, 2024
akotlar added a commit that referenced this issue Mar 18, 2024
* Adds somascan adat and annotation support, see
https://github.com/SomaLogic/Canopy for documentation on adat and
annotations file formats.
akotlar added a commit that referenced this issue Mar 18, 2024
…alse'/'true' labels (#438)

* Make discordant an actual boolean field, by outputting 'false'/'true'
labels

This makes it easier to search the field, as well as to import it as a
boolean in Pandas/R.
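
As a sketch of what a downstream consumer gains from the 'false'/'true' labels, the snippet below parses a tab-delimited export into Python booleans using only the standard library. The column name `discordant` comes from the commit message; `parse_bool_label` and `read_discordant` are hypothetical helpers, and the tab-delimited layout is an assumption.

```python
import csv

_LABELS = {"true": True, "false": False}


def parse_bool_label(label):
    """Map the 'true'/'false' labels onto Python booleans, rejecting anything else."""
    try:
        return _LABELS[label]
    except KeyError:
        raise ValueError("not a boolean label: %r" % label) from None


def read_discordant(handle, column="discordant"):
    """Read one boolean per row from the named column of a tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [parse_bool_label(row[column]) for row in reader]
```

Rejecting unknown labels rather than coercing them keeps a malformed export from silently turning into `False`.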
akotlar added a commit to akotlar/bystro that referenced this issue Mar 19, 2024
akotlar added a commit that referenced this issue Mar 20, 2024
* Add the actual somascan code, missed in
#433
cristinaetrv pushed a commit to cristinaetrv/bystro that referenced this issue Mar 20, 2024
cristinaetrv added the ".task list" (A checklist of smaller tasks) label Mar 21, 2024
akotlar added a commit that referenced this issue Mar 25, 2024
…ncestry Memory Usage (#449)

* Adds docker file for Bystro's python library
* Creates ancestry api and cli code for calculating ancestry scores
* Removes unneeded dependencies from Cargo.toml, to speed up builds
* Improves Makefile by introducing the ability to make production builds
and install from wheel.
* Reduce ancestry memory usage by reading in sample chunks
* Cache ancestry scores to local disk to reduce S3 fetching

To test what is here:

```
docker pull akotlar/bystro-api
docker run -v /path/to/local/data:/data akotlar/bystro-api ancestry score --in /data/trio.trim.vep.vcf.gz --assembly hg19
```


[trio.trim.vep.vcf.gz](https://github.com/bystrogenomics/bystro/files/14730266/trio.trim.vep.vcf.gz)

The api function is a port of ancestry/listener.py handler_fn.
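
The "reading in sample chunks" idea from the commit message above can be illustrated with a small generator. This is a sketch of the general technique only; `sample_chunks` is a hypothetical name and does not reflect how the ancestry code actually reads genotypes.

```python
from itertools import islice


def sample_chunks(sample_ids, chunk_size):
    """Yield successive lists of at most `chunk_size` sample IDs.

    Only one chunk is held in memory at a time, so peak usage scales with
    `chunk_size` rather than with the total number of samples.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(sample_ids)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
```

Each chunk would be scored and its results written out before the next one is read, so per-chunk data can be released between iterations.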

akotlar commented Mar 29, 2024

Updates on 2024-03-29

Alex:

  • IBDGC tasks are done
  • Somascan support in, but not yet threaded through to bystro-api (partially blocked by proteomics submission PR)
  • In process of bug fixing. Need to switch to a polling strategy for the currently viewed job; we're seeing evidence that our existing socketio implementation is dropping updates.
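
A polling strategy of the kind mentioned in the last item could look roughly like the sketch below. Everything here is an assumption for illustration: `poll_job`, the `fetch_status` callable, the status strings, and the backoff parameters are hypothetical and do not describe the bystro-web implementation.

```python
import time

TERMINAL_STATES = {"completed", "failed"}  # hypothetical status names


def poll_job(fetch_status, interval=2.0, max_interval=30.0, timeout=600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_status()` until the job reaches a terminal state.

    The wait doubles after every attempt, capped at `max_interval`, so a
    long-running job does not generate a constant stream of requests.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError("job did not finish within %.0f seconds" % timeout)
        sleep(delay)
        delay = min(delay * 2, max_interval)
```

Unlike a push channel, a missed poll is harmless: the next request returns the current state, so a dropped update cannot leave the view stale indefinitely.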

Dennis: (has been sick)

  • Streaming proteomics works, will PR
  • Proteomics submission in progress

Austin:

  • Spectral alignment - imputation is in
  • Harmonizing datasets - in progress - goal is to get an outer join (TMT + somascan), harmonized, so that the data is jointly analyzable
  • PRS-CS - on back burner until proteomics is in
  • SSPCA - has experimental results on 2 datasets (neuroscience, we beat L1 regularization; so we can argue our merits from purely predictive results, not just generative results as now)
  • POE - we're running into a problem: the parent-of-origin effect sizes are so small relative to the variance of our features that Gaussian mixture models end up being a poor fit. Our estimator will converge almost surely, but that is not practically relevant at our sample sizes. So can we do better, especially in taming the bias that is inflating the Type I (false positive) error rate?


akotlar commented Apr 1, 2024

S3 Uploads now work, even for very large datasets (e.g. 100GB+ uploads).
The bystro webapp documentation is updated, but I have not had a chance to add detailed library documentation.


akotlar commented Apr 1, 2024

Semi-automated AMIs have been made and deployed to IBDGC. They need one more pass of refinement before they can be taken down and brought up at will in autoscaling fashion; that is a sprint 10 task.

akotlar added a commit to akotlar/bystro that referenced this issue May 21, 2024
akotlar added a commit to akotlar/bystro that referenced this issue May 23, 2024
akotlar added a commit to akotlar/bystro that referenced this issue May 23, 2024
akotlar added a commit to akotlar/bystro that referenced this issue May 23, 2024