
added instructions for synapse download #179

Merged
merged 2 commits into from
May 21, 2024

sgosline (Member)

This is a very small fix to the README that lets the build process accommodate new users. @nkoussa, could you try it out?

jjacobson95 (Collaborator) commented May 14, 2024

Will we have to update the Synapse IDs for files, or are these IDs shared across projects?

sgosline (Member Author)

No, Synapse IDs are fixed and will never change as long as the file still exists. All users (including you) retain access, but members of the team have access restricted to just the named files/folders. We can't test from within the project, though, so leave this open until we find a tester.
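Because a Synapse ID stays fixed for as long as the file exists, a build script can hard-code IDs and fetch files by ID. A minimal sketch, assuming the `synapseclient` package, a personal access token in the `SYNAPSE_AUTH_TOKEN` environment variable, and IDs of the form `syn` followed by digits; `is_synapse_id` and `fetch_synapse_file` are hypothetical helpers, not functions from this repository:

```python
import os
import re

# Assumption: Synapse IDs look like "syn" followed by digits (e.g. "syn12345").
SYN_ID_PATTERN = re.compile(r"syn\d+")


def is_synapse_id(value: str) -> bool:
    """Return True if `value` looks like a Synapse entity ID."""
    return SYN_ID_PATTERN.fullmatch(value) is not None


def fetch_synapse_file(syn_id: str, dest_dir: str) -> str:
    """Download one Synapse file into dest_dir and return its local path.

    Hypothetical helper: needs the `synapseclient` package and a personal
    access token in the SYNAPSE_AUTH_TOKEN environment variable.
    """
    if not is_synapse_id(syn_id):
        raise ValueError(f"not a Synapse ID: {syn_id!r}")
    token = os.environ.get("SYNAPSE_AUTH_TOKEN")
    if not token:
        raise RuntimeError("SYNAPSE_AUTH_TOKEN is not set")
    # Imported lazily so the ID and token checks work without the package.
    import synapseclient

    syn = synapseclient.Synapse()
    syn.login(authToken=token)
    entity = syn.get(syn_id, downloadLocation=dest_dir)
    return entity.path
```

Validating the ID and token before logging in means a typo or a missing token fails immediately, rather than partway through a long build.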

nkoussa commented May 15, 2024

Sure, I'll try it later this week!

nkoussa commented May 16, 2024

It got through most of it, and then failed here:

beataml drugs retrieved
running...mpnst drugs
['docker', 'run', '-v', '/Users/koussanc/Desktop/testingCoderdataForSara/coderdata/local/:/tmp/', '-e', 'SYNAPSE_AUTH_TOKEN=', '--platform=linux/amd64', 'mpnst', 'sh', 'build_drugs.sh', '/tmp/broad_sanger_drugs.tsv,/tmp/beataml_drugs.tsv']
TERMS OF USE NOTICE:
 When using Synapse, remember that the terms and conditions of use require that you:
 1) Attribute data contributors when discussing these data or results from these data.
 2) Not discriminate, identify, or recontact individuals or groups represented by the data.
 3) Use and contribute only data de-identified to HIPAA standards.
 4) Redistribute data only under these same terms of use.

Attaching package: ‘dplyr’

The following objects are masked from ‘package:data.table’:

 between, first, last

The following objects are masked from ‘package:stats’:

 filter, lag

The following objects are masked from ‘package:base’:

 intersect, setdiff, setequal, union

Error: unexpected symbol in:
"olddrugs<-do.call(rbind,lapply(unique(unlist(strsplit(olddrugfiles,split=','))),function(x) read.table(x,header=T,sep='\t',quote='',comment.char=''))
olddrugs"
Execution halted
mpnst drugs file failed
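The build driver prints each failed step's captured output as raw bytes, so in the console R's curly quotes appear as escapes such as `\xe2\x80\x98` and line breaks as `\n`. The logged command also shows `SYNAPSE_AUTH_TOKEN=` with no value (possibly redacted). A minimal sketch of a step runner that refuses to start without a token and decodes output before printing; it assumes the driver shells out to `docker run` with the same argument layout shown in the log, and `build_docker_command`, `decode_output`, and `run_step` are hypothetical names, not the repository's functions:

```python
import os
import subprocess
from typing import List


def build_docker_command(local_dir: str, image: str, script: str,
                         args: List[str], token: str) -> List[str]:
    """Build a `docker run` argument list shaped like the one in the log."""
    return [
        "docker", "run",
        "-v", f"{local_dir}:/tmp/",
        "-e", f"SYNAPSE_AUTH_TOKEN={token}",
        "--platform=linux/amd64",
        image, "sh", script, *args,
    ]


def decode_output(raw: bytes) -> str:
    """Decode captured output as UTF-8, replacing any invalid bytes."""
    return raw.decode("utf-8", errors="replace")


def run_step(local_dir: str, image: str, script: str, args: List[str]) -> int:
    """Run one build step and print readable output if it fails."""
    token = os.environ.get("SYNAPSE_AUTH_TOKEN", "")
    if not token:
        raise RuntimeError("SYNAPSE_AUTH_TOKEN is empty; set it before building")
    cmd = build_docker_command(local_dir, image, script, args, token)
    # Merge stderr into stdout so R messages and errors stay in order.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    if result.returncode != 0:
        # Print decoded text rather than the bytes repr; never echo `cmd`,
        # since it contains the token.
        print(decode_output(result.stdout))
    return result.returncode
```

For example, `run_step("/path/to/coderdata/local/", "mpnst", "build_drugs.sh", ["/tmp/broad_sanger_drugs.tsv,/tmp/beataml_drugs.tsv"])` would mirror the logged mpnst step (the local path is a placeholder).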

nkoussa commented May 16, 2024

(At least, I'm assuming that was most of it; it ran for hours.)

sgosline (Member Author)

Somehow the file didn't get checked in. I just added it to this branch.

nkoussa commented May 17, 2024

Failed here:

mpnst drugs retrieved
running...broad_sanger omics
['docker', 'run', '-v', '/Users/koussanc/Desktop/testingCoderdataForSara/coderdata/local/:/tmp/', '-e', 'SYNAPSE_AUTH_TOKEN=', '--platform=linux/amd64', 'broad_sanger_omics', 'sh', 'build_omics.sh', '/tmp/genes.csv', '/tmp/broad_sanger_samples.csv']
Killed

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

 filter, lag

The following objects are masked from ‘package:base’:

 intersect, setdiff, setequal, union

Rows: 269956 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): gene_symbol, other_id, other_id_source
dbl (1): entrez_id

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Rows: 42154 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): common_name, cancer_type, other_names, species, other_id_source, ot...
dbl (1): improve_sample_id

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Rows: 42154 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): common_name, cancer_type, other_names, species, other_id_source, ot...
dbl (1): improve_sample_id

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
trying URL 'https://cog.sanger.ac.uk/cmp/download/mutations_all_20230202.zip'
Content type 'binary/octet-stream' length 112642865 bytes (107.4 MB)
==================================================
downloaded 107.4 MB

Rows: 10050692 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): gene_id, gene_symbol, model_id, protein_mutation, rna_mutation, cdn...
dbl (1): vaf
lgl (3): cancer_driver, cancer_predisposition_variant, coding

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Joining with by = join_by(gene_symbol)
Joining with by = join_by(other_id)
Joining with by = join_by(effect)
Rows: 1408099 Columns: 56
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (25): Chrom, Ref, Alt, GT, VariantType, VariantInfo, DNAChange, ProteinC...
dbl (12): Pos, AF, RefCount, AltCount, PS, TranscriptExon, GcContent, Popaf,...
lgl (19): Str, DbsnpFilter, CCLEDeleterious, CosmicHotspot, LoF, Driver, Lik...

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Joining with by = join_by(VariantInfo)
Joining with by = join_by(other_id)
trying URL 'https://cog.sanger.ac.uk/cmp/download/rnaseq_all_20220624.zip'
Content type 'binary/octet-stream' length 940707972 bytes (897.1 MB)
==================================================
downloaded 897.1 MB

New names:
• -> `...2`
Rows: 37606 Columns: 1433
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1431): model_id, ...2, SIDM00001, SIDM00002, SIDM00003, SIDM00005, SIDM...
dbl (2): SIDM00807, SIDM01076

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(other_id)`
Joining with `by = join_by(gene_symbol)`
Joining with `by = join_by(other_id)`
New names:
• -> ...1
Rows: 1450 Columns: 19194
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): ...1
dbl (19193): TSPAN6 (7105), TNMD (64102), DPM1 (8813), SCYL3 (57147), C1orf1...

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Joining with by = join_by(gene_symbol)
Joining with by = join_by(other_id)
Killed
broad_sanger omics file failed
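The `Killed` lines at the start and end of this output are what `sh` prints when a child process is terminated with SIGKILL; inside a memory-capped Docker container that is most often the kernel's out-of-memory killer, and the exit status is then typically 137 (128 + 9). A small heuristic sketch, assuming the driver has each step's exit code and decoded output; `looks_out_of_memory` and `failure_message` are hypothetical helpers that report likely OOM kills differently from ordinary script errors such as the R syntax error in the mpnst step:

```python
# Exit statuses that usually mean the process got SIGKILL (signal 9):
# 137 = 128 + 9 as reported by a shell or `docker run`, and -9 as reported
# by Python's subprocess when the child it launched was killed directly.
SIGKILL_STATUSES = {137, -9}


def looks_out_of_memory(returncode: int, output: str) -> bool:
    """Heuristic: True if a build step appears to have been OOM-killed."""
    if returncode in SIGKILL_STATUSES:
        return True
    # sh prints a bare "Killed" line when one of its children receives SIGKILL.
    return any(line.strip() == "Killed" for line in output.splitlines())


def failure_message(step: str, returncode: int, output: str) -> str:
    """Build a short, actionable message for a failed build step."""
    if looks_out_of_memory(returncode, output):
        return (f"{step} failed: the process was killed, probably for running "
                "out of memory; try raising Docker's memory limit")
    return f"{step} failed with exit code {returncode}"
```

SIGKILL can also come from other sources (for example a manual `docker kill`), so this is a hint for the user rather than a diagnosis.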

sgosline (Member Author)

What type of machine are you using and how much memory are you allocating to Docker?

nkoussa commented May 17, 2024

A Mac, and Docker's memory was at the default of 7.9 GB; I didn't change it. Does it need more?

sgosline (Member Author)

Yes, my Docker has about 16 GB allocated. We haven't thoroughly tested the memory requirements yet.
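Since the build's memory needs haven't been measured, a pre-flight check can at least warn when Docker has less memory than a chosen threshold. A sketch, assuming the Docker CLI is on the PATH; it reads the `MemTotal` field (in bytes) that `docker info` reports for the Docker host or VM. The 16 GiB default only mirrors the allocation mentioned here and is not a verified minimum; `docker_mem_total_bytes` and `check_docker_memory` are hypothetical helpers:

```python
import subprocess

GIB = 1024 ** 3


def parse_mem_total(text: str) -> int:
    """Parse the integer byte count printed by `docker info --format`."""
    return int(text.strip())


def docker_mem_total_bytes() -> int:
    """Ask the Docker daemon how much memory it can use, in bytes."""
    out = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_mem_total(out)


def check_docker_memory(total_bytes: int, required_gib: float = 16) -> bool:
    """Return True if Docker has at least `required_gib` GiB; warn otherwise."""
    ok = total_bytes >= required_gib * GIB
    if not ok:
        print(f"Docker has {total_bytes / GIB:.1f} GiB; "
              f"this build may need about {required_gib:g} GiB or more.")
    return ok
```

A build script could call `check_docker_memory(docker_mem_total_bytes())` once before starting the long omics steps, so an undersized allocation is flagged up front instead of after a long run.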

nkoussa commented May 20, 2024

I raised Docker's memory to 16 GB and it still fails at the broad_sanger omics file, about 20 minutes in. I've tried it three times with 16 GB.

sgosline (Member Author)

OK, we are optimizing the build to run on large compute facilities. Do you want me to provide an interim copy of the files? That might be easier than trying to build them yourself.

sgosline merged commit 54a29e2 into main on May 21, 2024