
Memory requirements in Summarise #25

Open
JohnMCMa opened this issue Sep 10, 2018 · 3 comments
@JohnMCMa

JohnMCMa commented Sep 10, 2018

Hi,

I attempted the suggested workflow for 10X Genomics data described in #21. The program works well up to the assembly stage, but we ran into problems during the summarise stage.

I ran summarise with relatively common arguments ( -p 8 --graph_format pdf --infer_lineage), but:

  1. changeodb.tab, IMGT_gapped.tab, reconstructed_lengths_BCR[H|K|L].pdf, reconstructed_lengths_BCR[H|K|L].txt, full_length_seqs.pdf, changeo_input_[H|K|L]_clone-pass.tab and isotype_distribution.pdf were generated almost immediately. However, clonotype_network_[with|without]_identifiers.dot took at least 8 hours to generate. Halting the script during this period shows the time was spent in make_cell_network_from_dna, especially on lines 1094-1095 of bracer_func.py.
  2. I'm under the impression that this step used an abnormal amount of memory. When I submitted a Grid Engine job with 8 cores and 176 GB of memory, the job was killed for exceeding its memory allocation.

The data set includes 4,275 barcodes, which I don't believe is particularly many. Is there anything I can do about this?
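As a back-of-envelope check on why 4,275 barcodes might be slow, here is a small sketch assuming the clonotype network is built by comparing every pair of cells (an assumption based on the function name make_cell_network_from_dna; I have not verified the code at lines 1094-1095):

```python
def pairwise_comparison_count(n_cells):
    """Number of cell pairs examined when building a clonotype
    network by all-vs-all comparison: n * (n - 1) / 2, i.e.
    quadratic in the number of cells."""
    return n_cells * (n_cells - 1) // 2

# The two data sets mentioned in this thread:
for n in (3220, 4275):
    print(n, "cells ->", pairwise_comparison_count(n), "pair comparisons")
```

So going from a few hundred cells (typical plate-based data) to a few thousand (10x) multiplies the pair count by roughly two orders of magnitude, which would be consistent with the runtime and memory behaviour reported here.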

@alipsky

alipsky commented Dec 22, 2018

Hello bracer team... I am following the same procedure described in the prior post. I coerced the 10x data and ran assemble on 3,220 barcodes without issues. I am now running bracer in a Docker container on a Google Cloud instance (8 cores, 30 GB of RAM). It has been running for 2 days, and during this time I reliably lose the SSH connection to my cloud instance and can't regain it. Upon restarting the VM, I find the contents below in the filtered_BCR_summary directory.

I suspect the same issue as in the prior post. Can you give some guidance on how to overcome this, and on what kind of memory I would need to request (in the cloud) to get to the final output?

Thanks so much!
Andrew

total 11M
-rw-r--r-- 1 root root 0 Dec 20 13:05 BCR_summary.txt
-rw-r--r-- 1 root root 532K Dec 20 13:06 IMGT_gapped_db.tab
-rw-r--r-- 1 root root 878K Dec 20 13:06 changeo_input_H.tab
-rw-r--r-- 1 root root 884K Dec 20 13:06 changeo_input_H_clone-pass.tab
-rw-r--r-- 1 root root 23K Dec 20 13:06 changeo_input_K.tab
-rw-r--r-- 1 root root 24K Dec 20 13:06 changeo_input_K_clone-pass.tab
-rw-r--r-- 1 root root 1.4M Dec 20 13:06 changeo_input_L.tab
-rw-r--r-- 1 root root 1.4M Dec 20 13:06 changeo_input_L_clone-pass.tab
-rw-r--r-- 1 root root 5.4M Dec 20 13:06 changeodb.tab
-rw-r--r-- 1 root root 0 Dec 20 14:31 clonotype_network_with_identifiers.dot
-rw-r--r-- 1 root root 13K Dec 20 13:06 full_length_seqs.pdf
-rw-r--r-- 1 root root 12K Dec 20 13:06 isotype_distribution.pdf
drwxr-xr-x 2 root root 4.0K Dec 20 13:05 lineage_trees
-rw-r--r-- 1 root root 16K Dec 20 13:06 reconstructed_lengths_BCR_H.pdf
-rw-r--r-- 1 root root 6.1K Dec 20 13:06 reconstructed_lengths_BCR_H.txt
-rw-r--r-- 1 root root 16K Dec 20 13:06 reconstructed_lengths_BCR_K.pdf
-rw-r--r-- 1 root root 232 Dec 20 13:06 reconstructed_lengths_BCR_K.txt
-rw-r--r-- 1 root root 15K Dec 20 13:06 reconstructed_lengths_BCR_L.pdf
-rw-r--r-- 1 root root 12K Dec 20 13:06 reconstructed_lengths_BCR_L.txt

@mstubb
Member

mstubb commented Dec 22, 2018

Hi,

Can you try running summarise with the --no_networks option? It’s usually the network graph layout that becomes very slow and resource intensive with large numbers of cells.

Mike
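For reference, the suggested invocation would look something like the following. The --no_networks flag is Mike's suggestion above; the other arguments and the output-directory name are taken or assumed from the original posts, so adjust them to your run:

```shell
# Re-run only the summarise step, skipping the expensive
# clonotype-network construction and graph layout.
bracer summarise --no_networks -p 8 --infer_lineage OUTPUT_DIR
```

Everything except the clonotype_network_*.dot/.pdf outputs should still be produced.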

@idalind
Collaborator

idalind commented Dec 31, 2018

I would suggest the same as Mike. Do you expect your cells to be highly clonal? That could explain the huge memory use. Unfortunately, we have not tested BraCeR on data sets as large as 10x ones.

Best,
Ida
