Assembling big data #14

Open
lzaramela opened this issue Jul 24, 2019 · 5 comments

Comments

@lzaramela

Hey,
I have a big dataset (>600M paired-end reads) and I am trying to generate a protein catalog using Plass. I am using version 2.c7e35 on a server with 900 GB of RAM. The run ends before completing because it exceeds the requested resources. Is it possible to tweak the parameters so that Plass allocates less memory?
Any input will be greatly appreciated.
Thanks,
Livia

@milot-mirdita
Member

Hi Livia,

Could you please post the log of the run? Plass should split up the work so it always fits into the available memory.

Best regards,
Milot

@lzaramela
Author

Sure... here is the log file
PLASS_West.txt

I got the following message:
Execution terminated
Exit_status=271
resources_used.cput=46:09:04
resources_used.mem=531170280kb
resources_used.vmem=832592604kb
resources_used.walltime=42:54:33
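
For readers comparing these figures with the 900 GB request, here is a minimal sketch that converts the reported values to GiB. It assumes the scheduler's `kb` suffix means KiB (1024 bytes) and that exit statuses above 256 encode a terminating signal as `status - 256`, which is the Torque/PBS convention; neither assumption is confirmed in this thread, and the helper names are hypothetical.

```python
KIB_PER_GIB = 1024 ** 2  # assumption: the scheduler's "kb" means KiB


def kb_to_gib(value):
    """Convert a string such as '531170280kb' to GiB."""
    text = value.strip().lower()
    if text.endswith("kb"):
        text = text[:-2]
    return int(text) / KIB_PER_GIB


def signal_from_exit_status(status):
    """Return the signal number if the status follows the assumed
    Torque/PBS convention (256 + signal), else None."""
    return status - 256 if status > 256 else None


# Values copied from the job report above.
report = {"mem": "531170280kb", "vmem": "832592604kb"}

for name, raw in report.items():
    print(f"{name}: {kb_to_gib(raw):.1f} GiB")

print("signal:", signal_from_exit_status(271))
```

Under these assumptions the job used roughly 507 GiB of resident memory and 794 GiB of virtual memory, and an exit status of 271 would correspond to signal 15 (SIGTERM), meaning the scheduler terminated the job.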

@martin-steinegger
Member

Thanks a lot! How much memory does your machine have? Normally Plass tries to split the database if it does not fit in memory.

@lzaramela
Author

It is a CentOS server, and I can use up to 900 GB of RAM.

@martin-steinegger
Member

So it seems that the extractorfs step is hanging. That step is mostly I/O-bound. Is it possible that the tmp folder is on some slow network share?

One trick to reduce the number of sequences extracted is to increase the minimum ORF length with --min-length (default: 20).
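
One way to check the tmp folder question is to look up which filesystem the folder is mounted on. The following is a minimal sketch and not part of Plass: it assumes a Linux host where `/proc/mounts` lists mounts as `device mount_point fs_type ...`. The function names and the set of network filesystem types are illustrative, and the set is not exhaustive.

```python
import os

# Assumption: a non-exhaustive set of common network/cluster filesystem types.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "lustre", "gpfs",
              "beegfs", "ceph", "glusterfs", "fuse.sshfs"}


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path."""
    path = os.path.abspath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with a trailing slash so /scratch does not match /scratchy.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


def is_network_fs(fs_type):
    """True if the filesystem type is in the assumed network set."""
    return fs_type in NETWORK_FS


def check_tmp(path, mounts_file="/proc/mounts"):
    """Read the mount table and report whether path sits on a network filesystem."""
    with open(mounts_file) as handle:
        mount_point, fs_type = fs_type_for(path, handle.read())
    return mount_point, fs_type, is_network_fs(fs_type)
```

Calling `check_tmp("tmp")` from the directory where Plass was started returns the mount point, the filesystem type, and a flag for whether that type is in the network set. If the flag is true, pointing the tmp folder at a local disk is worth trying before changing any other parameter.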
