Unknown error while building similarity matrix #102

Adifo · 2017-07-25T09:34:00Z

Hi,

I am trying to infer orthogroups from precomputed BLAST results for a large number of species (185).
OrthoFinder starts fine but stops without reporting any specific issue after the initial processing of all species (or part of them).

I have used the following command with versions 1.1.4 and 1.1.8 on workstations with increasing RAM (128 GB, 256 GB and 500 GB), but the process stopped while using less than 160 GB.
orthofinder -b wd/ -I 1.5 -a 30 -og

Is there a way to make it work or will it be impossible for that many species?

Here is the traceback from my last try
Process Process-6:
Traceback (most recent call last):
  File "multiprocessing/process.py", line 258, in _bootstrap
  File "multiprocessing/process.py", line 114, in run
  File "orthofinder/orthofinder.py", line 483, in Worker_ProcessBlastHits
  File "orthofinder/orthofinder.py", line 475, in ProcessBlastHits
  File "orthofinder/scripts/matrices.py", line 37, in DumpMatrixArray
  File "orthofinder/scripts/matrices.py", line 33, in DumpMatrix
SystemError: error return without exception set

and the end of the output

2017-07-25 00:57:16 : Initial processing of species 109 complete
2017-07-25 01:27:33 : Initial processing of species 110 complete
2017-07-25 01:35:58 : Initial processing of species 111 complete
2017-07-25 01:47:12 : Initial processing of species 112 complete
ERROR: An error occurred, please review previous error messages for more information.

Thank you in advance for any insight regarding my issue.

davidemms · 2017-07-25T10:54:57Z

Hi

It appears to be this problem: numpy/numpy#2396

Basically OrthoFinder analyses the BLAST hits between a pair of species and then it writes the results to disk instead of holding them in RAM. It uses cPickle to write these python objects to disk and the problem appears to come from here. I'm looking into what this means for OrthoFinder and if there is a way around.
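The write-to-disk pattern described above can be sketched as follows. This is a minimal illustration, not OrthoFinder's actual code: `dump_matrix`, `load_matrix`, the dict-based toy matrix and the file name are all hypothetical, and the sketch uses Python 3's `pickle` module (the Python 3 successor to Python 2's cPickle).

```python
import os
import pickle
import tempfile


def dump_matrix(path, matrix):
    """Serialise a Python object to disk so it does not have to stay in RAM."""
    with open(path, "wb") as handle:
        # The highest protocol is the most compact binary format available.
        pickle.dump(matrix, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_matrix(path):
    """Read back an object previously written by dump_matrix."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Toy stand-in for a matrix of hit scores between two species:
# keys are (query index, hit index), values are scores (made-up numbers).
hits = {(0, 1): 52.3, (1, 0): 48.9, (2, 2): 310.0}

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example_matrix.pic")
dump_matrix(path, hits)
restored = load_matrix(path)
print(restored == hits)
```

With one file per species pair, only the pair currently being processed needs to be held in memory, which is why a failure at the serialisation step stops the run even when plenty of RAM is free.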

All the best
David

davidemms · 2017-07-26T09:54:16Z

Hi

It looks like the problem has been fixed in later versions of python 2.7. For example it works on a server I have access to with version 2.7.12 but is a problem with version 2.7.3 that I have on my desktop. Upgrading to a later version of python 2 may be tricky though.

I think I have another solution but it will depend on what version of glibc you have installed. Would you be able to run
ldd --version
and tell me what version of glibc you have?
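The same information can also be collected from inside Python, which helps confirm which interpreter is actually being used. A small sketch using only the standard library; `describe_runtime` is a hypothetical helper name, and `platform.libc_ver()` may return empty strings on systems that do not use glibc.

```python
import platform


def describe_runtime():
    """Collect the Python and C library versions of the running interpreter."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        # Empty strings mean the C library could not be identified.
        "libc": libc_name or "unknown",
        "libc_version": libc_version or "unknown",
    }


info = describe_runtime()
print("Python %(python)s, %(libc)s %(libc_version)s" % info)
```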

Thanks
David

Adifo · 2017-07-26T10:23:03Z

Hi David,

I've checked on the two workstations. The first is running Python 2.7.9 with glibc 2.19, and the second Python 2.7.12 with glibc 2.23.

I've looked at the pic files obtained on the second one and noticed that all the B pic files were created correctly, but about a hundred BH files were missing. They were all related to the same species, the first one. I tried removing that species and it worked.

I am now trying to understand why this specific species makes the whole process fail. It appears to have a humongous number of self hits (half a billion) even after dereplication, even though it is not the largest one.

davidemms · 2017-07-26T11:50:38Z

Ok, if you have version 2.7.12 then that should resolve the problem that I thought was causing your original error message (the cPickle problem with writing large files should be fixed in that version). But you still get an error if you run orthofinder on this machine with the first species included? What error message do you get? I'm wondering if the error with the missing BH files is different and has a separate cause.

Half a billion hits sounds like a lot! I wonder why you get so many. That would be like having over 20,000 sequences and every sequence hitting all other sequences!
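The estimate can be checked with a quick calculation: if each of n sequences hit every sequence, there would be about n * n hits, so n is the square root of the total. The figure of 500 million is the rounded "half a billion" from the thread, not an exact count.

```python
import math

# "Half a billion" self hits, as reported in the thread (rounded figure).
total_hits = 500_000_000

# With n sequences each hitting all n sequences, total hits = n * n.
n_all_vs_all = math.sqrt(total_hits)

print(round(n_all_vs_all))  # 22361
```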

All the best
David

Adifo · 2017-07-28T09:12:59Z

Hi,

Yes, as long as the species is there, I get the same kind of error message as in the first post. OrthoFinder just tells me that an error occurred, with no other information except the traceback. It was exactly the same in all my configurations.

About the species: it has around 90,000 sequences, and I am already removing sequences with more than 98% similarity. It is not the organism with the most sequences, as I have two with around 130,000. The number of self BLAST hits for those does not exceed 5 million.

What I also don't understand is that the BH files were created fine for the reverse BLAST of the troublesome species.
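To put these figures side by side, the average number of self hits per sequence can be computed directly. All inputs are the approximate values quoted in the thread (about 90,000 sequences with half a billion self hits, versus about 130,000 sequences with at most 5 million), so the results are rough orders of magnitude only.

```python
# Approximate figures quoted in the thread.
problem_sequences = 90_000
problem_self_hits = 500_000_000

large_sequences = 130_000
large_self_hits = 5_000_000  # upper bound reported for the larger species

# Average number of self hits per sequence for each case.
avg_problem = problem_self_hits / problem_sequences
avg_large = large_self_hits / large_sequences

print(round(avg_problem))  # 5556
print(round(avg_large))    # 38
```

On these numbers the failing species has over a hundred times more self hits per sequence than the larger species.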

davidemms · 2017-07-28T16:09:58Z

Hi

Would you be able to post all the output you get when running it with python 2.7.12 please? I will need to look into it further, as I don't think the initial issue I identified occurs with python 2.7.12, so there must be something else.

Thanks
David

davidemms closed this as completed May 16, 2019
