[BUG] anvio-cluster-contigs fails with CONCOCT #2154

Closed
Ge0rges opened this issue Oct 26, 2023 · 17 comments

Ge0rges commented Oct 26, 2023

Short description of the problem

This issue tracks the problem reported in the following Discord thread. I encountered this error too and decided to open an issue since nobody else had. It seems anvi'o is not interacting with CONCOCT properly.

anvi'o version

Anvi'o .......................................: marie (v8-dev)
Python .......................................: 3.10.12

Profile database .............................: 38
Contigs database .............................: 22
Pan database .................................: 17
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Running Rocky Linux; anvi'o was installed following the dev instructions on the website.

Detailed description of the issue

In my case I ran anvi-cluster-contigs -p SAMPLES-MERGED/PROFILE.db -c CONTIGS.db --driver concoct -T 80 --clusters 10 -C METABINS --just-do-it. I then got a config error from anvi'o complaining that a file was missing. I went to the log and saw:

# CMD LINE: concoct --coverage_file /tmp/tmp83as40pm/contig_coverages.txt --composition_file /tmp/tmp83as40pm/sequence_contigs.fa --basename /tmp/tmp83as40pm --threads 80 --clusters 10
/usr/local/miniconda3/envs/anvio-dev/bin/concoct:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').run_script('concoct==1.1.0', 'concoct')
Up and running. Check /tmp/tmp83as40pm/log.txt for progress
Traceback (most recent call last):
  File "/usr/local/miniconda3/envs/anvio-dev/bin/concoct", line 4, in <module>
    __import__('pkg_resources').run_script('concoct==1.1.0', 'concoct')
  File "/usr/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/pkg_resources/__init__.py", line 722, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1561, in run_script
    exec(code, namespace, namespace)
  File "/localdata/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/concoct-1.1.0-py3.10-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 90, in <module>
    results = main(args)
  File "/localdata/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/concoct-1.1.0-py3.10-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 37, in main
    transform_filter, pca = perform_pca(
  File "/usr/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/concoct-1.1.0-py3.10-linux-x86_64.egg/concoct/transform.py", line 5, in perform_pca
    pca_object = PCA(n_components=nc, random_state=seed).fit(d)
  File "/usr/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 435, in fit
    self._fit(X)
  File "/usr/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 485, in _fit
    X = self._validate_data(
  File "/usr/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/sklearn/base.py", line 548, in _validate_data
    self._check_feature_names(X, reset=reset)
  File "/usr/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/sklearn/base.py", line 415, in _check_feature_names
    feature_names_in = _get_feature_names(X)
  File "/usr/local/miniconda3/envs/anvio-dev/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1903, in _get_feature_names
    raise TypeError(
TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.
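
The TypeError above is raised by a validation step on column names: the table handed to PCA carries labels of mixed types (int and str). Here is a minimal, stdlib-only sketch of that kind of check and of the string-conversion workaround the message itself suggests. `check_feature_names` and `normalize_feature_names` are hypothetical stand-ins written for illustration, not scikit-learn's or CONCOCT's actual code, and the example labels are made up.

```python
def check_feature_names(columns):
    """Raise TypeError if the labels mix str with any other type."""
    type_names = sorted({type(c).__name__ for c in columns})
    if len(type_names) > 1 and "str" in type_names:
        raise TypeError(
            "Feature names are only supported if all input features have "
            f"string names, but your input has {type_names} as feature "
            "name / column name types."
        )
    return type_names


def normalize_feature_names(columns):
    """Convert every label to str, analogous to X.columns.astype(str)."""
    return [str(c) for c in columns]


# Hypothetical labels: integer-named columns next to a string-named one.
mixed = [0, 1, 2, "cov_mean_sample_01"]

try:
    check_feature_names(mixed)
except TypeError as exc:
    print("rejected:", exc)

# After converting all labels to strings, the check passes.
print(check_feature_names(normalize_feature_names(mixed)))
```

The same idea applies to a real DataFrame: making every column label a string before fitting is what the error message recommends.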

Files / commands to reproduce the issue

anvi-cluster-contigs -p SAMPLES-MERGED/PROFILE.db -c CONTIGS.db --driver concoct -T 80 --clusters 10 -C METABINS --just-do-it

My files are too big to share unfortunately.

Ge0rges commented Oct 26, 2023

I confirmed this occurs in v8 as well.

Ge0rges commented Oct 27, 2023

OK, figured this out. It turns out to be a known issue: CONCOCT is no longer compatible with the latest versions of sklearn.

If CONCOCT were installed with Conda, this would not be an issue, because the Conda recipe caps the sklearn version. That is not the case if one follows the anvi'o instructions. @meren what's the best solution here? Either change the way CONCOCT is installed to use Conda, or change the anvi'o instructions to A) use a Singularity container of CONCOCT (a pain), B) cap the sklearn version of anvi'o (probably a pain later, since CONCOCT doesn't seem to be maintained), or C) do nothing but print a warning.

Ge0rges commented Oct 27, 2023

I confirmed this by doing pip install scikit-learn==1.1.0 in my anvi'o environment. After that, anvi-cluster-contigs completes successfully.
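
A quick way to tell whether an environment is affected is to look at the installed scikit-learn version before running CONCOCT. The sketch below uses only the standard library. The 1.2 cutoff is an assumption (this thread only establishes that 1.1.0 works and 1.2.2 fails), and `parse_version`, `concoct_compatible`, and `report_installed` are hypothetical helper names.

```python
from importlib import metadata

# Assumed cutoff: the thread reports 1.1.0 working and 1.2.2 failing.
FIRST_BROKEN = (1, 2)


def parse_version(text):
    """Turn '1.2.2' into (1, 2, 2), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def concoct_compatible(version_text):
    """True if major.minor is below the assumed first broken release."""
    return parse_version(version_text)[:2] < FIRST_BROKEN


def report_installed():
    """Print a short verdict for the scikit-learn in this environment."""
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        print("scikit-learn is not installed in this environment")
        return
    if concoct_compatible(installed):
        print(f"scikit-learn {installed}: expected to work with CONCOCT")
    else:
        print(f"scikit-learn {installed}: consider scikit-learn==1.1.0")


if __name__ == "__main__":
    report_installed()
```

Run it inside the activated anvi'o conda environment so that it inspects the same packages CONCOCT will import.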

meren commented Oct 29, 2023

Thank you very much for looking into this, @Ge0rges. I'll take a look and see if I can come up with a workaround for this. The current version of sklearn is 1.2.2. In the worst case scenario we can require 1.1.1.

meren self-assigned this Oct 29, 2023

Sabrin2020 commented

> I confirmed this by doing pip install scikit-learn==1.1.0 in my anvi'o environment. After that, anvi-cluster-contigs completes successfully.

I did the same and CONCOCT worked fine, but I wonder whether running pip install scikit-learn==1.1.0 could break anvi'o rules somewhere else?

meren commented Nov 13, 2023

Since you were able to do the downgrade, it means the environment is stable. If this version breaks something, you will certainly notice that :) I think you're good.

Sabrin2020 commented Nov 13, 2023

I am getting a new error with the ecophylo workflow, which was working fine before:

```
RuleException:
TypeError in file /user/suga8254/.conda/envs/anvio-8/lib/python3.10/site-packages/anvio/workflows/ecophylo/Snakefile, line 358:
StringMethods.rsplit() takes from 1 to 2 positional arguments but 3 positional arguments (and 1 keyword-only argument) were given
 File "/user/suga8254/.conda/envs/anvio-8/lib/python3.10/site-packages/anvio/workflows/ecophylo/Snakefile", line 358, in __rule_process_hmm_hits
 File "/user/suga8254/.conda/envs/anvio-8/lib/python3.10/site-packages/pandas/core/strings/accessor.py", line 136, in wrapper
 File "/user/suga8254/.conda/envs/anvio-8/lib/python3.10/concurrent/futures/thread.py", line 58, in run
```

That is why I am wondering!
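
For readers decoding the message: "takes from 1 to 2 positional arguments but 3 positional arguments (and 1 keyword-only argument) were given" is Python's generic complaint when a keyword-only parameter is passed positionally. The sketch below reproduces that class of error with a hypothetical `Splitter` class. It only illustrates the signature shape and makes no claim about which package or version produced the error above.

```python
class Splitter:
    """Hypothetical stand-in; only the signature shape matters here."""

    def rsplit(self, pat=None, *, n=-1, expand=False):
        # Everything after the bare * must be passed by keyword.
        return (pat, n, expand)


splitter = Splitter()

# Keyword form: accepted.
print(splitter.rsplit("_", n=1, expand=True))

# Positional n: rejected, because n is keyword-only.
try:
    splitter.rsplit("_", 1, expand=True)
except TypeError as exc:
    print(exc)
```

If a caller was written against an older signature in which `n` could be passed positionally, switching the call to keyword arguments is the usual fix.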

Ge0rges commented Nov 13, 2023

Hi @Sabrin2020, can you confirm that the scikit-learn version alone makes the difference, i.e. that the workflow works again if you upgrade scikit-learn?

Sabrin2020 commented

I just did that as a test by going back to scikit-learn==1.2.2, and indeed nothing changed: the ecophylo error still persists.

Ge0rges commented Nov 13, 2023

I would open a separate issue for your error, with steps to reproduce.

meren commented Nov 13, 2023

This is weird. Under no circumstances should a change in the scikit-learn version number cause an error in the threads module of Python. These two things are probably independent :( But as a test, you can reinstall the anvi'o environment from scratch to see if you can reproduce it, @Sabrin2020.

Sabrin2020 commented

Thanks, @meren and @Ge0rges. I will reinstall the anvi'o environment from scratch.

Sabrin2020 commented

@meren I reinstalled the anvi'o environment and no longer have this error: StringMethods.rsplit() takes from 1 to 2 positional arguments but 3 positional arguments (and 1 keyword-only argument) were given. I have not installed CONCOCT in the same environment yet.

Ge0rges commented Feb 21, 2024

@meren it may be useful to add a warning about this somewhere near the CONCOCT installation instructions on the website.

meren commented Feb 21, 2024

I agree. Since we are no longer doing a lot of genome binning in the lab, those parts of the code and documentation are at the mercy of those who are using them outside :) If someone could formulate a warning text, I could immediately put it somewhere in our installation instructions.

Ge0rges commented Feb 21, 2024

Sure, meant to be somewhere near the CONCOCT install instructions:

Users should note that they may encounter a TypeError when running CONCOCT. Please see here for more information about this. Here's the gist of the fix: at the end of your install, and while in your conda environment, run pip install scikit-learn==1.1.0. Please let us know if this fix breaks any other part of anvi'o. As of v8 we don't think it does.

meren added a commit to merenlab/anvio.org that referenced this issue Feb 23, 2024

meren commented Feb 23, 2024

Thank you @Ge0rges. I updated the installation instructions. Now there is a little note that looks like this:

[image: screenshot of the note in the installation instructions]
