sequana_coverage fails with GRCh37 #556

Open
vladsavelyev opened this issue Feb 3, 2019 · 3 comments
Comments

@vladsavelyev

GRCh37 chromosome names are pure integers (1, 2, 3, ... as opposed to chr1, chr2, chr3, ... in hg19/hg38). At some point, when a coverage or FASTA file is read into a dataframe, they are automatically parsed as ints, which makes this later chunk crash:

  File "/g/data3/gx8/extras/vlad/miniconda/envs/sequana/lib/python3.6/site-packages/sequana/scripts/coverage.py", line 421, in run_analysis
    directory += os.sep + chrom.chrom_name
TypeError: must be str, not numpy.int64

I guess in order to work with GRCh37, converting chrom_name into a str at some point might be needed.
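
A minimal sketch of that idea, assuming a three-column samtools-depth-style table. The data, column names, and the helpers `read_depth` and `chrom_directory` are hypothetical illustrations, not sequana's actual code. It shows how pandas infers an integer type for GRCh37 names, and two ways to end up with a string: pinning the dtype on read, or casting at the point of use.

```python
import io
import os

import pandas as pd

# Hypothetical depth data using GRCh37-style names (not taken from the issue).
DEPTH = "1\t100\t30\n1\t101\t31\n21\t100\t12\n"


def read_depth(handle, force_str=True):
    """Read chrom/pos/cov columns, optionally pinning the chromosome column to str."""
    dtype = {"chr": str} if force_str else None
    return pd.read_csv(handle, sep="\t", header=None,
                       names=["chr", "pos", "cov"], dtype=dtype)


def chrom_directory(base, chrom_name):
    """Build an output directory path; str() guards against integer-like names."""
    return base + os.sep + str(chrom_name)


inferred = read_depth(io.StringIO(DEPTH), force_str=False)
fixed = read_depth(io.StringIO(DEPTH))

inferred_name = inferred["chr"].iloc[0]  # an integer type chosen by pandas
fixed_name = fixed["chr"].iloc[0]        # a plain Python str

print(type(inferred_name), type(fixed_name))
print(chrom_directory("report", inferred_name))
```

Pinning the dtype at read time is probably the more robust of the two, since every later comparison or concatenation then sees a string.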

Vlad

@vladsavelyev
Author

I tried to fix this in the commit above, but I am still getting a (probably unrelated) error:

sequana_coverage --input samtools_depth.bed --window-median 1001 -r /g/data3/gx8/extras/umccrise/genomes/GRCh37/GRCh37.fa -c 21

INFO    [sequana.bamtools]:  Reading samtools_depth.bed. This may take time depending on your input file
INFO    [sequana.bamtools]:  Scanning input file (chunk of 5000000 rows)
/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/bedtools.py:371: FutureWarning: read_table is deprecated, use read_csv instead.
  nrows=self.chunksize)
/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/bedtools.py:381: FutureWarning: read_table is deprecated, use read_csv instead.
  usecols=[0], chunksize=self.chunksize):
 [-----------------81%----------        ] 574 of 707 complete in 421.0 sec
/g/data3/gx8/extras/vlad/miniconda/envs/sequana/bin/sequana_coverage:11: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  load_entry_point('sequana', 'console_scripts', 'sequana_coverage')()
 [-----------------87%-------------     ] 616 of 707 complete in 469.6 sec
/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/bedtools.py:633: FutureWarning: read_table is deprecated, use read_csv instead.
  chunksize=self.chunksize)
INFO    [sequana.bamtools]:  Computing GC content
WARNING [sequana.bamtools]:  There is only one chromosome. Selected automatically.
INFO    [sequana.bamtools]:  Computing some metrics

Traceback (most recent call last):
  File "/g/data3/gx8/extras/vlad/miniconda/envs/sequana/bin/sequana_coverage", line 11, in <module>
    load_entry_point('sequana', 'console_scripts', 'sequana_coverage')()
  File "/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/scripts/coverage.py", line 334, in main
    run_analysis(gc.chr_list[0], options, gc.feature_dict)
  File "/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/scripts/coverage.py", line 374, in run_analysis
    if chrom.DOC < 8:
  File "/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/bedtools.py", line 966, in DOC
    self._DOC = self.df['cov'].mean()
TypeError: 'NoneType' object is not subscriptable
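
The DtypeWarning in the log above says that column 0 (the chromosome name) has mixed types. Whether that is what leaves `self.df` unset is not established here. The sketch below only illustrates, on hypothetical data and with a hypothetical `select_chrom` helper rather than sequana's code, why a chunked read without an explicit dtype can be fragile when selecting a chromosome such as 21, and how pinning the dtype avoids it.

```python
import io

import pandas as pd

# Hypothetical depth data mixing a numeric name (21) with a non-numeric one (X).
DEPTH = "21\t1\t10\n21\t2\t12\nX\t1\t7\nX\t2\t9\n"
NAMES = ["chr", "pos", "cov"]


def select_chrom(handle, chrom, chunksize=2, dtype=None):
    """Read the table in chunks and keep only the rows of one chromosome."""
    parts = []
    reader = pd.read_csv(handle, sep="\t", header=None, names=NAMES,
                         chunksize=chunksize, dtype=dtype)
    for chunk in reader:
        # Without a pinned dtype, a chunk holding only numeric names may be
        # typed as integers, in which case comparing against "21" matches nothing.
        parts.append(chunk[chunk["chr"] == chrom])
    return pd.concat(parts)


naive = select_chrom(io.StringIO(DEPTH), "21")
fixed = select_chrom(io.StringIO(DEPTH), "21", dtype={"chr": str})

print(len(naive), len(fixed))
print(fixed["cov"].mean())
```

With the dtype pinned, the chromosome passed on the command line (`-c 21`) and the names in the table are both strings, so the selection no longer depends on how each chunk happened to be typed.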

@cokelaer
Collaborator

cokelaer commented Feb 6, 2019

@vladsaveliev thanks for reporting this issue. Is this the latest version of sequana? The master branch, I suppose? Or is it a specific version from PyPI or conda?

@vladsavelyev
Author

vladsavelyev commented Feb 6, 2019

That's right, that's the master branch. More specifically, I initially had this issue with the installation from bioconda, and afterwards I cloned the master branch and the issue reproduced there as well.
