`sequana_coverage` fails with GRCh37 #556

vladsavelyev · 2019-02-03T01:45:32Z

GRCh37 chromosome names are pure integers (1, 2, 3, ... as opposed to chr1, chr2, chr3, ... in hg19/hg38), and at some point when a coverage or FASTA file is read into a dataframe, they get automatically parsed as ints, which makes the following chunk crash:

  File "/g/data3/gx8/extras/vlad/miniconda/envs/sequana/lib/python3.6/site-packages/sequana/scripts/coverage.py", line 421, in run_analysis
    directory += os.sep + chrom.chrom_name
TypeError: must be str, not numpy.int64

I guess in order to work with GRCh37, converting chrom_name into a str at some point might be needed.
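A minimal sketch of that conversion, assuming the only problem at this call site is the string concatenation shown in the traceback. The helper name `chromosome_directory` and the `report` base directory are hypothetical and not part of sequana:

```python
import os


def chromosome_directory(base, chrom_name):
    """Build the per-chromosome directory path.

    Hypothetical helper: str() accepts plain ints, numpy integers and
    strings alike, so GRCh37-style (21) and hg19-style ("chr21") names
    both work.
    """
    return base + os.sep + str(chrom_name)


# GRCh37-style name that was parsed as an integer
grch37_dir = chromosome_directory("report", 21)

# hg19/hg38-style name that is already a string
hg19_dir = chromosome_directory("report", "chr21")
```

Converting at the point of use only hides the symptom; forcing the chromosome column to a string type when the file is read would keep the name consistent everywhere else it is used.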

Vlad


vladsavelyev · 2019-02-03T05:20:54Z

I tried to fix this in the commit referenced in this issue, but I am still getting a (probably unrelated) error:

sequana_coverage --input samtools_depth.bed --window-median 1001 -r /g/data3/gx8/extras/umccrise/genomes/GRCh37/GRCh37.fa -c 21

INFO    [sequana.bamtools]:  Reading samtools_depth.bed. This may take time depending on your input file
INFO    [sequana.bamtools]:  Scanning input file (chunk of 5000000 rows)
/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/bedtools.py:371: FutureWarning: read_table is deprecated, use read_csv instead.
  nrows=self.chunksize)
/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/bedtools.py:381: FutureWarning: read_table is deprecated, use read_csv instead.
  usecols=[0], chunksize=self.chunksize):
 [-----------------81%----------        ] 574 of 707 complete in 421.0 sec
/g/data3/gx8/extras/vlad/miniconda/envs/sequana/bin/sequana_coverage:11: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  load_entry_point('sequana', 'console_scripts', 'sequana_coverage')()
 [-----------------87%-------------     ] 616 of 707 complete in 469.6 sec
/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/bedtools.py:633: FutureWarning: read_table is deprecated, use read_csv instead.
  chunksize=self.chunksize)
INFO    [sequana.bamtools]:  Computing GC content
WARNING [sequana.bamtools]:  There is only one chromosome. Selected automatically.
INFO    [sequana.bamtools]:  Computing some metrics

Traceback (most recent call last):
  File "/g/data3/gx8/extras/vlad/miniconda/envs/sequana/bin/sequana_coverage", line 11, in <module>
    load_entry_point('sequana', 'console_scripts', 'sequana_coverage')()
  File "/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/scripts/coverage.py", line 334, in main
    run_analysis(gc.chr_list[0], options, gc.feature_dict)
  File "/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/scripts/coverage.py", line 374, in run_analysis
    if chrom.DOC < 8:
  File "/g/data3/gx8/extras/vlad/tmp/umccrise_2019_conventionalbcbio/sequana_coverage/sequana/sequana/bedtools.py", line 966, in DOC
    self._DOC = self.df['cov'].mean()
TypeError: 'NoneType' object is not subscriptable
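The `DtypeWarning` in the log above says that column 0 (the chromosome name) ended up with mixed types while the file was read. Whether that is connected to the final `NoneType` error is not established here. As a sketch only, this is how a chunked read can force the chromosome column to strings with the `dtype` option that the warning mentions. The sample data and the function name `read_chrom_names` are made up for illustration and do not come from `bedtools.py`:

```python
import io

import pandas as pd

# Hypothetical depth table (chromosome, position, coverage) mixing
# numeric and non-numeric GRCh37-style chromosome names.
DEPTH_TEXT = "1\t1\t10\n1\t2\t12\nX\t1\t9\nMT\t1\t30\n"


def read_chrom_names(handle, chunksize=2):
    """Collect chromosome names chunk by chunk, always as strings."""
    names = []
    reader = pd.read_csv(
        handle,
        sep="\t",
        header=None,
        usecols=[0],
        dtype={0: str},  # never let pandas infer ints for the name column
        chunksize=chunksize,
    )
    for chunk in reader:
        for name in chunk[0].unique():
            if name not in names:
                names.append(name)
    return names


chrom_names = read_chrom_names(io.StringIO(DEPTH_TEXT))
```

With an explicit `dtype`, every chunk yields the same column type regardless of which chromosomes it happens to contain.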

cokelaer · 2019-02-06T09:40:36Z

@vladsaveliev thanks for reporting this issue. Is this the latest version of sequana? The master branch, I suppose? Or is it a specific version on PyPI or conda?

vladsavelyev · 2019-02-06T10:31:58Z

That's right, that's the master branch. More specifically, I initially had this issue with the installation from bioconda, and afterwards I cloned the master branch and the issue reproduced there as well.

vladsavelyev added a commit to vladsavelyev/sequana that referenced this issue Feb 3, 2019

Coverage: attempt to fix reading GRCh37 chromosome names (sequana#556)

f6ee7fa
