Many libraries support streaming FASTA/FASTQ data through stdin. It would be nice for scikit-bio to support that mode of operation too, as it is handy when writing small stream processors. Currently, reading from stdin fails:
$ cat test.py
import skbio
import sys
for r in skbio.read(sys.stdin, format='fasta'):
    print(r.metadata['id'])
$ cat q.fna
>qx1
TTTTTTTTT
>qx2
TTTTTTTTNNNNNN
$ cat q.fna | python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    for r in skbio.read(sys.stdin, format='fasta'):
  File "/home/mcdonadt/miniconda3/envs/test/lib/python3.8/site-packages/skbio/io/registry.py", line 1160, in read
    return io_registry.read(file, format=format, into=into, verify=verify,
  File "/home/mcdonadt/miniconda3/envs/test/lib/python3.8/site-packages/skbio/io/registry.py", line 506, in read
    return (x for x in itertools.chain([next(gen)], gen))
  File "/home/mcdonadt/miniconda3/envs/test/lib/python3.8/site-packages/skbio/io/registry.py", line 529, in _read_gen
    reader, kwargs = self._init_reader(file, fmt, into, verify, kwargs,
  File "/home/mcdonadt/miniconda3/envs/test/lib/python3.8/site-packages/skbio/io/registry.py", line 543, in _init_reader
    backup = file.tell()
io.UnsupportedOperation: underlying stream is not seekable
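In the meantime, one possible workaround is to copy the non-seekable stream into an in-memory buffer before handing it to the reader. The sketch below is only an illustration: `as_seekable` is a hypothetical helper, not part of scikit-bio, and it reads the whole input into memory, so it gives up the benefit of streaming for large inputs.

```python
import io


def as_seekable(stream):
    """Return `stream` unchanged if it is seekable, else an in-memory copy.

    Hypothetical helper for illustration; not a scikit-bio function.
    """
    if stream.seekable():
        return stream
    # Reads everything up front, so this only suits inputs that fit in memory.
    return io.StringIO(stream.read())
```

Assuming `skbio.read` accepts an in-memory text handle the same way it accepts an open file, the loop in test.py would then iterate over `skbio.read(as_seekable(sys.stdin), format='fasta')`.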
Hey! This issue seems to stem from the fact that stdin isn't a seekable stream, which breaks the file sniffers. The problem specifically arises when the registry saves the stream position (`file.tell()`) before sniffing and restores it afterwards (`fh.seek(0)`).
I found that commenting out those calls made the piping work as expected, but with residual warnings:
(skbio-dev) azomss-air:scikit-bio raeedazom$ cat skbio/io/q.fna | python skbio/io/temp.py
/Users/raeedazom/Documents/Coding/scikit-bio/skbio/io/registry.py:922: FormatIdentificationWarning: '_fasta_sniffer' has encountered a problem.
Please send the following to our issue tracker at
https://github.com/scikit-bio/scikit-bio/issues
Traceback (most recent call last):
  File "/Users/raeedazom/Documents/Coding/scikit-bio/skbio/io/registry.py", line 917, in wrapped_sniffer
    fh.seek(0)
io.UnsupportedOperation: underlying stream is not seekable
  warn(
/Users/raeedazom/Documents/Coding/scikit-bio/skbio/io/registry.py:536: FormatIdentificationWarning: <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'> does not look like a fasta file
  warn(
qx1
qx2
To fix this, I plan to add a seekable vs. nonseekable option that bypasses the `tell()` and `seek()` calls for nonseekable I/O.
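A rough sketch of the idea, outside of scikit-bio's actual registry code: for a seekable handle, save and restore the position around the sniffer; for a nonseekable one, pull a bounded number of lines, sniff those from a buffer, and chain them back in front of the rest of the stream so nothing is lost. The names `sniff_stream`, `max_lines`, and `looks_like_fasta` are all made up for illustration, and the toy sniffer is far simpler than the real `_fasta_sniffer`.

```python
import io
import itertools


def looks_like_fasta(fh):
    """Toy stand-in for a format sniffer: checks only the first line."""
    return fh.readline().startswith(">")


def sniff_stream(fh, sniffer, max_lines=10):
    """Run `sniffer` on `fh` without losing data, seekable or not.

    Returns (matched, lines), where `lines` iterates over every line of the
    original input, including the ones the sniffer looked at.
    """
    if fh.seekable():
        # Seekable path: remember the position and restore it afterwards.
        start = fh.tell()
        try:
            matched = sniffer(fh)
        finally:
            fh.seek(start)
        return matched, fh

    # Nonseekable path: never call tell()/seek(). Sniff a buffered prefix,
    # then replay that prefix ahead of the untouched remainder.
    head = list(itertools.islice(fh, max_lines))
    matched = sniffer(io.StringIO("".join(head)))
    return matched, itertools.chain(head, fh)
```

With something like this, a caller would parse from the returned `lines` iterator rather than from the original handle, which keeps the stdin case streaming instead of buffering the whole input.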