Support for stdin for at least FASTA/FASTQ #2017

Open
wasade opened this issue May 1, 2024 · 1 comment

Collaborator

wasade commented May 1, 2024

Many libraries support streaming FASTA/FASTQ data through stdin. It would be nice for scikit-bio to support that mode of operation too, as it is handy when writing small stream processors. Currently, reading from a pipe fails:

$ cat test.py
import skbio
import sys
for r in skbio.read(sys.stdin, format='fasta'):
    print(r.metadata['id'])
$ cat q.fna 
>qx1
TTTTTTTTT
>qx2
TTTTTTTTNNNNNN
$ cat q.fna | python test.py 
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    for r in skbio.read(sys.stdin, format='fasta'):
  File "/home/mcdonadt/miniconda3/envs/test/lib/python3.8/site-packages/skbio/io/registry.py", line 1160, in read
    return io_registry.read(file, format=format, into=into, verify=verify,
  File "/home/mcdonadt/miniconda3/envs/test/lib/python3.8/site-packages/skbio/io/registry.py", line 506, in read
    return (x for x in itertools.chain([next(gen)], gen))
  File "/home/mcdonadt/miniconda3/envs/test/lib/python3.8/site-packages/skbio/io/registry.py", line 529, in _read_gen
    reader, kwargs = self._init_reader(file, fmt, into, verify, kwargs,
  File "/home/mcdonadt/miniconda3/envs/test/lib/python3.8/site-packages/skbio/io/registry.py", line 543, in _init_reader
    backup = file.tell()
io.UnsupportedOperation: underlying stream is not seekable

RaeedA commented May 7, 2024

Hey! This issue seems to stem from the fact that stdin isn't a seekable stream, which breaks the file sniffers. Specifically, the reader saves the stream position with file.tell() before sniffing and restores it afterwards, and neither operation is supported on a non-seekable stream.

I found that commenting out the calls that save and restore the position made the piping work as expected, though with residual warnings:

(skbio-dev) azomss-air:scikit-bio raeedazom$ cat skbio/io/q.fna | python skbio/io/temp.py 
/Users/raeedazom/Documents/Coding/scikit-bio/skbio/io/registry.py:922: FormatIdentificationWarning: '_fasta_sniffer' has encountered a problem.
Please send the following to our issue tracker at
https://github.com/scikit-bio/scikit-bio/issues

Traceback (most recent call last):
  File "/Users/raeedazom/Documents/Coding/scikit-bio/skbio/io/registry.py", line 917, in wrapped_sniffer
    fh.seek(0)
io.UnsupportedOperation: underlying stream is not seekable

  warn(
/Users/raeedazom/Documents/Coding/scikit-bio/skbio/io/registry.py:536: FormatIdentificationWarning: <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'> does not look like a fasta file
  warn(
qx1
qx2

To fix this, I plan to add a seekable vs. non-seekable option that bypasses the seek and tell calls for non-seekable I/O.
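
As a rough illustration of that idea (not the actual scikit-bio change), the guard could look something like the sketch below. The names sniff_if_seekable, NonSeekableText and looks_like_fasta are hypothetical and exist only for this example; the real logic lives in skbio/io/registry.py and may end up structured quite differently. The sketch assumes that, when the stream cannot be rewound, it is acceptable to skip sniffing and trust the format the caller passed in.

```python
import io


def sniff_if_seekable(fh, sniffer):
    """Run `sniffer(fh)` and rewind afterwards; skip it for non-seekable streams.

    `sniffer` is a hypothetical callable that reads from `fh` and returns a
    bool. Returns None when the stream cannot be rewound, so the caller can
    fall back to trusting the user-supplied format.
    """
    if not fh.seekable():
        # Sniffing would consume data that could never be read again.
        return None
    backup = fh.tell()
    try:
        return sniffer(fh)
    finally:
        # Restore the position so the real reader starts where we began.
        fh.seek(backup)


class NonSeekableText(io.StringIO):
    """Stand-in for a piped stdin: readable, but tell/seek are unsupported."""

    def seekable(self):
        return False

    def tell(self):
        raise io.UnsupportedOperation("underlying stream is not seekable")

    def seek(self, *args):
        raise io.UnsupportedOperation("underlying stream is not seekable")


def looks_like_fasta(fh):
    # Toy sniffer for the example: a FASTA record starts with '>'.
    return fh.readline().startswith(">")


if __name__ == "__main__":
    piped = NonSeekableText(">qx1\nTTTTTTTTT\n")
    print(sniff_if_seekable(piped, looks_like_fasta))  # sniffing is skipped
    print(piped.readline().strip())  # the first record is still unread
```

With a seekable handle the sniffer runs and the position is restored; with the piped stand-in nothing is consumed, so the reader still sees the first record. The trade-off is that format verification is lost for pipes, which is presumably why this would need to be an explicit option.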
