Command Line usage example commands do not run properly #137

Open
yxuil opened this issue Feb 13, 2024 · 3 comments
Labels
bug (Something isn't working), documentation (Improvements or additions to documentation)

Comments

@yxuil

yxuil commented Feb 13, 2024

Describe the bug
docs/cli.rst contains examples for the load command, but the load example doesn't run correctly.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'docs/cli.rst'.
  2. Copy the load command from the Loading section:
    seqrepo --root-directory $SEQREPO_ROOT/master load -n NCBI mirror/ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.*.gz
  3. Paste it into a shell terminal.
  4. The command fails with this error:
    FileNotFoundError: [Errno 2] No such file or directory: 'mirror/ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.*.gz'

I tried trimming off the leading mirror/ and replacing it with ftp://, but neither worked.
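One possible explanation, consistent with the quoted pattern in the error message: when a glob matches no files, many shells (bash by default) pass the pattern through unchanged, so the program receives the literal string and fails to open it. A minimal standard-library sketch for checking what the pattern matches before running load. The helper name `find_inputs` is hypothetical and not part of seqrepo.

```python
import glob


def find_inputs(pattern):
    """Return the sorted list of files that match a shell-style glob pattern."""
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Pattern copied from the docs example; it is relative to the current directory.
    pattern = "mirror/ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.*.gz"
    matches = find_inputs(pattern)
    if matches:
        print("\n".join(matches))
    else:
        print(f"No files match {pattern!r}; download them first or fix the path.")
```

An empty result would mean the files are simply not present under that relative path, which trimming mirror/ or swapping in ftp:// would not change.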

Expected behavior
The CLI documentation is a few years old and could use an update. Several CLI commands are not covered, and the short descriptions shown by --help are not enough to get started.

I am particularly interested in loading individual sequences into an existing instance. For example, transcript NM_001387679.1 doesn't seem to be in the latest data pull, 2023-09-28, and it would be nice to know how to add it. fetch-load looks like the right command, but cli.rst doesn't mention it or a few other commands. I tried it a few times, but none of my attempts worked:
seqrepo fetch-load -i 2023-09-28 -n NCBI NM_001387679.1
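Independently of the CLI, the sequence for a missing transcript can be retrieved from NCBI E-utilities as FASTA text. The sketch below uses only the standard library. The function names `efetch_url`, `parse_fasta`, and `fetch_sequence` are hypothetical, and whether the resulting record can then be loaded into an existing instance is an assumption to verify against the current CLI. It does not explain why fetch-load fails.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build an E-utilities URL that returns one nucleotide record as FASTA text."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def parse_fasta(text):
    """Split single-record FASTA text into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA text")
    # Drop the leading ">" from the header and join the wrapped sequence lines.
    return lines[0][1:], "".join(lines[1:])


def fetch_sequence(accession):
    """Fetch a record from NCBI (needs network access) and return (header, sequence)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("ascii"))
```

Calling `fetch_sequence("NM_001387679.1")` would return the header and sequence if NCBI serves that accession.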

Additional context
I am using seqrepo in conjunction with UTA or Cdot to validate variants. On some occasions a transcript ID is annotated in UTA or Cdot, but its sequence cannot be retrieved from seqrepo. I am hoping to load those few missing transcripts with the command-line tools, and I am looking for similar use cases.

@jsstevenson added the bug label Feb 18, 2024
@reece
Member

reece commented Feb 19, 2024

Thanks for the report @yxuil. I'll investigate this week.

@jsstevenson
Contributor

jsstevenson commented Mar 6, 2024

I think that step just references a set of files (presumably from ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/) that are expected to already be available under that subdirectory/glob. The docs could mention this or include a basic curl/wget command to acquire an example.
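As a rough illustration of what acquiring an example could look like, here is a standard-library sketch that maps a URL onto the mirror/<host>/<path> layout used in the docs example and downloads the file there. The helper names and the specific file name human.1.rna.fna.gz are assumptions; check the directory listing for the current file names.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def mirror_path(url, mirror_root="mirror"):
    """Map a URL to mirror_root/<host>/<path>, the layout the docs example expects."""
    parts = urlsplit(url)
    return Path(mirror_root) / parts.netloc / parts.path.lstrip("/")


def download_to_mirror(url, mirror_root="mirror"):
    """Download url into the mirror tree (needs network access) and return the local path."""
    dest = mirror_path(url, mirror_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


if __name__ == "__main__":
    # Assumed example file name; not confirmed by the docs or this thread.
    url = "https://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.rna.fna.gz"
    # Only the target path is printed here; call download_to_mirror(url) to fetch.
    print(mirror_path(url))
```

With files in place under that tree, the glob in the load example would have something to match.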

That said, the next command does raise an error for me. Note the path given in the exception: it looks like the CLI forcibly checks under a latest/ subdirectory within the directory given to the --root-directory option.

[ main ⚙ venv] ~/code/seqrepo % seqrepo --root-directory $SEQREPO_ROOT/master show-status
Traceback (most recent call last):
  File "/Users/jamesstevenson/code/seqrepo/venv/bin/seqrepo", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/jamesstevenson/code/seqrepo/src/biocommons/seqrepo/cli.py", line 733, in main
    opts.func(opts)
  File "/Users/jamesstevenson/code/seqrepo/src/biocommons/seqrepo/cli.py", line 580, in show_status
    sr = SeqRepo(seqrepo_dir)
         ^^^^^^^^^^^^^^^^^^^^
  File "/Users/jamesstevenson/code/seqrepo/src/biocommons/seqrepo/seqrepo.py", line 120, in __init__
    raise OSError("Unable to open SeqRepo directory {}".format(self._root_dir))
OSError: Unable to open SeqRepo directory /usr/local/share/seqrepo/master/latest
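
The path in the exception suggests how the directory is resolved: an instance name, which appears to default to latest, is appended to whatever --root-directory is given. A small sketch of that behavior, assuming a plain path join. The function name `resolve_instance_dir` is hypothetical and not taken from cli.py, and `posixpath` is used so the output is the same on every platform.

```python
import posixpath


def resolve_instance_dir(root_directory, instance_name="latest"):
    """Join the root directory and instance name, as the traceback path suggests."""
    return posixpath.join(root_directory, instance_name)


if __name__ == "__main__":
    seqrepo_root = "/usr/local/share/seqrepo"  # value implied by the traceback

    # Root already includes the instance, so the default "latest" is appended again.
    print(resolve_instance_dir(seqrepo_root + "/master"))
    # prints /usr/local/share/seqrepo/master/latest

    # Root and instance given separately resolve to the intended directory.
    print(resolve_instance_dir(seqrepo_root, "master"))
    # prints /usr/local/share/seqrepo/master
```

If that reading is right, giving the root directory and the instance name separately would avoid the doubled path.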

@jsstevenson added the documentation label Mar 25, 2024
@jsstevenson self-assigned this Mar 25, 2024
@jsstevenson
Contributor

To sum up my previous comment, I think there are three things going on:

  1. The September SeqRepo release is missing some sequences. I think this has been resolved but we should double-check.
  2. This section of the docs is a little light on some key details. I'll do a rewrite in the next week or two.
  3. This section of the CLI probably over-specifies a path in at least one place. I'd like to get a PR up in a similar timeframe ^.
