
Some issues with constructing the reference for VDJ sequencing and full-length immune receptor sequencing #269

Open
Gethell opened this issue Nov 1, 2023 · 6 comments

Comments

@Gethell

Gethell commented Nov 1, 2023

Description
While following the tutorial from the official posts, I ran into issues running the following command:
celescope vdj mkref human TR
and got this error:

Traceback (most recent call last):
File "/bios-store1/home/logcoin/.conda/envs/Celescope/bin/celescope", line 8, in <module>
sys.exit(main())
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/celescope.py", line 54, in main
args.func(args)
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 81, in mkref
runner()
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 35, in __call__
self.combine_seq()
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/tools/utils.py", line 45, in wrapper
result = func(*args, **kwargs)
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 59, in combine_seq
assert len(imgt_files) == 7
AssertionError
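The AssertionError means mkref found a different number of IMGT files than the 7 it expects. A minimal pre-flight sketch of the same check follows; the function name and the `*.fasta` glob are hypothetical assumptions, since the traceback shows only the assertion and not how `imgt_files` is collected. Listing the matched names makes an extra or missing file easy to spot.

```python
from pathlib import Path


def check_imgt_file_count(ref_dir, expected=7, pattern="*.fasta"):
    """Hypothetical pre-flight check mirroring `assert len(imgt_files) == 7`.

    ASSUMPTION: the IMGT files are matched by `pattern`; the real pattern
    used inside celescope/vdj/mkref.py is not shown in the traceback.
    """
    files = sorted(p.name for p in Path(ref_dir).glob(pattern))
    if len(files) != expected:
        # Report the actual names so stray or missing files are obvious.
        raise ValueError(
            f"found {len(files)} file(s) matching {pattern!r} in {ref_dir}, "
            f"expected {expected}: {files}"
        )
    return files
```

For example, `check_imgt_file_count("Homo_sapiens/IG")` either returns the seven file names or raises with the full list of what it found.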

Then I ran:
celescope vdj mkref human IG
and got a different error:

Building a new DB, current time: 11/01/2023 15:27:28
New DB name: /bios-store1/home/logcoin/Reference/hs_vdj/Homo_sapiens/IG/human_cele_BR/IGV.fa
New DB title: IGV.fa
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /bios-store1/home/logcoin/Reference/hs_vdj/Homo_sapiens/IG/human_cele_BR/IGV.fa
Keep MBits: T
Maximum file size: 3000000000B
BLAST options error: File IGV.fa is empty

Traceback (most recent call last):
File "/bios-store1/home/logcoin/.conda/envs/Celescope/bin/celescope", line 8, in <module>
sys.exit(main())
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/celescope.py", line 54, in main
args.func(args)
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 81, in mkref
runner()
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 36, in __call__
self.build_index()
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/tools/utils.py", line 45, in wrapper
result = func(*args, **kwargs)
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/site-packages/celescope/vdj/mkref.py", line 76, in build_index
subprocess.check_call(f"makeblastdb -parse_seqids -dbtype nucl -in {out_file_name}.fa", shell=True)
File "/bios-store1/home/logcoin/.conda/envs/Celescope/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'makeblastdb -parse_seqids -dbtype nucl -in IGV.fa' returned non-zero exit status 1.
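makeblastdb fails here because the IGV.fa it was given is empty, which suggests the problem is upstream: no V sequences were written into the combined file, for example because the expected IMGT input files were not found. A minimal sketch of a check to run on a generated FASTA before indexing it; both helper names are hypothetical and not part of celescope.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records, i.e. header lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def assert_fasta_not_empty(path):
    """Fail early with a clear message instead of a makeblastdb error."""
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"{p} does not exist")
    n = count_fasta_records(p)
    if n == 0:
        raise ValueError(f"{p} contains no FASTA records; check the input IMGT files")
    return n
```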

Troubleshooting
I think issue No. 1 might be caused by a wrong file format, but I have no idea what causes the second one, so I did some troubleshooting:

  1. Checked the format of the IMGT reference files:
    head -n 20 TRAJ.fasta
    [screenshot: trouble1]
    head -n 20 IGHD.fasta
    [screenshot: trouble3]
  2. Checked whether the output directory is empty:
    ls -al ../IG/human_IG
    [screenshot: trouble2]
    The "TR" output directory, however, is empty.
  3. Finally, checked the celescope version:
    celescope -v
    2.0.3
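As a programmatic alternative to `head -n 20`, the sketch below parses the first few FASTA headers of a reference file. It assumes IMGT V-QUEST style headers, pipe-delimited with the accession in field 1, the gene*allele name in field 2 and the species in field 3; that layout is an assumption here, not something celescope documents in this thread, and the helper names are hypothetical.

```python
def parse_imgt_header(header):
    """Split an IMGT-style FASTA header into its pipe-delimited fields.

    ASSUMPTION: headers look like '>ACCESSION|IGHJ1*01|Homo sapiens|F|...',
    where field 2 is gene*allele and field 3 is the species.
    """
    fields = header.lstrip(">").rstrip("\n").split("|")
    if len(fields) < 3:
        raise ValueError(f"not an IMGT-style header: {header!r}")
    gene, _, allele = fields[1].partition("*")
    return {"accession": fields[0], "gene": gene, "allele": allele, "species": fields[2]}


def summarize_fasta(path, limit=5):
    """Return the parsed headers of the first `limit` records in a FASTA file."""
    out = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                out.append(parse_imgt_header(line))
                if len(out) == limit:
                    break
    return out
```

A header that raises ValueError here is a quick sign that the file is not in the expected IMGT layout.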

I would sincerely appreciate a resolution to these issues, or an example of the expected IMGT data format for building the index files. I would also appreciate a detailed example of how to build the index files for VDJ or full-length immune receptor sequencing.
One last question:
Are the IMGT-based index files for single-cell VDJ and full-length immune receptor sequencing the same?

@Chenjunjie1996
Collaborator

  1. Make sure your celescope version is >= 1.15.1, and refer to https://github.com/singleron-RD/CeleScope/blob/master/doc/assay/multi_vdj.md.
  2. The IMGT reference and the full-length VDJ reference are different.
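The reporter's `celescope -v` output (2.0.3) already satisfies this minimum. For scripted environments, a minimal sketch of the same check; it assumes the installed distribution is named `celescope` when looked up through `importlib.metadata`, and that versions are plain dotted integers. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain release string like '2.0.3' into a comparable tuple (2, 0, 3)."""
    return tuple(int(part) for part in text.split("."))


def celescope_meets_minimum(minimum="1.15.1", installed=None):
    """Return True if the installed celescope is at least `minimum`.

    ASSUMPTION: versions contain no 'rc'/'dev' suffixes.
    """
    if installed is None:
        try:
            installed = version("celescope")
        except PackageNotFoundError:
            return False
    return parse_version(installed) >= parse_version(minimum)
```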

@Gethell
Author

Gethell commented Nov 3, 2023

Thanks for your response! I followed the tutorial you linked, but unfortunately ran into another issue when running commands like:
wget https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/TR{A,B}{V,J}.fasta
and
wget http://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Mus_musculus/IG/IG{H,K,L}{V,J}.fasta
The error was:

--2023-11-03 10:47:00-- https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/TR%7BA,B%7D%7BV,J%7D.fasta
Resolving www.imgt.org (www.imgt.org)... 195.83.84.12
Connecting to www.imgt.org (www.imgt.org)|195.83.84.12|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-11-03 10:47:01 ERROR 404: Not Found.

[screenshot: trouble4]

@Chenjunjie1996
Collaborator

Remove the \ character. The correct command is:

wget https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/TR{A,B}{V,J}.fasta
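The 404 log shows wget requesting `TR%7BA,B%7D%7BV,J%7D.fasta`, i.e. the literal braces, so brace expansion never happened: escaping the braces with `\` suppresses it in bash, and shells such as dash do not support brace expansion at all. In bash, the command above expands into four URLs before wget runs. A minimal sketch that builds the same list without relying on the shell; the function and parameter names are hypothetical.

```python
from itertools import product

BASE = ("https://www.imgt.org/download/V-QUEST/"
        "IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/")


def expand_imgt_urls(base=BASE, prefix="TR", loci=("A", "B"), segments=("V", "J")):
    """Expand 'TR{A,B}{V,J}.fasta' in the same order bash brace expansion would."""
    return [f"{base}{prefix}{locus}{seg}.fasta" for locus, seg in product(loci, segments)]


if __name__ == "__main__":
    # One URL per line, suitable for `wget -i`.
    print("\n".join(expand_imgt_urls()))
```

Saving the printed list to a file such as urls.txt and running `wget -i urls.txt` downloads TRAV, TRAJ, TRBV and TRBJ without depending on shell brace expansion.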

@Gethell
Author

Gethell commented Nov 29, 2023


Thanks for your response. I finally succeeded by following your advice!

@Gethell
Author

Gethell commented Nov 29, 2023


I found the root cause: some unwanted files were left over from when I manually downloaded data from IMGT. For instance, a redundant file named "TRGJ" in the directory Path/TR/ caused the reference build to fail.
[screenshot: solution1]
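Since stray files in the reference directory were the root cause, auditing the directory against a whitelist before running mkref can catch them. A minimal sketch, assuming the downloaded IMGT files use the .fasta extension and that the whitelist is the TR{A,B}{V,J} set from the collaborator's download command; the function and constant names are hypothetical.

```python
from itertools import product
from pathlib import Path

# Alpha/beta TCR set from this thread: TR{A,B}{V,J}.fasta
TR_ALPHA_BETA = [f"TR{locus}{seg}.fasta" for locus, seg in product("AB", "VJ")]


def audit_reference_dir(ref_dir, expected_names):
    """Compare the *.fasta files in ref_dir against an expected whitelist.

    Returns (unexpected, missing) as sorted lists of file names.
    """
    present = {p.name for p in Path(ref_dir).glob("*.fasta")}
    expected = set(expected_names)
    return sorted(present - expected), sorted(expected - present)
```

For example, a directory holding the four alpha/beta files plus a leftover TRGJ.fasta yields `unexpected == ["TRGJ.fasta"]` and an empty `missing` list; moving the unexpected files out of the directory avoids the file-count mismatch.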

@Chenjunjie1996
Collaborator

Our vdj pipeline focuses on alpha/beta TCRs.
Use the following command to download only the alpha/beta TCR sequences from IMGT, which avoids errors when running celescope vdj mkref.

wget https://www.imgt.org/download/V-QUEST/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR/TR{A,B}{V,J}.fasta
