Blast Database in unmapped_blast rule #7

Open
ezhang113 opened this issue Feb 9, 2023 · 13 comments

Comments

@ezhang113
Collaborator

ezhang113 commented Feb 9, 2023

Describe the bug
Unable to run the unmapped_blast rule in the Snakefile due to issues with the BLAST database.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'eclip-qc' in Emily_working branch
  2. Click on 'Snakefile'
  3. Scroll down to 'rule unmapped_blastx'
  4. See error

Or alternatively, on the command line:
Non-working code:
export BLASTDB=/projects/ps-yeolab3/bay001/annotations/nr
blastx -db nr -query LARP6.CTRL_IN1.umi.r1.fq_unmapped_downsampled.fasta -out LARP6.CTRL_IN1.umi.r1.fq_unmappedblast_downsampled_blastx.tsv -outfmt 6 -max_target_seqs 5 -max_hsps 1

Working code:
blastx -db pdbaa -query LARP6.CTRL_IN1.umi.r1.fq_unmapped_downsampled.fasta -out LARP6.CTRL_IN1.umi.r1.fq_unmappedblast_downsampled_blastx.tsv -outfmt 6 -max_target_seqs 5 -max_hsps 1

Expected behavior
If the rule ran as intended, it would produce a TSV file (and, subsequently, a pie chart).
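
For reference, -outfmt 6 writes a tab-separated file with twelve default columns. Below is a minimal Python sketch of how that TSV could be summarized before plotting. It assumes the default column order and that the first row listed for each query is its best hit. count_best_hits is a hypothetical helper for illustration, not the pipeline's actual pie chart script.

```python
import csv
from collections import Counter

# Default column order of BLAST tabular output (-outfmt 6).
OUTFMT6_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def count_best_hits(tsv_lines):
    """Count how often each subject is the first-listed hit of a query.

    tsv_lines is any iterable of text lines, e.g. an open file handle.
    Rows with fewer than twelve fields are skipped.
    """
    counts = Counter()
    seen_queries = set()
    for row in csv.reader(tsv_lines, delimiter="\t"):
        if len(row) < len(OUTFMT6_COLUMNS):
            continue  # malformed or truncated row
        record = dict(zip(OUTFMT6_COLUMNS, row))
        if record["qseqid"] in seen_queries:
            continue  # keep only the first row per query
        seen_queries.add(record["qseqid"])
        counts[record["sseqid"]] += 1
    return counts
```

Passing an open handle on the blastx TSV to count_best_hits would give per-subject counts that a plotting script could turn into pie chart slices.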

Additional context
This seems to be an issue with the database. I tried running with an NCBI database (pdbaa), which worked fine.
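
Since pdbaa works and nr does not, one thing worth checking is whether the BLASTDB directory actually contains BLAST-formatted protein database files named nr. The sketch below assumes the usual BLAST+ protein database extensions (.pin, .phr, .psq, plus a .pal alias file for multi-volume databases). find_protein_db_files is a hypothetical helper, not part of BLAST+ or the pipeline.

```python
from pathlib import Path

# Core file extensions BLAST+ uses for a protein database (assumed set;
# newer database versions add further files alongside these).
PROTEIN_DB_EXTENSIONS = (".pal", ".pin", ".phr", ".psq")


def find_protein_db_files(db_dir, db_name):
    """Return sorted file names in db_dir that belong to protein db db_name.

    Matches single-volume files (nr.pin), multi-volume files (nr.00.pin),
    and the alias file (nr.pal). Returns [] if db_dir is not a directory.
    """
    directory = Path(db_dir)
    if not directory.is_dir():
        return []
    matches = []
    for path in directory.iterdir():
        if not path.is_file():
            continue
        if path.name.startswith(db_name + ".") and path.suffix in PROTEIN_DB_EXTENSIONS:
            matches.append(path.name)
    return sorted(matches)
```

An empty result for find_protein_db_files("/projects/ps-yeolab3/bay001/annotations/nr", "nr") would suggest that the directory holds something other than a BLAST-formatted nr database, for example a plain FASTA file or a diamond index.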

ezhang113 changed the title from "Blast Database in unmapped" to "Blast Database in unmapped_blast rule" on Feb 9, 2023
@byee4
Member

byee4 commented Feb 15, 2023

@ezhang113 could you add the branch you're working under and the example LARP6 downsampled fasta file to that branch?

@ezhang113
Collaborator Author

Here is the link to my branch and file: https://github.com/YeoLab/eclip-qc/blob/Emily_working/Snakefile

@byee4
Member

byee4 commented Feb 15, 2023

Can you try using diamond? It's another aligner that is supposed to be similar in performance to BLAST but runs much faster:

module load diamond;
diamond blastx -d /projects/ps-yeolab3/bay001/annotations/nr/nr -q test.fa -o matches.tsv

@ezhang113
Collaborator Author

Update as of 2/28: I ran into some issues adding diamond to the pipeline, although the same command works on the command line.
[Screenshot: Screen Shot 2023-02-28 at 1 51 48 AM]

I tried the diamond prepdb command, as some users suggested in the thread you sent me, but I still get the same error. I think it has to do with how diamond is being used with the BLAST db, but I'm not sure why it fails here when it runs fine on the command line.

@byee4
Member

byee4 commented Feb 28, 2023

@ezhang113 can you include the snakemake command? You also need to update the envs so I can see how you're deploying diamond (which version you're using etc) as I don't see any updated spec here: https://github.com/YeoLab/eclip-qc/tree/Emily_working/envs

@ezhang113
Collaborator Author

diamond prepdb -d /projects/ps-yeolab3/bay001/annotations/nr/nr
diamond blastx -d /projects/ps-yeolab3/bay001/annotations/nr/nr -q {input} -o {output} -k 5

Sorry I forgot to update the envs for diamond. I updated it just now.
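
When a command works on the command line but fails inside the pipeline, it can help to log the exact command line being run so the two can be compared. The following is a minimal Python sketch that builds the two diamond commands above as argument lists, using only the flags already shown in this thread (-d, -q, -o, -k). The function names are hypothetical, and the paths in the usage note are placeholders.

```python
import shlex


def diamond_prepdb_command(db_prefix):
    """Argument list for preparing a BLAST database for use with diamond."""
    return ["diamond", "prepdb", "-d", str(db_prefix)]


def diamond_blastx_command(db_prefix, query, output, max_targets=5):
    """Argument list for a diamond blastx search reporting max_targets hits."""
    return [
        "diamond", "blastx",
        "-d", str(db_prefix),
        "-q", str(query),
        "-o", str(output),
        "-k", str(max_targets),
    ]


def as_shell_line(args):
    """Render an argument list as one shell-quoted line for logging."""
    return shlex.join(args)
```

Printing as_shell_line(diamond_blastx_command("/path/to/nr", "test.fa", "matches.tsv")) gives a single line that can be pasted into a terminal and compared with what the Snakemake rule executes. The same list can be passed to subprocess.run without shell quoting concerns.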

@byee4
Member

byee4 commented Feb 28, 2023

Try installing diamond 2.0.4 instead? There might be a bug that requires dropping down a version, since the index appears to work with 2.0.4.

@byee4
Member

byee4 commented Feb 28, 2023

actually try this fix instead: bbuchfink/diamond#503

@ezhang113
Collaborator Author

ezhang113 commented Feb 28, 2023

I found something similar: https://github.com/bbuchfink/diamond_docs/blob/master/Documentation.MD

It suggests running this command to prepare a FASTA database for diamond before the diamond search command:
$ diamond makedb --in nr.faa -d nr

so I changed my command to:
"""
diamond makedb --in /projects/ps-yeolab3/bay001/annotations/nr/nr -d nr
diamond blastx -d nr -q {input} -o {output} -k 5
"""

But now I get an error: Command must be given as string after the shell keyword. (Snakefile, line 74)
This is strange because the command is given as a string, so I suspect the real error is something else.

@byee4
Member

byee4 commented Feb 28, 2023 via email

@ezhang113
Collaborator Author

Hmm, I tried changing the version and now I get a new issue:
TypeError in line 71 of /oasis/tscc/scratch/eczhang/snakemake/Snakefile:
Workflow.conda() missing 1 required positional argument: 'conda_env'
File "/oasis/tscc/scratch/eczhang/snakemake/Snakefile", line 71, in

I now think it's an issue with the envs/diamond.yaml file I made.
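
One possible cause of this TypeError, offered as an assumption rather than something confirmed in this thread, is a conda: directive in the Snakefile that has no value, for example because the path to the env file was deleted or commented out. The Python sketch below scans Snakefile text for such directives. find_empty_conda_directives is a hypothetical helper and relies on a simple indentation heuristic, not on Snakemake's own parser.

```python
import re


def find_empty_conda_directives(snakefile_text):
    """Return 1-based line numbers of `conda:` directives with no value.

    A directive counts as having a value if something other than a comment
    follows `conda:` on the same line, or if the next non-blank, non-comment
    line is indented more deeply than the directive itself.
    """
    lines = snakefile_text.splitlines()
    empty = []
    for index, line in enumerate(lines):
        match = re.match(r"^(\s*)conda:\s*(.*)$", line)
        if not match:
            continue
        indent, value = match.groups()
        if value and not value.startswith("#"):
            continue  # value given on the same line
        has_value = False
        for following in lines[index + 1:]:
            stripped = following.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments
            following_indent = len(following) - len(following.lstrip())
            has_value = following_indent > len(indent)
            break
        if not has_value:
            empty.append(index + 1)
    return empty
```

Running this on the contents of the Snakefile and checking whether line 71 is reported would be a quick way to test the assumption before looking into envs/diamond.yaml itself.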

@ezhang113
Collaborator Author

@byee4
https://github.com/YeoLab/eclip-qc/tree/testing-3/11 is the working branch, with Snakefile_Wrapper_Test as the wrapper test file and Snakefile as the original pipeline code with the diamond command.

@byee4
Member

byee4 commented Mar 4, 2023

@ezhang113 The blastx rule works now, but you still need to fix your pie chart script: https://github.com/YeoLab/eclip-qc/tree/testing-3/11-brian

2 participants