Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no spikeIn chromosomes found with extention _spikein #924

Open
sunta3iouxos opened this issue Aug 18, 2023 · 12 comments
Open

no spikeIn chromosomes found with extention _spikein #924

sunta3iouxos opened this issue Aug 18, 2023 · 12 comments

Comments

@sunta3iouxos
Copy link

sunta3iouxos commented Aug 18, 2023

Hi Katarzyna and everyone else,
I can not run the spikeIns strings for Chip-seq, this is a CUT&RUN protocol that has the bacteria cary over that can be used to "normalize the samples".
this is the full error:

ChIP-seq -d /mnt/c/AP01/bamSpikes -j 8 --local --useSpikeInForNorm --getSizeFactorsFrom genome --peakCaller Genrich --sampleSheet /mnt/c/AP01/bamSpikes/H3K36me3.tsv --windowSize 500 mm10_gencodeM19_spikes /mnt/c/AP01/bamSpikes/H3K36me3_chip_ty
pe.yaml
Sample sheet found and header is ok!

---- This analysis has been done using snakePipes version 2.7.3 ----
Sample sheet found and header is ok!
SystemExit in line 256 of mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/ChIP-seq/internals.snakefile:
1
  File "mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/ChIP-seq/Snakefile", line 22, in <module>
  File "mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/ChIP-seq/internals.snakefile", line 256, in <module>
  File "<frozen _sitebuiltins>", line 26, in __call__

 No spikein genome detected - no spikeIn chromosomes found with extention _spikein .


Error: snakemake returned an error code of 1, so processing is incomplete!

The steps I have followed:

createIndices -o CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes --tools bowtie2 -j 6 --local --genomeURL CUT-RUNTools-2.0/assemblies/GRCm38_gencode_release19/genome_fasta/genome.fa --gtfURL CUT-RUNTools-2.0/assemblies/GRCm38_gencode_release19/annotation/genes.gtf  --spikeinGenomeURL CUT-RUNTools-2.0/assemblies/EB1/Sequence/WholeGenomeFasta/genome.fa --spikeinGtfURL CUT-RUNTools-2.0/assemblies/EB1/Annotation/Genes/genes.gtf  --blacklist CUT-RUNTools-2.0/blacklist/mm10.blacklist_merged.bed  mm10_gencodeM19_spikes
DNA-mapping -i /mnt/c/AP01/ -o /mnt/c/AP01/bamSpikes --local --trim --trimmer fastp --trimmerOptions "--trim_poly_g --trim_poly_x -Q -L --correction" --dedup --mapq 2 -j 8 mm10_gencodeM19_spikes
ChIP-seq -d /mnt/c/AP01/bamSpikes -j 8 --local --useSpikeInForNorm --getSizeFactorsFrom genome --peakCaller Genrich --sampleSheet /mnt/c/AP01/bamSpikes/H3K36me3.tsv --windowSize 500 mm10_gencodeM19_spike /mnt/c/AP01/bamSpikes/H3K36me3_chip_type.yaml

and this is my hybrid genome yalm
mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/organisms/mm10_gencodeM19_spikes.yaml

blacklist_bed: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/annotation/blacklist.bed
bowtie2_index: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/BowtieIndex/genome
bwa_index: ''
bwa_mem2_index: ''
bwameth2_index: ''
bwameth_index: ''
extended_coding_regions_gtf: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/annotation/genes.slop.gtf
genes_bed: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/annotation/genes.bed
genes_gtf: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/annotation/genes.gtf
genome_2bit: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/genome_fasta/genome.2bit
genome_dict: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/genome_fasta/genome.dict
genome_fasta: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/genome_fasta/genome.fa
genome_index: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/genome_fasta/genome.fa.fai
genome_size: 2652783500
hisat2_index: ''
ignoreForNormalization: ''
known_splicesites: ''
rmsk_file: ''
spikein_blacklist_bed: ''
spikein_genes_gtf: CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/annotation/spikein_genes.gtf
star_index: ''

indexed genome:

genome.1.bt2  genome.2.bt2  genome.3.bt2  genome.4.bt2  genome.fa  genome.rev.1.bt2  genome.rev.2.bt2
@sunta3iouxos
Copy link
Author

sunta3iouxos commented Aug 21, 2023

just to add that the bam file header is:
the added E.coli chromosome is named Chromosome. Do I need to add this one in the --spikeinExt option?

bam header ```bash

@hd VN:1.5 SO:coordinate
@sq SN:chr1 LN:195471971
@sq SN:chr2 LN:182113224
@sq SN:chr3 LN:160039680
@sq SN:chr4 LN:156508116
@sq SN:chr5 LN:151834684
@sq SN:chr6 LN:149736546
@sq SN:chr7 LN:145441459
@sq SN:chr8 LN:129401213
@sq SN:chr9 LN:124595110
@sq SN:chr10 LN:130694993
@sq SN:chr11 LN:122082543
@sq SN:chr12 LN:120129022
@sq SN:chr13 LN:120421639
@sq SN:chr14 LN:124902244
@sq SN:chr15 LN:104043685
@sq SN:chr16 LN:98207768
@sq SN:chr17 LN:94987271
@sq SN:chr18 LN:90702639
@sq SN:chr19 LN:61431566
@sq SN:chrX LN:171031299
@sq SN:chrY LN:91744698
@sq SN:chrM LN:16299
@sq SN:GL456210.1 LN:169725
@sq SN:GL456211.1 LN:241735
@sq SN:GL456212.1 LN:153618
@sq SN:GL456213.1 LN:39340
@sq SN:GL456216.1 LN:66673
@sq SN:GL456219.1 LN:175968
@sq SN:GL456221.1 LN:206961
@sq SN:GL456233.1 LN:336933
@sq SN:GL456239.1 LN:40056
@sq SN:GL456350.1 LN:227966
@sq SN:GL456354.1 LN:195993
@sq SN:GL456359.1 LN:22974
@sq SN:GL456360.1 LN:31704
@sq SN:GL456366.1 LN:47073
@sq SN:GL456367.1 LN:42057
@sq SN:GL456368.1 LN:20208
@sq SN:GL456370.1 LN:26764
@sq SN:GL456372.1 LN:28664
@sq SN:GL456378.1 LN:31602
@sq SN:GL456379.1 LN:72385
@sq SN:GL456381.1 LN:25871
@sq SN:GL456382.1 LN:23158
@sq SN:GL456383.1 LN:38659
@sq SN:GL456385.1 LN:35240
@sq SN:GL456387.1 LN:24685
@sq SN:GL456389.1 LN:28772
@sq SN:GL456390.1 LN:24668
@sq SN:GL456392.1 LN:23629
@sq SN:GL456393.1 LN:55711
@sq SN:GL456394.1 LN:24323
@sq SN:GL456396.1 LN:21240
@sq SN:JH584292.1 LN:14945
@sq SN:JH584293.1 LN:207968
@sq SN:JH584294.1 LN:191905
@sq SN:JH584295.1 LN:1976
@sq SN:JH584296.1 LN:199368
@sq SN:JH584297.1 LN:205776
@sq SN:JH584298.1 LN:184189
@sq SN:JH584299.1 LN:953012
@sq SN:JH584300.1 LN:182347
@sq SN:JH584301.1 LN:259875
@sq SN:JH584302.1 LN:155838
@sq SN:JH584303.1 LN:158099
@sq SN:JH584304.1 LN:114452
@sq SN:Chromosome LN:4686137
@rg ID:A006200317_201074_S18_L000 DS:A006200317_201074_S18_L000 PL:ILLUMINA SM:A006200317_201074_S18_L000
@pg ID:bowtie2 PN:bowtie2 CL:"mambaforge/envs/22c8c2155516caee04b33c92cdb28191/bin/bowtie2-align-s --wrapper basic-0 -X 1000 -x CUT-RUNTools-2.0/assemblies/BowtieIndex/genome --fr --rg-id A006200317_201074_S18_L000 --rg DS:A006200317_201074_S18_L000 --rg PL:ILLUMINA --rg SM:A006200317_201074_S18_L000 -p 8 -1 FASTQ_fastp/A006200317_201074_S18_L000_R1_001.fastq.gz -2 FASTQ_fastp/A006200317_201074_S18_L000_R2_001.fastq.gz" VN:2.4.5
@pg ID:samtools PN:samtools CL:samtools view -Sb - PP:bowtie2 VN:1.15.1
@pg ID:samtools.1 PN:samtools CL:samtools sort -m 2G -T /tmp//snakepipes.CrgbFQGyqz/A006200317_201074_S18_L000 -@ 2 -O bam - PP:samtools VN:1.15.1
@pg ID:sambamba CL:markdup -t 8 --sort-buffer-size=6000 --overflow-list-size 600000 --tmpdir /tmp//snakepipes.9Jiva0fp88 Bowtie2/A006200317_201074_S18_L000.sorted.bam Bowtie2/A006200317_201074_S18_L000.bam PP:samtools.1 VN:1.0
@pg ID:samtools.2 PN:samtools PP:sambamba VN:1.15.1 CL:samtools view -@ 2 -b -F 1024 -q 2 -o filtered_bam/A006200317_201074_S18_L000.filtered.tmp.bam Bowtie2/A006200317_201074_S18_L000.bam
@pg ID:samtools.3 PN:samtools PP:samtools.2 VN:1.15.1 CL:samtools view -h /mnt/c/AP01/bamSpikes/filtered_bam/A006200317_201074_S18_L000.filtered.bam
A00620:317:HCJLNDSX7:2:2426:19678:32095 163 chr1 3003373 42 101M = 3003505 233 TGGACTCGTGAGACAAGATGGCTCCTTCACCTGCTCTGGGGGTCAGAACCCTCCCAGGTGGCCACCTCTCCTGTGGCGGGGAAGGTACCTGAAAGTCTTCA FFFFFFFF:FFFFFFFF,FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFF AS:i:-15 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:41A34T14G9 YS:i:-5 YT:Z:CP RG:Z:A006200317_201074_S18_L000

</details>

Chromosome entries in the bam files.

```bash
for bam in /mnt/c/AP01/bamSpikes/filtered_bam/*bam; do echo $bam; samtools view $bam | grep Chromosome | wc -l; done
/mnt/c/AP01/bamSpikes/filtered_bam/A006200317_201074_S18_L000.filtered.bam
3354
/mnt/c/AP01/bamSpikes/filtered_bam/A006200317_201076_S19_L000.filtered.bam
2853
/mnt/c/AP01/bamSpikes/filtered_bam/A006200317_201078_S20_L000.filtered.bam
3826
/mnt/c/AP01/bamSpikes/filtered_bam/A006200317_201080_S21_L000.filtered.bam
1675
/mnt/c/AP01/bamSpikes/filtered_bam/A006200317_201082_S22_L000.filtered.bam
3535
/mnt/c/AP01/bamSpikes/filtered_bam/A006200317_201084_S23_L000.filtered.bam
1804

@katsikora
Copy link
Contributor

Hi Sunta3iouxos,

your workflow in general looks good, i.e. createIndices -> DNA-mapping -> ChIPseq.

What does grep -c _spikein CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/genome_fasta/genome.fa return?
It looks like the _spikein postfix was not automatically appended by createIndices.

Best wishes,

Katarzyna

@sunta3iouxos
Copy link
Author

0
but

grep -c Chromosome CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spik
es/genome_fasta/genome.fa
1

@katsikora
Copy link
Contributor

What does the chromosome name in the bacterial fasta look like? Are there any spaces in it? This typically causes issues.

@sunta3iouxos
Copy link
Author

 grep ">" CUT-RUNTools-2.0/assemblies/EB1/Sequence/WholeGenomeFasta/
genome.fa
>Chromosome
grep -c " " CUT-RUNTools-2.0/assemblies/EB1/Sequence/WholeGenomeFas
ta/genome.fa
0

@sunta3iouxos
Copy link
Author

I added the

--spikeinExt Chromosome

and it does not complain. Is this correct then?

P.S. There are some other warnings, and messages that require new tickets.

@katsikora
Copy link
Contributor

You mean you passed this to ChIP-seq? It might actually work.
Still, something went wrong in the createIndices instance. Have a look what the full chromosome name looks like, whether there are any characters in there that might bork up the renaming.

With snakePipes 2.7.2, I was able to create a hybrid genome with createIndices as:

createIndices -o $output/GRch38_Ecoli --tools bowtie2 --genomeURL /$genome/genome.fa --gtfURL $genes/genes.gtf --spikeinGenomeURL $genome/Ecoli_C3103_clean.fasta --userYAML GRCh38_g31_Ecoli

The chromosome name in the E.coli fasta was renamed from >CP053595.1 Escherichia coli strain T7Express_LysYIq chromosome, complete genome to >CP053595.1_spikein Escherichia coli strain T7Express_LysYIq chromosome, complete genome .

If you'd like to troubleshoot what happens with the chromosome names in the createIndices` worflow, you can pass `--snakemakeOptions ` --notemp ` , which would keep the temporary intermediate files.

Best wishes,

Katarzyna

@sunta3iouxos
Copy link
Author

You mean you passed this to ChIP-seq? It might actually work.

Yes, using the

. Spikein chromosome extention can be specified with --spikeinExt.

@sunta3iouxos
Copy link
Author

Still, something went wrong in the createIndices instance. Have a look what the full chromosome name looks like, whether there are any characters in there that might bork up the renaming

If you follow a bit the provided information, the _spikein identifier is there but only in the "/spikein_genes.gtf " but nowhere else (at least not in the readable files (so excluding the binary index bowtie2 files.

The genome.fa of the bacteria is as follows:

head CUT-RUNTools-2.0/assemblies/EB1/Sequence/WholeGenomeFasta/genome.fa
>Chromosome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG
CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT
ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG
GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA

I do not see any specific error.
The generated genome.fa in the spiked-in folder looks like:

grep ">" CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/genome_fasta/genome.fa
>chr1 1
>chr2 2
>chr3 3
>chr4 4
>chr5 5
>chr6 6
>chr7 7
>chr8 8
>chr9 9
>chr10 10
>chr11 11
>chr12 12
>chr13 13
>chr14 14
>chr15 15
>chr16 16
>chr17 17
>chr18 18
>chr19 19
>chrX X
>chrY Y
>chrM MT
>GL456210.1 GL456210.1
>GL456211.1 GL456211.1
>GL456212.1 GL456212.1
>GL456213.1 GL456213.1
>GL456216.1 GL456216.1
>GL456219.1 GL456219.1
>GL456221.1 GL456221.1
>GL456233.1 GL456233.1
>GL456239.1 GL456239.1
>GL456350.1 GL456350.1
>GL456354.1 GL456354.1
>GL456359.1 GL456359.1
>GL456360.1 GL456360.1
>GL456366.1 GL456366.1
>GL456367.1 GL456367.1
>GL456368.1 GL456368.1
>GL456370.1 GL456370.1
>GL456372.1 GL456372.1
>GL456378.1 GL456378.1
>GL456379.1 GL456379.1
>GL456381.1 GL456381.1
>GL456382.1 GL456382.1
>GL456383.1 GL456383.1
>GL456385.1 GL456385.1
>GL456387.1 GL456387.1
>GL456389.1 GL456389.1
>GL456390.1 GL456390.1
>GL456392.1 GL456392.1
>GL456393.1 GL456393.1
>GL456394.1 GL456394.1
>GL456396.1 GL456396.1
>JH584292.1 JH584292.1
>JH584293.1 JH584293.1
>JH584294.1 JH584294.1
>JH584295.1 JH584295.1
>JH584296.1 JH584296.1
>JH584297.1 JH584297.1
>JH584298.1 JH584298.1
>JH584299.1 JH584299.1
>JH584300.1 JH584300.1
>JH584301.1 JH584301.1
>JH584302.1 JH584302.1
>JH584303.1 JH584303.1
>JH584304.1 JH584304.1
>Chromosome

in the genome.fa.fai the correct size is reported:

grep Chromosome CUT-RUNTools-2.0/assemblies/mm10_gencodeM1
9_spikes/genome_fasta/genome.fa.fai
Chromosome      4686137 2776387558      60      61

INTERESTINGLY

grep Chromosome  /home/tgeorgom/CUT-RUNTools-2.0/assemblies/mm10_gencodeM19_spikes/annotation/spikein_genes.gtf | head -n 1
**Chromosome_spikein**      ena     CDS     190     252     .       +       0       exon_number "1"; gene_biotype "protein_coding"; gene_id "ECDH10B_0001"; gene_name "thrL"; gene_source "ena"; gene_version "1"; p_id "P2524"; protein_id "ACB01206"; protein_version "1"; transcript_biotype "protein_coding"; transcript_id "ACB01206"; transcript_name "thrL-1"; transcript_source "ena"; transcript_version "1"; tss_id "TSS3237";

@katsikora
Copy link
Contributor

I can check if I can reproduce this issue here.

@sunta3iouxos
Copy link
Author

Hi there,
any updates on this one? did you have time to check this?

@katsikora
Copy link
Contributor

Hi,

could you send me the link to this bacterial genome fasta?

Best wishes,

Katarzyna

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants