Commit 0989e37

Merge branch 'devel'
2 parents a264222 + 09ed888 commit 0989e37

3 files changed, with 85 additions and 3 deletions

docs/bcftools.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
---
2+
layout: default
3+
title: Bcftools
4+
parent: 2. Program guides
5+
---
6+
7+
# Bcftools
8+
9+
Bcftools is a set of [utilities for variant calling and manipulating VCFs and BCFs](https://samtools.github.io/bcftools/bcftools.html).
10+
11+
## Generating genotype likelihoods for alignment files using `bcftools mpileup`
12+
13+
`bcftools mpileup` can be used to generate VCF or BCF files containing genotype likelihoods for one or more alignment (BAM or CRAM) files as follows:
14+
15+
```bash
16+
$ bcftools mpileup --max-depth 10000 --threads n -f reference.fasta -o genotype_likelihoods.bcf reference_sequence_alignment.bam
17+
```
18+
19+
In this command...
20+
21+
1. **`--max-depth`** or **`-d`** sets the maximum number of reads considered per input file at each position in the alignment. In this case, it is set to 10000.
22+
2. **`--threads`** sets the number (*n*) of processors/threads to use.
23+
3. **`--fasta-ref`** or **`-f`** is used to select the [faidx-indexed FASTA](samtools.md#indexing-a-fasta-file-using-samtools-faidx) nucleotide reference file (*reference.fasta*) used for the alignment.
24+
4. **`--output`** or **`-o`** is used to name the output file (*genotype_likelihoods.bcf*).
25+
5. The final argument given is the input BAM alignment file (*reference_sequence_alignment.bam*). Multiple input files can be given here.
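Since several input files can be given, the call can be wrapped in a small helper. The following is a minimal sketch only: the function name `mpileup_many` and the `sample_*.bam` file names are hypothetical and do not appear elsewhere in this guide.

```bash
# Sketch: run bcftools mpileup over any number of alignment files.
# mpileup_many is a hypothetical helper name, not part of bcftools.
mpileup_many() {
    # Usage: mpileup_many <threads> <reference.fasta> <output.bcf> <bam> [<bam> ...]
    threads=$1
    reference=$2
    output=$3
    shift 3
    # "$@" now holds only the alignment files; quoting keeps odd file names intact.
    bcftools mpileup --max-depth 10000 --threads "$threads" \
        -f "$reference" -o "$output" "$@"
}

# Example call (needs bcftools installed and the placeholder files to exist):
# mpileup_many 4 reference.fasta genotype_likelihoods.bcf sample_1.bam sample_2.bam
```

The options are the same as in the command above; only the list of alignment files changes.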
26+
27+
## Variant calling using `bcftools call`
28+
29+
`bcftools call` can be used to call SNP/indel variants from a BCF file as follows:
30+
31+
```bash
32+
$ bcftools call -O b --threads n -vc --ploidy 1 -p 0.05 -o variants_unfiltered.bcf genotype_likelihoods.bcf
33+
```
34+
35+
In this command...
36+
37+
1. **`--output-type`** or **`-O`** is used to select the output format. In this case, *b* for BCF.
38+
2. **`--threads`** sets the number (*n*) of processors/threads to use.
39+
3. **`-vc`** combines **`-v`** (output variant sites only) and **`-c`** (use the original [SAMtools](samtools.md) consensus caller).
40+
4. **`--ploidy`** specifies the ploidy to assume when calling genotypes. In this case, *1* (haploid).
41+
5. **`--pval-threshold`** or **`-p`** is used to set the p-value threshold for variant sites (*0.05*).
42+
6. **`--output`** or **`-o`** is used to name the output file (*variants_unfiltered.bcf*).
43+
7. The final argument is the input BCF file (*genotype_likelihoods.bcf*).
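If the intermediate genotype likelihood file is not needed, `bcftools mpileup` output can be piped straight into `bcftools call`, which is a pattern shown in the bcftools documentation. The sketch below assumes that pattern; the function name `call_from_bam` is hypothetical, and `-O u` (uncompressed BCF) is used on the mpileup side so that no time is spent compressing data that is only passed through the pipe.

```bash
# Sketch: genotype likelihoods and variant calling in one pipeline.
# call_from_bam is a hypothetical helper name, not part of bcftools.
call_from_bam() {
    # Usage: call_from_bam <threads> <reference.fasta> <alignment.bam> <output.bcf>
    threads=$1
    reference=$2
    alignment=$3
    output=$4
    # mpileup writes uncompressed BCF to stdout; call reads it from stdin.
    bcftools mpileup --max-depth 10000 -O u -f "$reference" "$alignment" |
        bcftools call -O b --threads "$threads" -vc --ploidy 1 -p 0.05 -o "$output"
}

# Example call (needs bcftools installed and the input files to exist):
# call_from_bam 4 reference.fasta reference_sequence_alignment.bam variants_unfiltered.bcf
```

The calling options match the command above, so the resulting file should be equivalent to running the two steps separately.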
44+
45+
## Filtering variants using `bcftools filter`
46+
47+
`bcftools filter` can be used to filter variants from a BCF file as follows...
48+
49+
```bash
50+
$ bcftools filter --threads n -i '%QUAL>=20' -O v -o variants_filtered.vcf variants_unfiltered.bcf
51+
```
52+
53+
In this command...
54+
55+
1. **`--threads`** sets the number (*n*) of processors/threads to use.
56+
2. **`--include`** or **`-i`** is used to define the expression used to select sites. In this case, *`%QUAL>=20`* keeps only sites with a quality score greater than or equal to 20.
57+
3. **`--output-type`** or **`-O`** is used to select the output format. In this case, *v* for VCF.
58+
4. **`--output`** or **`-o`** is used to name the output file (*variants_filtered.vcf*).
59+
5. The final argument is the input BCF file (*variants_unfiltered.bcf*).
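To make the effect of the `%QUAL>=20` expression concrete, the sketch below applies the same threshold to a tiny, made-up VCF using `awk`. It is an illustration of the filtering rule only, not how `bcftools filter` is implemented, and the three records are invented for the example.

```bash
# Sketch: emulate a QUAL >= 20 filter on a toy, uncompressed VCF.
make_toy_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    # Each pair of arguments fills one record: POS, then QUAL.
    printf 'chr1\t%s\t.\tA\tG\t%s\t.\t.\n' 100 35 200 12 300 20
}

filter_qual() {
    # Keep header lines (starting with '#') and records whose QUAL (column 6) is >= 20.
    awk -F '\t' '/^#/ || $6 >= 20'
}

make_toy_vcf | filter_qual
```

The records at positions 100 (QUAL 35) and 300 (QUAL 20) are kept, while the record at position 200 (QUAL 12) is dropped.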
60+
61+
## See also
62+
63+
- [File formats used in bioinformatics](file_formats.md)
64+
- [SNP calling script](snp_calling.md)
65+
66+
## Further reading
67+
68+
- [bcftools documentation](https://samtools.github.io/bcftools/bcftools.html)

docs/file_formats.md

Lines changed: 5 additions & 1 deletion
@@ -23,7 +23,8 @@ A brief introduction to various file formats used in bioinformatics.
2323
- [CRAM](#cram)
2424
- [Stockholm format](#stockholm-format)
2525
- [Example Stockholm file](#example-stockholm-file)
26-
- [VCF](#vcf)
26+
- [VCF](#vcf)
27+
- [BCF](#bcf)
2728
- [Generic Feature Formats](#generic-feature-formats)
2829
- [GFF general structure](#gff-general-structure)
2930
- [GTF](#gtf)
@@ -202,6 +203,9 @@ The last line in the header section begins with `#`; this line gives the headers
202203
9. `FORMAT` An (optional) extensible list of fields for describing the samples.
203204
10. `SAMPLEs` For each (optional) sample described in the file, values are given for the fields listed in FORMAT. If multiple samples have been aligned to the reference sequence, each sample will have its own column.
204205

206+
### BCF
207+
208+
Binary Call Format (BCF) is a binary representation of [VCF](#vcf), containing the same information in binary format for improved performance.
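Because the two formats hold the same information, a file can be converted in either direction with `bcftools view` and the `-O` output type option. The helper names below (`vcf_to_bcf`, `bcf_to_vcf`) are hypothetical and used only for this sketch.

```bash
# Sketch: convert between VCF and BCF with bcftools view.
# vcf_to_bcf and bcf_to_vcf are hypothetical helper names.
vcf_to_bcf() {
    # Usage: vcf_to_bcf <input.vcf> <output.bcf>
    bcftools view -O b -o "$2" "$1"
}

bcf_to_vcf() {
    # Usage: bcf_to_vcf <input.bcf> <output.vcf>
    bcftools view -O v -o "$2" "$1"
}

# Example calls (need bcftools installed and the input files to exist):
# vcf_to_bcf variants.vcf variants.bcf
# bcf_to_vcf variants.bcf variants_roundtrip.vcf
```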
205209

206210
## Generic Feature Formats
207211

docs/samtools.md

Lines changed: 12 additions & 2 deletions
@@ -51,7 +51,7 @@ In this command...
5151

5252
1. **`sorted_example_alignment.bam`** is the name of the input file.
5353

54-
### Demonstration
54+
### Demonstration 1
5555

5656
In this video, `samtools` is used to convert `example_alignment.sam` into a BAM file, sort that BAM file, and index it.
5757

@@ -71,12 +71,22 @@ In this command...
7171
1. **`example_nucleotide_sequence.fasta`** is the reference genome input.
7272
2. **`example_reads_1.fastq`** and **`example_reads_2.fastq`** are the names of the simulated read output files.
7373

74-
### Demonstration
74+
### Demonstration 2
7575

7676
In this video, `wgsim` is used to simulate reads from `example_nucleotide_sequence.fasta`.
7777

7878
[![asciicast](https://asciinema.org/a/m89gXtx4cKRnKpI6amWj3BEAH.svg)](https://asciinema.org/a/m89gXtx4cKRnKpI6amWj3BEAH?autoplay=1)
7979

80+
## Indexing a FASTA file using `samtools faidx`
81+
82+
SAMtools can be used to index a FASTA file as follows...
83+
84+
```bash
85+
$ samtools faidx file.fasta
86+
```
87+
88+
This command writes an index file, `file.fasta.fai`, next to `file.fasta`; the indexed FASTA can then be used as a reference by [bcftools](bcftools.md).
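In a script, it can be convenient to build the index only when it is missing. The following is a minimal sketch under that assumption; the function name `ensure_fasta_index` is hypothetical.

```bash
# Sketch: index a FASTA file only if no .fai index exists yet.
# ensure_fasta_index is a hypothetical helper name, not part of SAMtools.
ensure_fasta_index() {
    # Usage: ensure_fasta_index <reference.fasta>
    # samtools faidx writes the index next to the FASTA as <reference.fasta>.fai
    if [ ! -f "$1.fai" ]; then
        samtools faidx "$1"
    fi
}

# Example call (needs samtools installed and file.fasta to exist):
# ensure_fasta_index file.fasta
```

This check looks only at whether the index file exists; it does not detect an index that is older than a modified FASTA file.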
89+
8090
## See also
8191

8292
- [Alignment formats](file_formats.md#alignment-formats)
