This is is the fastqc pipeline from the Sequana projet
- Overview
Runs fastqc and multiqc on a set of Sequencing data to produce control quality reports
- Input
A set of FastQ files (paired or single-end) compressed or not
- Output
An HTML file summary.html (individual fastqc reports, mutli-samples report)
- Status
Production
- Wiki
- Documentation
This README file, the Wiki from the github repository (link above) and https://sequana.readthedocs.io
- Citation
Cokelaer et al, (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi:10.21105/joss.00352
sequana_fastqc is based on Python3, just install the package as follows:
pip install sequana_fastqc --upgrade
You will need third-party software such as fastqc. Please see below for details.
If you have a set of FastQ files in a data/ directory, type:
sequana_fastqc --input-directory data
To know more about the options (e.g., add a different pattern to restrict the execution to a subset of the input files, change the output/working directory, etc):
sequana_fastqc --help
The call to sequana_fastqc creates a directory fastqc. Then, you go to the working directory and execute the pipeline as follows:
cd fastqc
sh fastqc.sh # for a local run
This launch a snakemake pipeline. If you are familiar with snakemake, you can retrieve the fastqc.rules and config.yaml files and then execute the pipeline yourself with specific parameters:
snakemake -s fastqc.rules --cores 4 --stats stats.txt
Or use sequanix interface.
Please see the Wiki for more examples and features.
You can retrieve test data from sequana_fastqc (https://github.com/sequana/fastqc) or type:
wget https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/data_R1_001.fastq.gz
wget https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/data_R2_001.fastq.gz
then, prepare the pipeline:
sequana_fastqc --input-directory .
cd fastqc
sh fastq.sh
# once done, remove temporary files (snakemake and others)
make clean
Just open the HTML entry called summary.html. A multiqc report is also available. You will get expected images such as the following one:
Please see the Wiki for more examples and features.
This pipelines requires the following executable(s):
- fastqc
- falco (optional)
For Linux users, we provide apptainer/singularity images available through the damona project (https://damona.readthedocs.io).
To make use of them, initiliase the pipeline with the --use-apptainer option and everything should be downloaded automatically for you, which also guarantees reproducibility:
sequana_fastqc --input-directory data --use-apptainer --apptainer-prefix ~/images
This pipeline runs fastqc in parallel on the input fastq files (paired or not) and then execute multiqc. A brief sequana summary report is also produced. s You may use falco instead of fastqc. This is experimental but seem to work for Illumina/FastQ files.
This pipeline has been tested on several hundreds of MiSeq, NextSeq, MiniSeq, ISeq100, Pacbio runs.
It produces a md5sum of your data. It copes with empty samples. Produces ready-to-use HTML reports, etc
Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.
Version | Description |
---|---|
1.8.2 | * Fix the onerror typo in the pipeline, fix CI. |
1.8.1 | * update __init__ (version) |
1.8.0 |
|
1.7.1 |
|
1.7.0 | * Use new rulegraph wrapper and new graphviz apptainer |
1.6.2 | * slight refactorisation to use rulegraph wrapper |
1.6.1 |
|
1.6.0 | * Fixed falco output error and use singularity containers |
1.5.0 | * removed modules completely. |
1.4.2 | * simplified pipeline (suppress setup and use existing wrapper) |
1.4.1 | * simplified pipeline with wrappers/rules |
1.4.0 |
|
1.3.0 |
|
1.2.0 |
|
1.1.0 |
|
1.0.1 | * add md5sum of input files as md5.txt file |
1.0.0 |
|
0.9.15 | * For the HTML reports, takes into account samples with zero reads |
0.9.14 | * round up some statistics in the main table |
0.9.13 | * improve the summary HTML report |
0.9.12 | * implemented new --from-project option |
0.9.11 |
|
0.9.10 | * simplify the onsuccess section |
0.9.9 | * add missing png and pipeline (regression bug) |
0.9.8 | * add missing multi_config file |
0.9.7 |
|
0.9.6 | add the readtag option |
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.