haplotyping-KIV2-nf Pipeline

haplotyping-KIV2-nf is a Nextflow workflow for haplotyping that can run across multiple compute infrastructures in a portable manner.

Overview

haplotyping-KIV2-nf performs haplotyping using a series of Python scripts. The workflow runs in several stages: filtering BAM files, extracting haplotypes, and merging haplotype information. The pipeline takes aligned reads in BAM format, removes soft- and hard-clipped bases from the sequences, and extracts the polymorphic positions (the haplotype) either per sample or at user-defined positions.

Quick Start

  1. Install Nextflow.

  2. Run the haplotyping workflow:

nextflow run AmstlerStephan/haplotyping-KIV2-nf -r main -c <custom.config> -profile <docker/conda>

Replace <custom.config> with the path to your configuration file (it must at least set the input and output directories) and choose either the docker or conda profile. A minimal example configuration is shown below.
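As a starting point, a minimal custom.config could contain only the required input and output directories. This is a sketch; the paths below are placeholders, not part of the repository:

// custom.config -- minimal example (paths are placeholders)
params {
    input  = "/path/to/input"    // must contain barcode*/align/consensus/ subdirectories
    output = "/path/to/output"   // directory where results are written
}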

Scripts and Files

  • The workflow utilizes several Python scripts:
    • extract_haplotypes.py: Used for haplotype extraction.
    • merge_haplotypes.py: Used for merging haplotype information.
    • filter_bam.py: Used for filtering BAM files.
  • Additionally, a variant calling positions file can be supplied via the variant_calling_positions parameter; it is used during haplotype extraction when use_variant_calling_positions is set to true.

Input Channels

  • Input data is read from specific directories (input/barcode*/align/consensus/). BAM files, BAM file indexes, and cluster statistics are grouped into per-barcode tuples, as sketched below.
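As a rough illustration (not the pipeline's actual channel code), grouping per-barcode files in Nextflow could look like the following sketch; the channel name consensus_bams_ch is an assumption made here for readability:

// Illustrative sketch: pair each barcode's BAM, BAM index and cluster
// statistics file into one tuple keyed by the barcode directory name.
Channel
    .fromPath("${params.input}/barcode*/align/consensus/${params.bam_pattern}")
    .map { bam ->
        def barcode = bam.parent.parent.parent.name                   // barcodeXX directory
        def bai     = file("${bam}.bai")                              // BAM index
        def stats   = file("${bam.parent}/${params.cluster_stats_pattern}")
        tuple(barcode, bam, bai, stats)
    }
    .set { consensus_bams_ch }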

Workflow Stages

  1. Filter BAM Files

    • Filters BAM files using the filter_bam.py script based on specified criteria.
  2. Extract Haplotypes

    • Uses the extract_haplotypes.py script to extract haplotypes from the filtered BAM files. If a variant_calling_positions file is provided, haplotype extraction is restricted to those positions.
  3. Merge Haplotypes

    • Merges the extracted haplotypes using the merge_haplotypes.py script. A sketch of the overall stage wiring follows this list.
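Conceptually, the three stages chain together as in the sketch below. The process names FILTER_BAM, EXTRACT_HAPLOTYPES and MERGE_HAPLOTYPES are illustrative placeholders, not names taken from the repository:

// Hedged sketch of the stage wiring in Nextflow DSL2.
workflow {
    filtered   = FILTER_BAM( consensus_bams_ch )                                   // stage 1: filter BAM files
    haplotypes = EXTRACT_HAPLOTYPES( filtered, params.variant_calling_positions )  // stage 2: extract haplotypes
    MERGE_HAPLOTYPES( haplotypes.collect() )                                       // stage 3: merge haplotypes
}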

Configuration

  • Configurable parameters are defined in the nextflow.config file and can be overridden via a custom config file (-c <custom.config>) or on the command line. The available parameters are listed below; an example configuration combining them follows the parameter list.

Basic Parameters

  • help: A boolean flag indicating whether to display help information. Default is false.

  • version: A boolean flag indicating whether to display the workflow version. Default is false.

  • debug: A boolean flag enabling or disabling debug mode. When set to true, additional debugging information may be provided during workflow execution. Default is false.

Input/Output Parameters

  • input: The directory containing input data for the workflow. This parameter is required for the workflow to locate and process input files.

  • ont_pl_dir: The directory containing the consensus reads produced by the https://github.com/genepi/umi-pipeline-nf analysis workflow. Default is null.

  • output: The directory where the workflow will write its output. This parameter is required for storing the results of the haplotyping workflow.

  • variant_calling_positions: A file specifying variant calling positions. If provided, the workflow uses this file during haplotype extraction.

  • bam_pattern: The pattern used to match BAM files within the input directory. Default is "masked_consensus.bam".

  • cluster_stats_pattern: The pattern used to match cluster statistics files within the input directory. Default is "split_cluster_stats.tsv".

  • min_reads_per_cluster: Minimum number of reads per cluster to be considered during processing. Default is 10.

  • max_reads_per_cluster: Maximum number of reads per cluster to be considered during processing. Default is 200.

  • max_edit_distance: Maximum edit distance allowed during merging of haplotype clusters. Default is 2.

  • use_variant_calling_positions: A boolean flag indicating whether to use variant calling positions. If true, the workflow considers the variant_calling_positions file.

  • ranges_to_exclude: A comma-separated list of ranges to exclude during processing. Default is "2472,2506".

  • min_qscore: The minimum quality score required during processing. Default is 45.

  • output_format: The output format for haplotype results. Default is "fasta".

Other Parameters

  • threads: The number of threads used during workflow execution. It is set to (Runtime.runtime.availableProcessors() - 1) by default.
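Putting the documented values together, a custom.config that tunes the filtering and haplotyping thresholds might look like the sketch below. The numeric and string values are the defaults listed above; the paths are placeholders:

// custom.config -- example with tuning parameters (defaults shown)
params {
    input                         = "/path/to/input"              // required, placeholder
    output                        = "/path/to/output"             // required, placeholder
    bam_pattern                   = "masked_consensus.bam"
    cluster_stats_pattern         = "split_cluster_stats.tsv"
    min_reads_per_cluster         = 10
    max_reads_per_cluster         = 200
    max_edit_distance             = 2
    ranges_to_exclude             = "2472,2506"
    min_qscore                    = 45
    output_format                 = "fasta"

    // optional: restrict haplotype extraction to a fixed set of positions
    use_variant_calling_positions = true
    variant_calling_positions     = "/path/to/variant_calling_positions"  // placeholder
}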

Output

  • The workflow generates output files, including filtered BAM files, extracted haplotypes, and merged haplotype information.

Credits

These scripts were originally written for use by GENEPI by @StephanAmstler.
