Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: profile DNA damage to determine ancient or modern provenance #119

Open
franciscozorrilla opened this issue Jan 7, 2023 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@franciscozorrilla
Copy link
Owner

franciscozorrilla commented Jan 7, 2023

We have already used metaGEM to assemble MAGs from ancient samples, e.g. see zenodo repo & github issue.

A tool like pydamage requires an alignment file generated by mapping short reads to an assembly, and could be integrated into the metaGEM workflow in a number of ways:

  1. At the assembly level when running the crossMap rule
  2. At the MAG level when running the abundance rule
  3. At the MAG level as a new standalone pyamage rule
@franciscozorrilla franciscozorrilla added the enhancement New feature or request label Jan 7, 2023
@franciscozorrilla franciscozorrilla self-assigned this Jan 7, 2023
@franciscozorrilla
Copy link
Owner Author

franciscozorrilla commented Jan 7, 2023

rule pydamage:
    input:
        bins = f'{config["path"]["root"]}/{config["folder"]["reassembled"]}/{{IDs}}/reassembled_bins',
        R1 = f'{config["path"]["root"]}/{config["folder"]["qfiltered"]}/{{IDs}}/{{IDs}}_R1.fastq.gz', 
        R2 = f'{config["path"]["root"]}/{config["folder"]["qfiltered"]}/{{IDs}}/{{IDs}}_R2.fastq.gz'
    output:
        directory(f'{config["path"]["root"]}/ancient_damage/{{IDs}}/')
    benchmark:
        f'{config["path"]["root"]}/{config["folder"]["benchmarks"]}/{{IDs}}.pydamage.benchmark.txt'
    message:
        """
        Modified abundance rule to generate sorted bam file for ancient dna damage estimate of genomes
        """
    shell:
        """     
        # Activate ancient environment
        set +u;source activate ancient;set -u;

        # Make sure output folder exists
        mkdir -p {output}

        # Make job specific scratch dir
        sampleID=$(echo $(basename $(dirname {input.R1})))
        echo -e "\nCreating temporary directory {config[path][scratch]}/ancient_damage/${{sampleID}} ... "
        mkdir -p {config[path][scratch]}/ancient_damage/${{sampleID}}

        # Move into scratch dir
        cd {config[path][scratch]}/ancient_damage/${{sampleID}}

        # Copy files
        echo -e "\nCopying quality filtered paired end reads and generated MAGs to {config[path][scratch]} ... "
        cp {input.R1} {input.R2} {input.bins}/* .

        echo -e "\nConcatenating all bins into one FASTA file and add bin IDs to contig headers ... "
        while read bin;do var=$(echo $bin|sed 's/.fa//g');paste $bin|sed "s/>/>${{var}}_/g";done< <(ls|grep bin) > $(basename {output}).fa

        echo -e "\nCreating bwa index for concatenated FASTA file ... "
        bwa index $(basename {output}).fa

        echo -e "\nMapping quality filtered paired end reads to concatenated FASTA file with bwa mem ... "
        bwa mem -t 56 $(basename {output}).fa \
            $(basename {input.R1}) $(basename {input.R2}) > $(basename {output}).sam

        #cleanup fastq
        rm *.fastq.gz

        echo -e "\nConverting SAM to BAM with samtools view ... "
        samtools view -@ 56 -Sb $(basename {output}).sam > $(basename {output}).bam

        #cleanup sam
        rm *.sam

        echo -e "\nSorting BAM file with samtools sort ... "
        samtools sort -@ 56 -o $(basename {output}).sort $(basename {output}).bam

        #cleanup bam
        rm *.bam

        echo -e "\nExtracting stats from sorted BAM file with samtools flagstat ... "
        samtools flagstat $(basename {output}).sort > map.stats

        echo -e "\nCopying sample_map.stats file to root/abundance/sample for bin concatenation and deleting temporary FASTA file ... "
        cp map.stats {output}/$(basename {output})_map.stats

        #rename file extension to prevent pydamage from throwing a hissy fit
        mv $(basename {output}).sort $(basename {output}).bam

        #create index file
        samtools index $(basename {output}).bam

        #running pydamage, dont use --show_al unlesss you want 1.5TB of logfiles
        pydamage --outdir pydamage_results analyze --process 56 --verbose $(basename {output}).bam

        mv pydamage_results {output}

        #cleanup sorted bam
        rm *.bam
        
        """

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant