Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"megalodon_extras modified_bases create_motif_bed" does not return all of the positions #335

Open
ziczhang opened this issue Jan 31, 2023 · 0 comments

Comments

@ziczhang
Copy link

ziczhang commented Jan 31, 2023

Hi,

I used the following command to produce a bed file of the location of CpGs in hg38.

megalodon_extras modified_bases create_motif_bed --motif CG 0 --out-filename motif_sites.bed UCSC.hg38.fa

and the motif_sites.bed started from

chr1	11460	11461	.	.	+
chr1	11461	11462	.	.	-
chr1	11479	11480	.	.	+
chr1	11480	11481	.	.	-

However, the actual CpGs in hg38 start at

chr1	10468	10469	.	.	+
chr1	10469	10470	.	.	-
chr1	10470	10471	.	.	+
chr1	10471	10472	.	.	-

I don't know if this is the intended behavior,
but this is because "megalodon_extras modified_bases create_motif_bed" ignores lowercase sequences in the fasta file.

This can be fixed by modifying

def compile_motif_pat(raw_motif):
    ambig_pat_str = "".join(
        "[{}]".format(SINGLE_LETTER_CODE[letter]) for letter in raw_motif
    )
    # add lookahead group to serach for overlapping motif hits
-    return re.compile("(?=({}))".format(ambig_pat_str))
+    return re.compile("(?=({}))".format(ambig_pat_str),flags=re.IGNORECASE)

def compile_rev_comp_motif_pat(raw_motif):
    ambig_pat_str = "".join(
        "[{}]".format("".join(comp(b) for b in SINGLE_LETTER_CODE[letter]))
        for letter in raw_motif[::-1]
    )
    # add lookahead group to serach for overlapping motif hits
-    return re.compile("(?=({}))".format(ambig_pat_str))
+    return re.compile("(?=({}))".format(ambig_pat_str),flags=re.IGNORECASE)

in megalodon_helper.py.

Best,

Zicong

@ziczhang ziczhang changed the title "megalodon_extras modified_bases create_motif_bed" does not return all of the position "megalodon_extras modified_bases create_motif_bed" does not return all of the positions Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant