Sequence.gc methods that consider IUPAC nucleotide ambiguity #128

Open
mdshw5 opened this issue Nov 13, 2017 · 0 comments
mdshw5 (Owner) commented Nov 13, 2017

The existing Sequence.gc method purposefully counts only G/C characters (in either case) and uses the full sequence length as the denominator to produce the "fraction G/C". This has a few benefits:

  • IUPAC ambiguity codes can be ignored entirely
  • len(sequence) is fast to compute, compared with counting occurrences of additional characters

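The downside is that any non-GCAT characters (N, IUPAC ambiguity codes, etc.) are included in the denominator, which lowers the reported GC fraction for sequences that contain them: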

From pyfaidx/pyfaidx/__init__.py, lines 254 to 266 in 7b4d8d7:

```python
@property
def gc(self):
    """ Return the GC content of seq as a float
    >>> x = Sequence(name='chr1', seq='ATCGTA')
    >>> y = round(x.gc, 2)
    >>> y == 0.33
    True
    """
    g = self.seq.count('G')
    g += self.seq.count('g')
    c = self.seq.count('C')
    c += self.seq.count('c')
    return (g + c) / len(self.seq)
```
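The effect of the denominator can be seen with a small standalone sketch. It uses plain functions over Python strings rather than pyfaidx's API; `gc_len_denominator` (mirroring the current logic) and `gc_strict` (a G/C/A/T-denominator variant) are hypothetical names used only for illustration.

```python
def gc_len_denominator(seq: str) -> float:
    """Mirror the current Sequence.gc logic: (G + C) / len(seq)."""
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(seq)


def gc_strict(seq: str) -> float:
    """Hypothetical strict variant: (G + C) / (G + C + A + T).

    Characters other than G, C, A, T (case-insensitive), such as N or
    IUPAC ambiguity codes, are excluded from numerator and denominator.
    """
    upper = seq.upper()
    gc = upper.count("G") + upper.count("C")
    at = upper.count("A") + upper.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous G/C/A/T bases")
    return gc / (gc + at)


seq = "GCGCATNNNN"
print(round(gc_len_denominator(seq), 2))  # 0.4
print(round(gc_strict(seq), 2))           # 0.67
```

Here four trailing N's pull the length-based value down from 0.67 to 0.4, even though the unambiguous bases are unchanged.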

I'd welcome any pull request to implement something like:

  • Sequence.gc_iupac method that counts e.g. S (G/C) as GC and W (A/T) as AT, and also handles mixed codes such as K (G/T). This is considerably more difficult than the current method and requires validating that the sequence contains only valid IUPAC letters.
  • Sequence.gc_strict method that computes (G + C) / (G + C + A + T), implicitly ignoring all other characters. This is probably closest to what people expect GC content to mean.
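The IUPAC-aware idea could be sketched as follows, under stated assumptions: each IUPAC nucleotide code contributes the fraction of its possible bases that are G or C (so S counts 1, W counts 0, K counts 0.5), every counted letter adds 1 to the denominator, N is skipped by default because it is usually uninformative, and any character outside the IUPAC nucleotide alphabet (including gap characters such as `-`) raises ValueError. The function name `gc_iupac` and the `include_n` parameter are hypothetical and not part of pyfaidx.

```python
# Possible bases for each IUPAC nucleotide code (U treated like T).
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# GC weight of each code: fraction of its possible bases that are G or C.
GC_WEIGHT = {
    code: sum(base in "GC" for base in bases) / len(bases)
    for code, bases in IUPAC_BASES.items()
}


def gc_iupac(seq: str, include_n: bool = False) -> float:
    """Hypothetical IUPAC-aware GC content (expected G+C fraction).

    Each valid IUPAC letter contributes its GC weight to the numerator
    and 1 to the denominator. N is skipped unless include_n is True.
    Raises ValueError on any character that is not an IUPAC nucleotide code.
    """
    total = 0.0
    count = 0
    for char in seq.upper():
        weight = GC_WEIGHT.get(char)
        if weight is None:
            raise ValueError(f"invalid IUPAC nucleotide code: {char!r}")
        if char == "N" and not include_n:
            continue
        total += weight
        count += 1
    if count == 0:
        raise ValueError("no informative bases to compute GC content from")
    return total / count


print(gc_iupac("SSWW"))                  # 0.5
print(gc_iupac("GGKK"))                  # 0.75
print(gc_iupac("GCNN"))                  # 1.0
print(gc_iupac("GCNN", include_n=True))  # 0.75
```

Whether N should count as 0.5 or be excluded, and whether gap characters should be rejected or silently skipped, are design decisions a real implementation would need to settle.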