Allow lowercase characters in hex strings #122

Open
Adamtaranto opened this issue Jan 27, 2024 · 0 comments

Adamtaranto commented Jan 27, 2024

The pattern given for tags of type H in the GFA 1.0 spec is [0-9A-F]+, which only allows uppercase characters.

I think lowercase characters are valid in hex strings, so should the pattern be [0-9A-Fa-f]+ instead?

Is the pattern wrong, or should the hex string be converted to uppercase before it is written?

import hashlib
import re

seq = "ATGC"

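# Named `digest` rather than `bytearray` to avoid shadowing the builtin type.
digest = hashlib.sha256(seq.encode()).digest()
# b'\x98 \xf5\xa8L\xc4\x043\x0em\x97\xbd\xe1;X\x0f\xe9\xbfh\xb9\xca\ttbE\x93\xf62ES\x08\x8c'

# bytes.hex() always produces lowercase hex digits.
hex_string = digest.hex()
# '9820f5a84cc404330e6d97bde13b580fe9bf68b9ca0974624593f6324553088c'

re.fullmatch(r"[0-9A-F]+", hex_string)
# None (the uppercase-only pattern rejects the lowercase digits)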

re.fullmatch(r"[0-9A-Fa-f]+", hex_string)
# <re.Match object; span=(0, 64), match='9820f5a84cc404330e6d97bde13b580fe9bf68b9ca0974624>
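
If the spec pattern turns out to be intentional, one possible workaround is to accept either case on input and convert to uppercase before writing. The sketch below is only an illustration of that option: `normalize_hex`, `SPEC_HEX`, and `RELAXED_HEX` are hypothetical names, not part of the GFA spec or of any library.

```python
import hashlib
import re

# Pattern for H-type tag values as quoted above from the GFA 1.0 spec.
SPEC_HEX = re.compile(r"[0-9A-F]+")
# Relaxed pattern proposed in this issue (accepts either case).
RELAXED_HEX = re.compile(r"[0-9A-Fa-f]+")


def normalize_hex(value: str) -> str:
    """Hypothetical helper: accept either case, return uppercase hex."""
    if RELAXED_HEX.fullmatch(value) is None:
        raise ValueError(f"not a hex string: {value!r}")
    return value.upper()


hex_string = hashlib.sha256("ATGC".encode()).hexdigest()  # lowercase digits
upper_hex = normalize_hex(hex_string)

# The uppercased form satisfies the spec pattern and decodes to the same bytes.
assert SPEC_HEX.fullmatch(upper_hex) is not None
assert bytes.fromhex(upper_hex) == bytes.fromhex(hex_string)
```

Because `bytes.fromhex` accepts both cases, uppercasing loses no information; the open question is only which form a validator should accept.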

Adamtaranto referenced this issue in biopython/biopython Jan 27, 2024
* Adds tags from segment lines into the annotations dictionary of
  the SeqRecord, handles undefined sequences and sets their length
  according to LN tag, raises a warning on incorrect tag types, and
  adds more links to the format documentation.

* Check for valid tag name and store tag types in annotations

* Tag value may contain colons

* Update NEWS.rst
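
The "tag value may contain colons" point in the commit message above can be illustrated with a small sketch. It assumes the GFA optional-field layout `TAG:TYPE:VALUE`; `split_tag` is a hypothetical helper for illustration and is not taken from the Biopython change itself.

```python
def split_tag(field: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a GFA optional field into (name, type, value)."""
    # maxsplit=2 splits only on the first two colons, so colons inside
    # the value (for example in a URL) are left intact.
    name, tag_type, value = field.split(":", 2)
    return name, tag_type, value


# Illustrative field whose value itself contains colons.
print(split_tag("UR:Z:http://example.com/seq.fa"))
# ('UR', 'Z', 'http://example.com/seq.fa')
```

A plain `field.split(":")` would break such a value into extra pieces, which is why the split count is capped.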