[feat] expose bfactors in protein_to_pyg function (#388)
* [feat] expose bfactors in protein_to_pyg function

* [doc] update CHANGELOG

* [doc] changed .tensor to .from_numpy for memory efficiency

* [fix] calculate bfactor per residue instead of per atom

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
kierandidi and pre-commit-ci[bot] committed Apr 23, 2024
1 parent a669516 commit e861231
Showing 2 changed files with 14 additions and 1 deletion.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -6,7 +6,7 @@
 * Fix bug where the `deprotonate` argument is not wired up to `graphein.protein.graphs.construct_graphs`. [#375](https://github.com/a-r-j/graphein/pull/375)

 #### Misc
-* exposed `fill_value` option to `protein_to_pyg` function. [#385](https://github.com/a-r-j/graphein/pull/385)
+* exposed `fill_value` and `bfactor` option to `protein_to_pyg` function. [#385](https://github.com/a-r-j/graphein/pull/385) and [#388](https://github.com/a-r-j/graphein/pull/388)
 * Updated Foldcomp datasets with improved setup function and updated database choices such as ESMAtlas. [#382](https://github.com/a-r-j/graphein/pull/382)
 * Resolve issue with notebook version and `pluggy` in Dockerfile. [#372](https://github.com/a-r-j/graphein/pull/372)
 * Remove `typing_extension` as dependency since we now primarily support Python >=3.8 and `Literal` is included in `typing` there.
13 changes: 13 additions & 0 deletions graphein/protein/tensor/io.py
@@ -108,6 +108,7 @@ def protein_to_pyg(
     atom_types: List[str] = PROTEIN_ATOMS,
     remove_nonstandard: bool = True,
     store_het: bool = False,
+    store_bfactor: bool = False,
     fill_value_coords: float = 1e-5,
 ) -> Data:
     """
@@ -159,6 +160,12 @@ def protein_to_pyg(
     :param store_het: Whether or not to store heteroatoms in the ``Data``
         object. Default is ``False``.
     :type store_het: bool
+    :param store_bfactor: Whether or not to store bfactors in the ``Data``
+        object. Default is ``False``.
+    :type store_bfactor: bool
+    :param fill_value_coords: Fill value to use for positions in atom37
+        representation that are not filled. Defaults to 1e-5
+    :type fill_value_coords: float
     :returns: ``Data`` object with attributes: ``x`` (AtomTensor), ``residues``
         (list of 3-letter residue codes), id (ID of protein), residue_id (E.g.
         ``"A:SER:1"``), residue_type (torch.Tensor), ``chains`` (torch.Tensor).
@@ -254,6 +261,12 @@ def protein_to_pyg(
     )
     if store_het:
         out.hetatms = [het_coords]
+
+    if store_bfactor:
+        # group by residue_id and average b_factor per residue
+        residue_bfactors = df.groupby("residue_id")["b_factor"].mean()
+        out.bfactor = torch.from_numpy(residue_bfactors.values)
+
     return out
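
One detail of the per-residue averaging above is worth illustrating: pandas `groupby` sorts group keys by default, so the averaged values come back ordered by the sorted `residue_id` strings and not by the order in which residues appear in the table. Whether that matters depends on how the rest of `protein_to_pyg` orders residues, which this diff does not show. The sketch below uses a made-up atom table (the column names mirror the diff; the helper `mean_bfactor_per_residue` and the data are hypothetical, not part of graphein) to show both orderings.

```python
import numpy as np
import pandas as pd

# Hypothetical atom-level table; only the column names are taken from the diff.
df = pd.DataFrame(
    {
        "residue_id": ["A:SER:2", "A:SER:2", "A:ALA:10", "A:ALA:10", "A:GLY:11"],
        "b_factor": [10.0, 20.0, 30.0, 50.0, 5.0],
    }
)


def mean_bfactor_per_residue(atoms: pd.DataFrame, keep_order: bool = True) -> np.ndarray:
    """Average b_factor over the atoms of each residue.

    With keep_order=True the groups follow first appearance in the table
    (sort=False); otherwise pandas sorts the residue_id strings, which is
    the default behaviour of groupby.
    """
    grouped = atoms.groupby("residue_id", sort=not keep_order)["b_factor"].mean()
    return grouped.to_numpy()


in_order = mean_bfactor_per_residue(df)                       # SER, ALA, GLY
sorted_keys = mean_bfactor_per_residue(df, keep_order=False)  # ALA, GLY, SER

print(in_order)
print(sorted_keys)
```

If the per-residue values need to line up with other per-residue attributes on the `Data` object, comparing the two orderings on a real structure is a cheap sanity check.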

