Releases: sacdallago/bio_embeddings

v0.2.2

06 Sep 02:49
  • Added the esm1v embedder from Meier et al. 2021, which is part of Facebook's esm. Note that this is an ensemble model, so you need to pass ensemble_id with a value from 1 to 5 to select which weights to use.
  • Added the bindEmbed21DL extract protocol, an ensemble of 5 convolutional neural networks that predicts 3 different types of binding residues (metal, nucleic acids, small molecules).
  • Fixed model download
  • Updated jaxlib to fix pip installation
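
bindEmbed21DL combines five CNNs into one prediction. A minimal sketch of how such an ensemble's per-residue outputs could be merged, assuming each member returns one probability per binding type and residue, and that members are averaged and thresholded at 0.5. The function names, the averaging rule and the threshold are assumptions for illustration, not the protocol's actual implementation.

```python
from statistics import fmean

# Binding types predicted per residue (order chosen for this sketch).
BINDING_TYPES = ("metal", "nucleic acid", "small molecule")


def average_ensemble(member_predictions):
    """Average per-residue probabilities over all ensemble members.

    member_predictions: one list per model, holding one tuple of
    probabilities (one per binding type) for every residue.
    """
    n_residues = len(member_predictions[0])
    return [
        tuple(
            fmean(model[residue][t] for model in member_predictions)
            for t in range(len(BINDING_TYPES))
        )
        for residue in range(n_residues)
    ]


def label_residues(averaged, threshold=0.5):
    """Binding types whose averaged probability reaches the (hypothetical) threshold."""
    return [
        {name for name, p in zip(BINDING_TYPES, probs) if p >= threshold}
        for probs in averaged
    ]


# Five hypothetical models scoring a two-residue protein.
predictions = [
    [(0.9, 0.1, 0.2), (0.1, 0.2, 0.6)],
    [(0.8, 0.0, 0.1), (0.2, 0.1, 0.7)],
    [(0.7, 0.2, 0.3), (0.0, 0.3, 0.4)],
    [(0.9, 0.1, 0.2), (0.1, 0.2, 0.5)],
    [(0.6, 0.1, 0.2), (0.1, 0.1, 0.6)],
]
print(label_residues(average_ensemble(predictions)))  # [{'metal'}, {'small molecule'}]
```

Averaging lets members that disagree on a residue pull each other toward the middle, so a single overconfident model is less likely to flip a label on its own.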

v0.2.1

28 Jul 14:47
  • BETA: in-silico mutagenesis using ProtTransBertBFD. This computes the likelihood that, according to Bert, a residue in a protein can be a certain amino acid, which can be used as an estimate for the effect of a mutation. This adds a new mutagenesis stage and a new plot_mutagenesis protocol for visualize stages: the former computes the probabilities and writes them to a CSV file, while the latter visualizes the results as an interactive plotly figure.
  • Added half_precision_model support for prottrans_bert_bfd and prottrans_albert_bfd
  • Fixed the plotly protocol for n_components: 2
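
The mutagenesis stage reports, per position, how likely Bert considers each amino acid. A minimal sketch of one common way to turn such probabilities into a mutation-effect estimate: the log-ratio between the mutant and the wild-type probability. The raw logits input, the function names, the CSV layout and the log-ratio score are assumptions for illustration; the stage itself writes probabilities to CSV and does not necessarily compute this score.

```python
import csv
import io
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Turn raw per-amino-acid scores into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mutation_score(probs, wild_type, mutant):
    """log(p(mutant) / p(wild type)); negative means the model prefers the wild type."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return math.log(probs[index[mutant]] / probs[index[wild_type]])


def to_csv(sequence, per_position_probs):
    """One row per position with the probability of every amino acid (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["position", "wild_type", *AMINO_ACIDS])
    for pos, (wt, probs) in enumerate(zip(sequence, per_position_probs), start=1):
        writer.writerow([pos, wt, *(f"{p:.4f}" for p in probs)])
    return buffer.getvalue()


# Hypothetical logits for a two-residue sequence in which the model favours the wild type.
sequence = "MK"
logits = [[0.0] * len(AMINO_ACIDS) for _ in sequence]
logits[0][AMINO_ACIDS.index("M")] = 3.0
logits[1][AMINO_ACIDS.index("K")] = 2.0
probs = [softmax(row) for row in logits]
print(mutation_score(probs[0], "M", "A"))  # about -3.0: M is preferred over A
print(to_csv(sequence, probs))
```

Working with the log-ratio rather than the raw probability makes scores comparable across positions where the model is more or less certain overall.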

v0.2.0

14 May 15:57
  • Added the prottrans_t5_xl_u50/ProtTransT5XLU50Embedder embedder from the latest ProtTrans revision. You should use this over prottrans_t5_bfd and prottrans_t5_uniref50.
  • The projected_embeddings_file.csv of project stages has been renamed to projected_reduced_embeddings_file.h5. For backwards compatibility, projected_embeddings_file.csv is still written.
  • The projected_embeddings_file parameter of visualize stages has been renamed to projected_reduced_embeddings_file and takes an h5 file. For backwards compatibility, projected_embeddings_file and csv files are still accepted.
  • Added the pb_tucker model as a project stage. Tucker is a contrastive learning model trained to distinguish CATH superfamilies. It consumes prottrans_bert_bfd embeddings and reduces the embedding dimensionality from 1024 to 128. See https://www.biorxiv.org/content/10.1101/2021.01.21.427551v1
  • Renamed half_model to half_precision_model
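
The visualize-stage rename keeps old configurations working. A minimal sketch of how a stage could resolve its input under both the new and the legacy parameter name; only the parameter and file names come from these notes, while the helper name, the preference order and the error handling are hypothetical.

```python
from pathlib import Path

NEW_KEY = "projected_reduced_embeddings_file"
LEGACY_KEY = "projected_embeddings_file"


def resolve_projection_input(stage_params):
    """Return (path, format) for a visualize stage, accepting the legacy parameter name.

    The new parameter takes precedence; the legacy name is used only as a
    fallback. Both .h5 and .csv inputs are recognised by file suffix.
    """
    if NEW_KEY in stage_params:
        path = Path(stage_params[NEW_KEY])
    elif LEGACY_KEY in stage_params:
        path = Path(stage_params[LEGACY_KEY])
    else:
        raise KeyError(f"missing {NEW_KEY!r} (or legacy {LEGACY_KEY!r})")

    suffix = path.suffix.lower()
    if suffix == ".h5":
        return path, "h5"
    if suffix == ".csv":
        return path, "csv"
    raise ValueError(f"unsupported projection file type: {path.name}")


print(resolve_projection_input({NEW_KEY: "projected_reduced_embeddings_file.h5"}))
print(resolve_projection_input({LEGACY_KEY: "projected_embeddings_file.csv"}))
```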

v0.1.7

17 Mar 17:18
  • Added prottrans_t5_uniref50/ProtTransT5UniRef50Embedder. This version improves over T5 BFD by being finetuned on UniRef50.
  • Added a half_model option to both T5 models (prottrans_t5_uniref50 and prottrans_t5_bfd). On the tested GPU (Quadro RTX 3000), half_model: True reduces memory consumption from 12GB to 7GB, while the effect on benchmarks is negligible (±0.1 percentage points across different sets, generally below the standard error). We therefore recommend switching to half_model: True for T5.
  • Added DeepBLAST from "Protein Structural Alignments From Sequence" (see example/deepblast for an example)
  • Dropped Python 3.6 support and added Python 3.9 support
  • Updated the Docker example to cache weights
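
A rough intuition for why half precision barely moves the benchmarks: each weight takes 2 instead of 4 bytes, and float16 keeps an 11-bit significand, so a value in the normal range is rounded with a relative error of at most 2**-11 (about 0.05%). A minimal sketch using only the standard library's struct "e" (IEEE 754 binary16) format; the sample values are arbitrary, and the sketch illustrates rounding of individual numbers only, not the reported memory or benchmark figures.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE 754 half precision (binary16) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def relative_error(x):
    """Relative rounding error introduced by storing x in half precision."""
    return abs(to_half(x) - x) / abs(x)


# Bytes needed per weight: 4 in single precision, 2 in half precision.
BYTES_FP32 = struct.calcsize("<f")
BYTES_FP16 = struct.calcsize("<e")

# Arbitrary values inside float16's normal range (about 6.1e-5 to 65504).
values = [0.1234567, -1.987654, 3.14159265, 0.000123, 42.4242]
worst = max(relative_error(v) for v in values)
print(f"bytes per weight: fp32={BYTES_FP32}, fp16={BYTES_FP16}")
print(f"worst relative error: {worst:.2e} (bound 2**-11 = {2**-11:.2e})")
```

Values outside the normal range behave differently: very small ones lose relative precision and values above 65504 overflow, which is one reason half-precision results should still be checked on real data.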