You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
from evodiff.pretrained import MSA_OA_DM_MAXSUB
from evodiff.generate_msa import generate_query_oadm_msa_simple
import re
checkpoint = MSA_OA_DM_MAXSUB()
model, collater, tokenizer, scheme = checkpoint
path_to_msa = 'bfd_uniclust_hits.a3m'
n_sequences=64 # number of sequences in MSA to subsample
seq_length=256 # maximum sequence length to subsample
selection_type='random' # or 'MaxHamming'; MSA subsampling scheme
tokeinzed_sample, generated_sequence = generate_query_oadm_msa_simple(path_to_msa, model, tokenizer, n_sequences, seq_length, device='cpu', selection_type=selection_type)
print("New sequence (no gaps, pad tokens)", re.sub('[!-]', '', generated_sequence[0][0],))
The error can be traced back to:
evodiff/utils.py, line 247, in
return np.array([self.a_to_i[a] for a in seq[0]]) # for nested lists
The alphabet seems to not know how to handle ! which should be the padding token. This alphabet appears to be imported from sequence_models.constants as MSA_ALPHABET.
Also this is much less important but I noticed there's three instances of "tokeinzed_sample" as a variable name in the example notebook that almost certainly are meant to be "tokenized_sample"
The text was updated successfully, but these errors were encountered:
If you're struggling to install EvoDiff locally, feel free to try https://www.tamarind.bio/evodiff, a website which offers a no-code interface for bioinformatics tools including protein design with EvoDiff for free.
Hello, I am experiencing a key error attempting to use the code 1 for 1 from the example notebook for conditional generation:
https://github.com/microsoft/evodiff/blob/main/examples/evodiff.ipynb
The error can be traced back to:
evodiff/utils.py, line 247, in
return np.array([self.a_to_i[a] for a in seq[0]]) # for nested lists
The alphabet seems to not know how to handle ! which should be the padding token. This alphabet appears to be imported from sequence_models.constants as MSA_ALPHABET.
Also this is much less important but I noticed there's three instances of "tokeinzed_sample" as a variable name in the example notebook that almost certainly are meant to be "tokenized_sample"
The text was updated successfully, but these errors were encountered: