Follow-up on the expansion of amino acids to include non-naturals. #52

Open
francisacquah466 opened this issue Jul 7, 2023 · 1 comment

Comments

@francisacquah466

Hi @dosorio

Thanks for such a wonderful package.

I'm working on generating a lot of peptides, mostly with non-natural amino acids. I was wondering if it would be possible to expand the list of amino acids to include new amino acids and their SMILES,
so that peptides with non-naturals can be passed to the aaSMILES function to generate SMILES for them.
I envisage that one-letter amino acid names may be problematic here.
Is there a way this can be added? Maybe by using the 3-letter amino acid code rather than the 1-letter code.

It would help a lot!

Thanks!

@jspaezp
Contributor

jspaezp commented Sep 21, 2023

Hey there! Sorry for the late reply ...
Right now the implementation of the SMILES generator is fairly simple (https://github.com/jspaezp/Peptides/blob/b0aab3765f99a0c4c79dddfecdd12d3ff71c9a20/R/smilesStrings.R), and I think it could be extended to 3-letter amino acid codes. But since the 3-letter abbreviation is not supported in any other part of the package (that I can recall), adding it only here would feel very inconsistent ...

Maybe something like this would work for you (I have not tested it, but I feel like it would work ...):

three_letter_aaSMILES <- function(seq) {
  aminoacid_smiles <- c(
    "Ala" = "N[C@@]([H])(C)C(=O)O",
    # ... all other amino acids added here
    "Val" = "N[C@@]([H])(C(C)C)C(=O)O")

  # Split each sequence into consecutive 3-letter codes,
  # e.g. "AlaVal" -> c("Ala", "Val")
  split_sequences <- regmatches(seq, gregexpr(".{3}", seq))

  # Look up the SMILES of each residue (codes missing from the table give NA)
  smiles_aa_sequences <- lapply(split_sequences, function(x) aminoacid_smiles[x])

  # This trims the last O in the -OH of the carboxyl group of each amino acid
  concat_aa_smiles <- lapply(
      smiles_aa_sequences,
      function(x) paste(gsub("O$", "", x), collapse = ""))

  # Add the C-terminal O back at the end of the full peptide
  concat_aa_smiles <- lapply(concat_aa_smiles, function(x) paste0(x, "O"))
  concat_aa_smiles <- unlist(concat_aa_smiles)

  return(concat_aa_smiles)
}
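
For non-natural residues, a variant that takes the lookup table as an argument and reads residue codes separated by "-" might be easier to extend, since codes then do not have to be exactly three letters long. This is only a rough sketch: `custom_aaSMILES`, the `smiles_table` argument and the "-" separator are made up for illustration and are not part of the Peptides package, and the Aib entry is a hand-written SMILES for 2-aminoisobutyric acid that you should check against your own source.

```r
# Hypothetical helper, not part of the Peptides package.
# seq:          character vector of peptides, residues separated by `sep`
# smiles_table: named character vector, residue code -> free amino acid SMILES
custom_aaSMILES <- function(seq, smiles_table, sep = "-") {
  # "Ala-Aib-Val" -> c("Ala", "Aib", "Val")
  residues <- strsplit(seq, sep, fixed = TRUE)

  vapply(residues, function(codes) {
    # Fail early instead of silently pasting "NA" into the SMILES
    unknown <- setdiff(codes, names(smiles_table))
    if (length(unknown) > 0) {
      stop("No SMILES defined for: ", paste(unknown, collapse = ", "))
    }
    # Drop the trailing O of each residue's -OH, join the residues,
    # then put the C-terminal O back
    trimmed <- sub("O$", "", smiles_table[codes])
    paste0(paste(trimmed, collapse = ""), "O")
  }, character(1))
}

# Illustrative table: two natural residues from above plus one non-natural one
smiles_table <- c(
  "Ala" = "N[C@@]([H])(C)C(=O)O",
  "Val" = "N[C@@]([H])(C(C)C)C(=O)O",
  "Aib" = "NC(C)(C)C(=O)O"  # assumption: hand-written SMILES for Aib
)

custom_aaSMILES("Ala-Aib-Val", smiles_table)
# "N[C@@]([H])(C)C(=O)NC(C)(C)C(=O)N[C@@]([H])(C(C)C)C(=O)O"
```

Like the function above, this only works if every SMILES in the table starts at the backbone N and ends with the O of the carboxyl -OH, because the joining step is plain string concatenation.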
