OmicsML/awesome-molecule-protein-pretrain-papers

This repository is a collection of awesome papers about deep learning models for molecules and proteins. If you have any suggestions (missing papers, issues, other resources), feel free to open a pull request or email me at qiaolinlu99@gmail.com.

awesome-pretrain-molecule-protein-papers

Molecule

Survey

  1. [2022 Current Opinion in Structural Biology] Deep learning approaches for de novo drug design: An overview [paper]
  2. [2022 arXiv] Geometrically Equivariant Graph Neural Networks: A Survey [paper]
  3. [2022 arXiv] MolGenSurvey: A Systematic Survey in Machine Learning Models for Molecule Design [paper]
  4. [2023 Engineering] Machine Learning for Chemistry: Basics and Applications [paper]
  5. [2023 IJCAI] A Systematic Survey of Chemical Pre-trained Models [paper][code]
  6. [2023 Wiley] Generative models for molecular discovery: Recent advances and challenges [paper]
  7. [2024 arXiv] A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications [paper]

Molecule Pretrain Model / Representation Learning

  1. [2023 ICLR] Uni-Mol: A Universal 3D Molecular Representation Learning Framework [paper][code]
  2. [2023 ICLR] One Transformer Can Understand Both 2D & 3D Molecular Data [paper][code]
  3. [2023 KDD] Automated 3D Pre-Training for Molecular Property Prediction [paper][code]
  4. [2022 KDD] Unified 2D and 3D Pre-Training of Molecular Representations [paper][code]
  5. [2023 KDD] Dual-view Molecule Pre-training [paper][code]
  6. [2022 arXiv] CoSP: Co-supervised pretraining of pocket and ligand [paper]
  7. [2022 Nature Machine Intelligence] Molecular contrastive learning of representations via graph neural networks [paper][code]
  8. [2023 ICML] Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language [paper]
  9. [2021 Briefings in Bioinformatics] MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction [paper][code]
  10. [2021 NIPS] Motif-based Graph Self-Supervised Learning for Molecular Property Prediction [paper][code]
  11. [2022 ICLR] Spherical Message Passing for 3D Molecular Graphs [paper][code]
  12. [2022 NIPS] ComENet: Towards Complete and Efficient Message Passing for 3D Molecular Graphs [paper][code]
  13. [2022 Nvidia] MegaMolBART [web-link][code]
  14. [2020 NIPS workshop] Message Passing Networks for Molecules with Tetrahedral Chirality [paper][code]
  15. [2023 arXiv] Augmenting large language models with chemistry tools [paper][code]
  16. [2023 ICML workshop] Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning [paper]

Molecule Generative Model

  1. [2021 NIPS] Geomol: Torsional geometric generation of molecular 3d conformer ensembles [paper][code]
  2. [2021 NIPS] Predicting molecular conformation via dynamic graph score matching
  3. [2021 ICML] GraphDF: A Discrete Flow Model for Molecular Graph Generation [paper][code]
  4. [2023 ICLR workshop] Improving Small Molecule Generation using Mutual Information Machine [paper]
  5. [2020 KDD] MoFlow: An Invertible Flow Model for Generating Molecular Graphs [paper][code]
  6. [2022 ICLR] Geodiff: A geometric diffusion model for molecular conformation generation [paper][code]
  7. [2022 ICML] Equivariant Diffusion for Molecule Generation in 3D [paper][code]
  8. [2022 NIPS] Torsional Diffusion for Molecular Conformer Generation [paper][code]
  9. [2022 ICLR] An Autoregressive Flow Model for 3D Molecular Geometry Generation from Scratch [paper][code]
  10. [2022 Research Square] Accurate Protein-Ligand Complex Structure Prediction using Geometric Deep Learning [paper][code]
  11. [2022 arXiv] BARTSmiles: Generative Masked Language Models for Molecular Representations [paper][code]
  12. [2023 ICLR] Diffdock: Diffusion steps, twists, and turns for molecular docking [paper][code]
  13. [2023 ICLR] De novo molecular generation via connection-aware motif mining [paper][code]
  14. [2023 ICML] MolDiff: Addressing the Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation [paper][code]
  15. [2023 ICML] Geometric Latent Diffusion Models for 3D Molecule Generation [paper][code]
  16. [2023 ICLR] Conditional antibody design as 3d equivariant graph translation [paper][code]
  17. [2023 bioRxiv] Guided Diffusion for molecular generation with interaction prompt [paper]
  18. [2023 Nature Machine Intelligence] ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modelling [paper][code]
  19. [2023 Nature Computational Science] Learning on topological surface and geometric structure for 3D molecular generation [paper][code]
  20. [2023 arXiv] Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model [paper]
  21. [2023 Nature Computational Science] Guided diffusion for inverse molecular design [paper][code]
  22. [2023 JACS] Generative Models as an Emerging Paradigm in the Chemical Sciences [paper]
  23. [2023 arXiv] Generating Molecular Conformer Fields [paper]
  24. [2023 ICLR] Retrieval-based Controllable Molecule Generation [paper][code]
  25. [2024 ICLR] Unified Generative Modeling of 3D Molecules with Bayesian Flow Networks [paper]
  26. [2024 ICLR] Training-free Multi-objective Diffusion Model for 3D Molecule Generation [paper]
  27. [2024 Nature Machine Intelligence] Generation of 3D molecules in pockets via a language model [paper]

Molecule Protein Docking Model

  1. [2015 Bioinformatics] Fast, accurate, and reliable molecular docking with QuickVina 2 [paper]
  2. [2016 JCIM] Protein-Ligand Scoring with Convolutional Neural Networks [paper]
  3. [2017 Scientific Reports] Protein-Ligand Blind Docking Using QuickVina-W With Inter-Process Spatio-Temporal Integration [paper]
  4. [2018 Journal of Cheminformatics] P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure [paper][code]
  5. [2018 IJCAI] Interpretable drug target prediction using deep neural representation [paper]
  6. [2019 JCIM] Are the Apo Proteins Suitable for the Rational Discovery of Allosteric Drugs? [paper]
  7. [2020 Journal of Cheminformatics] Edock: blind protein–ligand docking by replica-exchange monte carlo simulation [paper][web-link]
  8. [2020 Journal of Cheminformatics] spyrmsd: symmetry-corrected RMSD calculations in Python [paper][code]
  9. [2021 Journal of Cheminformatics] GNINA 1.0: molecular docking with deep learning [paper][code]
  10. [2021 Briefings in Bioinformatics] InstaDock: A single-click graphical user interface for molecular docking-based virtual high-throughput screening [paper][web-link]
  11. [2021 Nature Machine Intelligence] A geometric deep learning approach to predict binding conformations of bioactive molecules [paper][code]
  12. [2022 ICML] EQUIBIND: Geometric Deep Learning for Drug Binding Structure Prediction [paper][code]
  13. [2022 NIPS] TANKBind: Trigonometry-Aware Neural NetworKs for Drug-Protein Binding Structure Prediction [paper][code]
  14. [2022 Molecular Systems Biology] Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery [paper]
  15. [2022 arXiv] CoSP: Co-supervised pretraining of pocket and ligand [paper]
  16. [2023 ICLR] Diffdock: Diffusion steps, twists, and turns for molecular docking [paper][code]
  17. [2023 ICLR] E3Bind: An End-to-End Equivariant Network for Protein-Ligand Docking [paper]
  18. [2023 PNAS] Contrastive learning in protein language space predicts interactions between drugs and protein targets [paper][code]
  19. [2023 Nature Computational Science] Efficient and accurate large library ligand docking with KarmaDock [paper][code]
  20. [2023 JCIM] BMaps: A Web Application for Fragment-Based Drug Design and Compound Binding Evaluation [paper][web-link]
  21. [2023 JCTC] Equivariant Flexible Modeling of the Protein–Ligand Binding Pose with Geometric Deep Learning [paper][code]
  22. [2023 arXiv] Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning [paper]
  23. [2023 Openreview] The Discovery of Binding Modes Requires Rethinking Docking Generalization [paper]
  24. [2023 Openreview] Protein-Ligand Interaction Prior for Binding-aware 3D Molecule Diffusion Models [paper]
  25. [2023 arXiv] Generating Molecular Fragmentation Graphs with Autoregressive Neural Networks [paper][code]

Tetrahedral Molecular Geometry

  1. [2020 NIPS workshop] Message Passing Networks for Molecules with Tetrahedral Chirality [paper][code]
  2. [2023 Nature Reviews Chemistry] Detection and analysis of chiral molecules as disease biomarkers [paper]

Chemical Reaction

  1. [2018 Nature] Planning chemical syntheses with deep neural networks and symbolic AI [paper][code-nonofficial]
  2. [2020 JCIM] Predicting Retrosynthetic Reaction using Self-Corrected Transformer Neural Networks [paper][code]
  3. [2020 Chemical Science] Automatic retrosynthetic route planning using template-free models [paper][code]
  4. [2023 Nature Communications] Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks [paper][code]
  5. [2023 Nature Machine Intelligence] Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model [paper][code]
  6. [2024 ICLR] RetroBridge: Modeling Retrosynthesis with Markov Bridges [paper]

Drug Discovery

  1. [2021 JCIM] OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design [paper]
  2. [2023 Nature Communications] Discovery of senolytics using machine learning [paper][code]
  3. [2023 ICML] Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions [paper][code]
  4. [2023 Current Opinion in Structural Biology] Structure-based drug design with geometric deep learning [paper]
  5. [2023 Openreview] Drug Discovery with Dynamic Goal-aware Fragments [paper]
  6. [2023 ICML] DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design [paper][code]
  7. [2023 Openreview] Streamlining Generative Models for Structure-Based Drug Design [paper]
  8. [2023 Openreview] Long-Short-Range Message-Passing: A Fragmentation-Based Framework to Capture Non-Local Atomistic Interactions [paper]
  9. [2023 ACS Central Science] Geometric Deep Learning for Structure-Based Ligand Design [paper]
  10. [2023 ICLR] De Novo Molecular Generation via Connection-aware Motif Mining [paper][code]
  11. [2023 NIPS] Tartarus: A benchmarking platform for realistic and practical inverse molecular design [paper][code]
  12. [2023 arXiv] ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback [paper][code]
  13. [2023 Wiley] Generative models for molecular discovery: Recent advances and challenges [paper]

Chemistry Toolkits

  1. RDKit: Open-source cheminformatics software
  2. PyMOL: A user-sponsored molecular visualization system on an open-source foundation
  3. CDK: Chemistry Development Kit (Open Source modular Java libraries for Cheminformatics)
  4. Open Babel: The Open Source Chemistry Toolbox
  5. Cinfony: A common API to several cheminformatics toolkits
  6. Indigo: A universal molecular toolkit that can be used for molecular fingerprinting, substructure search, and molecular visualization
  7. ChemoPy: A freely available python package for computational biology and chemoinformatics
  8. ChemmineR: A cheminformatics package for analyzing drug-like small molecule data in R
  9. ChemKit: An open source software library for chemistry
  10. DeepChem: A Python library for machine learning and deep learning on molecular and quantum datasets
  11. MolVS: Molecular Validation and Standardization
  12. spyrmsd: Symmetry-corrected RMSD calculations in Python
  13. torchdrug: A machine learning platform designed for drug discovery
  14. torchprotein: A machine learning library for protein science, built on top of TorchDrug
  15. OpenChem: A deep learning toolkit for computational chemistry and drug design

Molecule Dataset

  1. [2012 JCIM] Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17 [paper]
  2. [2015 JCIM] ZINC 15 – Ligand Discovery for Everyone [paper]
  3. [2016 Nucleic Acids Research] PubChem Substance and Compound databases
  4. [2017 Science Advances] Machine learning of accurate energy-conserving molecular force fields [paper][data-link]
  5. [2018 Nucleic Acids Research] ChEMBL: towards direct deposition of bioassay data [paper]
  6. [2018 Nucleic Acids Research] DrugBank 5.0: a major update to the DrugBank database for 2018 [paper]
  7. [2018 JCIM] Comparative Assessment of Scoring Functions: The CASF-2016 Update [paper][web-link]
  8. [2019 JCIM] GuacaMol: Benchmarking Models for De Novo Molecular Design [paper][code][web-link]
  9. [2023 arXiv] Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets [paper]

| Dataset Name | #Molecules | Property | About Data |
| --- | --- | --- | --- |
| GDB-17 | 166B | small organic molecules | chemical universe database |
| QM9 | 133,885 | quantum chemical properties | a subset of GDB-17 |
| ZINC15 | 230M (3D), over 750M | bioactivity of small molecules | biologically relevant small molecules |
| GuacaMol | N/A | benchmark suite for generative chemistry | a subset of molecules extracted from ChEMBL 24 |
| ChEMBL | 2.4M | bioactive molecules with drug-like properties | contains chemical, bioactivity and genomic data |
| PubChem BioAssay | 270,998,024 | PubChem BioAssay | a subset of PubChem |
| PubChem Compound | 1,366,263 | PubChem Compound | a subset of PubChem |
| PubChem Substance | 109,891,994 | PubChem Substance | a subset of PubChem |
| DrugBank 5.0 | N/A | drug data | bioinformatics and cheminformatics database |
| GEOM-Drugs | 430,000 | drug data | a larger scale dataset of molecular conformers than QM9 |

Materials

  1. [2023 Openreview] Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding [paper]
  2. [2023 Openreview] Scalable Diffusion for Materials Generation [paper]
  3. [2023 Openreview] MOFDiff: Coarse-grained Diffusion for Metal-Organic Framework Design [paper]
  4. [2023 JACS] MOFormer: Self-Supervised Transformer Model for Metal–Organic Framework Property Prediction [paper][code]

Molecular Dynamics

  1. [2023 arXiv] Score dynamics: scaling molecular dynamics with picoseconds timestep via conditional diffusion model [paper][code]

Protein

Protein Pretrain Model / Representation Learning

  1. [2020 bioRxiv] Transformer protein language models are unsupervised structure learners [paper][code]
  2. [2021 PNAS] Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences [paper][code]
  3. [2021 ICML] MSA Transformer [paper][code]
  4. [2021 NIPS] Language models enable zero-shot prediction of the effects of mutations on protein function [paper][code]
  5. [2021 Nature] Highly accurate protein structure prediction with AlphaFold [paper][code]
  6. [2021 Science] Accurate prediction of protein structures and interactions using a three-track neural network [paper][code]
  7. [2022 ICML] Learning inverse folding from millions of predicted structures [paper][code]
  8. [2022 bioRxiv] OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [paper][code]
  9. [2022 bioRxiv] Language models of protein sequences at the scale of evolution enable accurate structure prediction [paper]
  10. [2023 Science] Evolutionary-scale prediction of atomic-level protein structure with a language model [paper][code]
  11. [2022 Nature Communications] ProtGPT2 is a deep unsupervised language model for protein design [paper][code]
  12. [2023 bioRxiv] xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [paper]
  13. [2023 arXiv] Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins [paper][code]
  14. [2023 bioRxiv] Pretrainable Geometric Graph Neural Network for Antibody Affinity Maturation [paper]
  15. [2023 Openreview] Pre-training Sequence, Structure, and Surface Features for Comprehensive Protein Representation Learning [paper]
  16. [2023 ICLR] Protein Representation Learning by Geometric Structure Pretraining [paper][code]

Protein Evolution

  1. [2023 Nature] Clustering-predicted structures at the scale of the known protein universe [paper]

Protein Protein Docking Model

  1. [2020 Structure] Performance and its limits in rigid body protein-protein docking [paper][web-link]
  2. [2020 Nature Protocols] The HDOCK server for integrated protein–protein docking [paper][web-link]
  3. [2021 bioRxiv] Improved docking of protein models by a combination of alphafold2 and cluspro [paper]
  4. [2022 ICLR] Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking [paper][code]

Protein Generative Model

  1. [2023 ICLR] Learning Hierarchical Protein Representations via Complete 3D Graph Networks [paper][code]
  2. [2023 ICML] SE(3) diffusion model with application to protein backbone generation [paper][code]
  3. [2023 ICLR] Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem [paper][code]
  4. [2023 ICLR] Protein sequence and structure co-design with equivariant translation [paper]
  5. [2023 arXiv] DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing [paper]
  6. [2023 bioRxiv] Protein generation with evolutionary diffusion: sequence is all you need [paper]
  7. [2023 Openreview] SE(3)-Stochastic Flow Matching for Protein Backbone Generation [paper]
  8. [2023 Nature] De novo design of protein structure and function with RFdiffusion [paper][code]

Protein Dataset

3D Model

Equivariant Model

  1. [2013 Doc] Lie groups for 2d and 3d transformations [paper]
  2. [2018 arXiv] Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds [paper][code]
  3. [2019 NIPS] Cormorant: Covariant Molecular Neural Networks [paper][code]
  4. [2020 NIPS] SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks [paper][code]
  5. [2021 ICML] E(n) Equivariant Graph Neural Networks [paper][code]
  6. [2021 Nature Communications] E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials [paper][code]
  7. [2022 ICLR] Geometric and Physical Quantities Improve E(3) Equivariant Message Passing [paper]
  8. [2022 ICLR workshop] Denoising diffusion probabilistic models on SO(3) for rotational alignment [paper]
  9. [2022 arXiv] e3nn: Euclidean Neural Networks [paper]
  10. [2023 ICLR] Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs [paper][code]
  11. [2024 ICLR] Hybrid Directional Graph Neural Network for Molecules [paper][code]