
Chapter 1: Introduction

Chemoinformatics is a methodology for analyzing mainly chemistry-related data with a computer in order to solve various problems. The term chemoinformatics was defined in the late 1990s to early 2000s. In the pharmaceutical industry and in academic pharmaceutical science, it is used in a wide variety of processes, including analyzing the relationship between drug effects and compound characteristics, visualizing large amounts of compound information, and clustering based on compound similarity.
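To make the idea of similarity-based clustering a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the feature sets in `toy` are hypothetical stand-ins for real fingerprints, the Tanimoto coefficient is one common choice of similarity measure, and `leader_cluster` is a simple greedy scheme chosen for brevity, not the method used later in this book.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of feature identifiers."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def leader_cluster(fingerprints, threshold=0.5):
    """Greedy clustering: a compound joins the first cluster whose leader is
    at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []  # list of (leader_name, [member_names])
    for name, fp in fingerprints.items():
        for leader, members in clusters:
            if tanimoto(fingerprints[leader], fp) >= threshold:
                members.append(name)
                break
        else:
            # no existing leader was similar enough
            clusters.append((name, [name]))
    return clusters


# Hypothetical feature sets standing in for real compound fingerprints.
toy = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 5},
    "cpd_C": {7, 8, 9},
    "cpd_D": {7, 8, 10},
}
clusters = leader_cluster(toy, threshold=0.5)
print(clusters)
```

With these toy inputs, `cpd_A` and `cpd_B` share three of five features and end up together, as do `cpd_C` and `cpd_D`. The result of a greedy scheme like this depends on the input order and the threshold, so treat it as a teaching aid only.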

In recent years, applications of deep learning to drug discovery have been explored. Research is active not only in conventional chemoinformatics topics such as QSAR (Quantitative Structure-Activity Relationship) for predicting activity and physical properties, but also in areas that had not been addressed before, such as proposing new compound designs and synthetic routes.

Compound design is innovative

What kind of compound should we make in the first place, and how should we synthesize it? Thinking through these questions requires background knowledge and imagination, and it was conventionally regarded as an area that would be difficult for anything other than humans to take on. Yet what is called AI has advanced rapidly into such areas within just a few years (2017-2019).

Chemoinformatics is already used in many settings, but there has been little published information about it. There are several possible reasons, but the two main ones are undoubtedly the lack of open source toolkits and the lack of publicly available databases. With the advent of RDKit, an open source chemoinformatics toolkit, and ChEMBL, a public database, this has been resolved to some extent.

In recent years, as with bioinformatics, a great deal of information about chemoinformatics can be found quickly by searching the web, and it is possible to teach yourself. Nevertheless, the amount of information required to take the first step can be overwhelming, so we decided to prepare content with which readers can learn the basics of chemoinformatics and apply them. Given the recent AI drug discovery boom, the later chapters cover compound activity prediction and compound proposal using deep learning, as used in the context of “AI drug discovery”. If you work through the whole book, you should be able to keep up with recent trends.

What is RDKit

Warning

This subsection is @iwatobipen’s talk about RDKit. At the draft stage, the original phrasing is kept as is, written in @iwatobipen’s own "gozaru"-toned style.

I am @iwatobipen, one of the authors of this book. Here I am going to talk passionately about RDKit.

What does the RD in RDKit stand for? It is an abbreviation of Rational Discovery. Its non-open-source predecessor was developed in 2000, so it has a long history. In 2006 the code was made open source and released on SourceForge. As a chemoinformatics toolkit for Python, some readers may think of OpenBabel as well as RDKit. OpenBabel was first released in 2005. Both toolkits have been around for over 10 years. When I became interested in this area in 2012, I remember OpenBabel being the bigger of the two. At that time there were almost no articles in Japanese, and I wrote RDKit code by trial and error while referring to the chemoinformatics cookbook by @fmkz___, a co-author of this book and a pioneer in the field. If you want to look into the history of chemoinformatics, you should read this article.

Oops, I have digressed. Let’s return to the main subject.

Developer Greg Landrum says:

RDKit is the Swiss Army Knife in chemoinformatics, a collection of various functional pieces
— Greg Landrum

This is precisely the point. As the official documentation shows, RDKit already has a wide range of features: reading and writing compound information, structure drawing, 3D conformer generation, R-group decomposition, descriptor and fingerprint calculation, pharmacophore calculation, and more, covering everything from analysis to visualization. Furthermore, tools developed with RDKit by contributors and others are collected in the Contrib folder. How would you like to use it? I want to start writing code with RDKit as soon as possible. I can’t wait ;)
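As a small taste of a few of the features listed above (reading compound information, a descriptor, and a fingerprint), here is a minimal sketch. It is not taken from this book's notebooks. The helper names `describe` and `morgan_similarity` are hypothetical, the choice of Morgan fingerprints with radius 2 and 2048 bits is an assumption made for illustration, and the import is guarded because RDKit itself is only installed in Chapter 2.

```python
# A minimal sketch, assuming RDKit is available; installation is covered in Chapter 2.
# If RDKit (or a recent enough version of it) is missing, the script only prints a notice.
try:
    from rdkit import Chem, DataStructs
    from rdkit.Chem import Descriptors, rdFingerprintGenerator
    HAVE_RDKIT = True
except ImportError:
    HAVE_RDKIT = False


def _parse(smiles):
    """Parse a SMILES string, raising an error instead of returning None."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None when parsing fails
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol


def describe(smiles):
    """Return the canonical SMILES and the molecular weight of a compound."""
    mol = _parse(smiles)
    return Chem.MolToSmiles(mol), Descriptors.MolWt(mol)


def morgan_similarity(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Tanimoto similarity between the Morgan fingerprints of two compounds."""
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    fp_a = generator.GetFingerprint(_parse(smiles_a))
    fp_b = generator.GetFingerprint(_parse(smiles_b))
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)


if HAVE_RDKIT:
    canonical, mol_wt = describe("OCC")  # ethanol, written in a non-canonical order
    print(canonical, round(mol_wt, 2))
    print(morgan_similarity("CCO", "CCN"))  # ethanol vs. ethylamine
else:
    print("RDKit is not installed yet; see Chapter 2.")
```

The later chapters go through these steps properly; the point here is only that each feature is a few lines of code away.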

Note
@iwatobipen is, of course, one of the contributors, and provides code called Fastcluster for quickly clustering large compound libraries. (by @fmkz___)

RDKit also has an active developer and user community, and more features are being added all the time. Talented researchers from all over the world building things up and developing them together is the strength and the appeal of open source. If you have a chance, consider joining the annual RDKit User Group Meeting. Nothing can replace face-to-face discussion among users. Also, I said that there was almost no information in Japanese when I started using RDKit, but in recent years many very good Japanese articles have appeared. Some examples are given below. There are many articles on Qiita.

In addition, RDKit-users-jp has been launched by volunteers. If asking a question in English feels a bit …​, you may want to ask there. Japanese documentation has also been merged into the latest version of RDKit’s repository, which should be helpful too. This book uses only some of RDKit’s features, but you should still get the sense that you can do a great deal with it. Once you have taken the first step, follow your own interests and motivation. If there is something you do not understand, ask the communities above or post an issue to this book's repository. Well then, let’s get started!

Target audience

The following people are assumed as readers.

  • Graduate students in medical and pharmaceutical sciences and postdocs who want to analyze data in pharmaceutical sciences

  • Pharmaceutical researchers at pharmaceutical companies who want to analyze their own data

  • Medicinal chemists who feel the need for chemoinformatics

  • Bioinformaticians who are thinking about learning chemoinformatics

  • People who are interested in AI drug discovery but do not know where to start

About the code of this book

All of the programming code used in this book is located in the notebooks directory of the py4chemoinformatics repository of Mishima.syk. A link to the corresponding Jupyter notebook is placed at the beginning of each chapter that uses Jupyter, so please refer to it as needed.

After the installation in Chapter 2 you will be able to use git commands, so you can download all the data for this book, including the PDF, with the following command:

$ git clone https://github.com/joofio/py4chemoinformatics.git

Bonus

Chemoinformatics or Cheminformatics?

Chemoinformatics or Cheminformatics? As I remember it, the word “Chemo” originally appeared as a counterpart to “Bio”, but usage swung widely toward “Chem” for a while after the launch of the Journal of Cheminformatics.

According to recent Google Trends data, either seems to be acceptable, but personally I think it is better to keep the rhyme, so I will use Chemo in this book.

Acknowledgment

We would like to thank the following people for their bug fixes and suggestions for improvement when writing this document:

From here on, written by @fmkz___ while listening to Nujabes' "Reflection Eternal" (03/03/2019).

First of all, I would like to thank @bonohu for giving the impetus to start this book. @bonohu is the author of "Dr. Bono’s analysis of life science data". At a Mishima.syk meeting we talked about how "a chemoinformatics version of the Bono book" would be nice. There is no doubt that what triggered me to write this book was the reply, "Well, if so, why not write it?" The Drug Discovery Advent Calendar 2018 by @souyakuchan (written in Japanese) was also a good stimulus for writing. In other words, without these triggers I do not think I would have actually started.

Also, we must not forget y-sama. Y-sama, who had been enthusiastic about Mishima.syk from the very beginning, died on January 6, 2019. He wrote wonderful posts such as "Python environment construction of the person who aims at the data scientist 2016" and "Small talk about drug likeness" (written in Japanese). If he were alive, the three of us would probably have written this book together, and the content would have been more complete. This event also gave us a strong motivation to write.

Finally, I would like to thank everyone who has taken part in Mishima.syk for drinking good wine and beer and having lively discussions every time. Some of the content is based on presentations at Mishima.syk and has been revised based on your feedback.

If you have read this book and feel that chemoinformatics is interesting, or that you want to do drug discovery, please join Mishima.syk. I think it will be fun. In future drug discovery research, it will be important for people to push each other and improve their skills across affiliations. In fact, I think we already live in such a world. I hope this book will help you have a pleasant research life.

I have done what I want to do and I have no regrets in my life I enjoyed life but I won’t hate what I do and it’s better to pursue my joy as much as possible and enjoy life I hope it’s fun
— y__sama

License

This document is copyright © 2019 by @fmkz___ and @iwatobipen

CC-BY-NC-SA