Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Regarding MPNN Drug Encoder #163

Open
SudhirGhandikota opened this issue May 30, 2023 · 3 comments
Open

Query Regarding MPNN Drug Encoder #163

SudhirGhandikota opened this issue May 30, 2023 · 3 comments

Comments

@SudhirGhandikota
Copy link

Hello,

Before I ask my query, I would like to congratulate everybody involved in building this great framework.
I was trying to train an MPNN-CNN model on the latest BindingDB data and while exploring the existing MPNN Drug encoder implementation, I ran into a couple of doubts/queries. I was hoping to get some clarification from you guys.

From what I understand, the agraph property associated with a drug is basically an adjacency list where the row_indexrepresents an atom and each column value is the in_bond number/index. I deduced this from the following code snippet in the smiles2mpnnfeature method.

for a in range(n_atoms):
            for i,b in enumerate(in_bonds[a]):
                agraph[a,i] = b

Then, during the MPNN model training, I can see that the agraphs associated with all drugs in a given batch are combined into one combined agraph_1st variable. In this consolidation step, you used two variables N_a and N_b to maintain the running sum of atoms and bonds from each drug. Also, to re-index this combined adjacency list, I can see that you are adding the N_a value at each step (code snippet below).

for i in range(N_atoms_bond.shape[0]):
            ....
            agraph_lst.append(agraph[i,:atom_num,:] + N_a)

However, since the values in these adjacency lists are bond numbers or indices, shouldn't we be adding the N_b variable value instead? Could you please clarify this doubt of mine?

Thanks in advance!

@futianfan
Copy link
Collaborator

agraph_lst save the index of atom. N_a is the number of total atoms before the current molecule. so we need to add the value. it is not related to the bond number.

@SudhirGhandikota
Copy link
Author

Hello,

Thanks for your response. However, the agraph_1st variable is populated using the agraph value, which seems to contain the bond number right (based on the first code snippet)??

@futianfan
Copy link
Collaborator

not bond number. but atom number. thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants