- Log in to KEGG, select the pathway, and download it in KGML format.
- Clone this GitHub repository: https://github.com/arnaudporet/kgml2sif. It is a tool that converts KGML to the simple interaction format (SIF).
- Run the tool with the following parameters, where hsa00000.xml stands for the downloaded pathway file: python kgml2sif.py -g conv/gene2symbol.tsv hsa00000.xml
- The result is a table of interactions between proteins and other pathway compounds. You can select the ones you need for your pathway.
- In the pathway database, each line represents one part of an interaction, either an input or an output. The input or output can be a protein, a complex of several proteins, or a compound.
- Using Excel, prepare the following table of interactions for this pathway:
- column 1: pathway name.
- column 2: pathway ID.
- column 3: molecule type (e.g. protein, compound). A compound always gets probability 1 in calculations.
- column 4: molecule name. Used only in graphics. Must be repeated in column 6.
- column 5: a unique molecule number.
- column 6: a comma-separated list of the names of the molecules involved in this interaction. This differs slightly from the default KGML format. For example, if we have:
- KRAS activation ARAF
- KRAS activation BRAF
- KRAS activation PIK3CA
- KRAS activation PIK3CB
- KRAS activation PIK3CD
- KRAS activation PIK3R1
- KRAS activation PIK3R2
- We need to group the molecules into one list, as follows:
- path_name \t path_ID \t mol_type \t KRAS \t mol_num \t
- path_name \t path_ID \t mol_type \t ARAF \t mol_num \t BRAF, PIK3CA, PIK3CB, PIK3CD, PIK3R1, PIK3R2
- PathWeigh takes the maximum value of this group in the calculation. An interaction can have several such groups, and PathWeigh takes their product.
- column 7: optional. Add 'active' for an active molecule. This adds a '+' sign in the KGML export parser to signal an active molecule.
- column 8: add the source database, in this case, 'KEGG'.
- column 9: ignored.
- column 10: the role of the molecule in column 4. Can be: input, output, or inhibitor (only if it negatively affects the interaction).
- column 11: unique interaction ID.
- column 12: interaction type. Can be any detailed string, for example modification, degradation, translocation. Used only in graphics.
- After the table is prepared, rename it to pathologist.db.txt and place it in the data folder. Don't concatenate it to the old database file; use the two files separately.
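
The grouping step for column 6 and the maximum/product rule described above can be sketched in Python. This is only an illustration, not PathWeigh's own code: it assumes the converted file has three tab-separated fields per line (source, interaction, target), and the function names `group_targets` and `interaction_value` as well as the `values` dictionary are hypothetical.

```python
from collections import defaultdict
from math import prod  # Python 3.8+


def group_targets(sif_lines):
    """Collect the targets of each (source, interaction) pair into one
    comma-separated string, as expected in column 6.

    Assumption: each line is 'source<TAB>interaction<TAB>target'.
    """
    groups = defaultdict(list)
    for line in sif_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        source, interaction, target = parts
        if target not in groups[(source, interaction)]:
            groups[(source, interaction)].append(target)
    return {key: ", ".join(targets) for key, targets in groups.items()}


def interaction_value(groups, values):
    """Take the maximum value inside each group, then multiply the maxima.

    `groups` is a list of comma-separated name lists (column 6 entries);
    `values` maps molecule names to numbers (hypothetical input; compounds
    would be given 1).
    """
    return prod(
        max(values[name.strip()] for name in group.split(","))
        for group in groups
    )


if __name__ == "__main__":
    sif = [
        "KRAS\tactivation\tARAF",
        "KRAS\tactivation\tBRAF",
        "KRAS\tactivation\tPIK3CA",
    ]
    print(group_targets(sif))
```

The grouped strings still have to be copied into the full 12-column table by hand or with a spreadsheet; the sketch covers only column 6 and the scoring rule.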
- An example of a file in this format is available in