Guide For Adding A New KEGG Pathway.

  1. Log in to KEGG, select the pathway, and download it in KGML format.
  2. Clone the kgml2sif GitHub repository: https://github.com/arnaudporet/kgml2sif. It converts KGML files to the simple interaction format (SIF).
  3. Run the tool with the following parameters, where hsa00000.xml is a placeholder for the downloaded pathway file: python kgml2sif.py -g conv/gene2symbol.tsv hsa00000.xml
  4. The result is a table of interactions between proteins and other pathway compounds. Select the ones you need for your pathway.
  5. In the pathway database, each line represents one part of an interaction, either an input or an output. An input or output can be a protein, a complex of several proteins, or a compound.
  6. Using Excel, prepare the following table of interactions for this pathway:
    1. column 1: pathway name.
    2. column 2: pathway ID.
    3. column 3: molecule type (e.g. protein, compound). Compounds always get probability 1 in calculations.
    4. column 4: molecule name. Used only in graphics. Must be repeated in column 6.
    5. column 5: a unique molecule number.
    6. column 6: a comma-separated list of the names of the molecules involved in this interaction. This differs slightly from the default KGML format. For example, if we have the following interactions:
      1. KRAS activation ARAF
      2. KRAS activation BRAF
      3. KRAS activation PIK3CA
      4. KRAS activation PIK3CB
      5. KRAS activation PIK3CD
      6. KRAS activation PIK3R1
      7. KRAS activation PIK3R2

       we need to group the target molecules into one list, as follows (\t stands for a tab):
      1. path_name \t path_ID \t mol_type \t KRAS \t mol_num \t
      2. path_name \t path_ID \t mol_type \t ARAF \t mol_num \t BRAF, PIK3CA, PIK3CB, PIK3CD, PIK3R1, PIK3R2

       PathWeigh takes the maximum value within this group in its calculation. An interaction can have several such groups, and PathWeigh then takes the product of their values.
    7. column 7: optional. Add 'active' for an active molecule; the KGML export parser then adds a '+' sign to mark it as active.
    8. column 8: the source database, in this case 'KEGG'.
    9. column 9: ignored.
    10. column 10: the role of the molecule in column 4: input, output, or inhibitor (use inhibitor only when the molecule negatively affects the interaction).
    11. column 11: unique interaction ID.
    12. column 12: interaction type. Any descriptive string, for example modification, degradation, or translocation. Used only in graphics.
  7. Once the table is prepared, save it as pathologist.db.txt in the data folder. Do not concatenate it with the old database file; keep the two separate.
  8. An example of a file in this format is available in pathologist.db.txt.
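
Steps 4 to 6 can also be scripted instead of done by hand in Excel. The sketch below is a hypothetical helper and is not part of kgml2sif or PathWeigh. It makes several assumptions:

- each kgml2sif output line holds three fields (source, relation, target), separated by tabs;
- every target sharing a source and a relation should be merged into one comma-separated column 6 list, as in the KRAS example;
- the result is emitted in the 12-column layout of step 6.

The pathway name, pathway ID, interaction ID, molecule numbering scheme and default molecule type 'protein' are placeholders to adapt. Column 6 repeats the column 4 name, following the column 4 description.

```python
import csv
import io
from collections import defaultdict


def parse_sif(lines):
    """Yield (source, relation, target) triples from kgml2sif-style SIF lines.

    Assumption: fields are tab-separated; plain whitespace is used as a fallback.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("\t") if "\t" in line else line.split()
        if len(parts) != 3:
            raise ValueError(f"unexpected SIF line: {line!r}")
        yield parts[0], parts[1], parts[2]


def group_targets(triples, keep_relations=None):
    """Group targets that share a source and relation (e.g. all 'KRAS activation X').

    keep_relations is a hypothetical filter standing in for the manual selection
    of step 4: only the listed relation types are kept.
    """
    groups = defaultdict(list)
    for source, relation, target in triples:
        if keep_relations is not None and relation not in keep_relations:
            continue
        if target not in groups[(source, relation)]:
            groups[(source, relation)].append(target)
    return dict(groups)


def interaction_rows(path_name, path_id, interaction_id, source, targets,
                     interaction_type, first_mol_num, mol_type="protein"):
    """Build one input row and one output row in the 12-column layout of step 6.

    Molecule numbers are first_mol_num and first_mol_num + 1; keeping them unique
    across the whole file is left to the caller.
    """

    def row(name, mol_num, members, role):
        return [
            path_name,           # column 1: pathway name
            path_id,             # column 2: pathway ID
            mol_type,            # column 3: molecule type (assumed 'protein')
            name,                # column 4: molecule name shown in graphics
            str(mol_num),        # column 5: molecule number (placeholder scheme)
            ", ".join(members),  # column 6: group, including the column 4 name
            "",                  # column 7: 'active' or empty
            "KEGG",              # column 8: source database
            "",                  # column 9: ignored
            role,                # column 10: input, output or inhibitor
            interaction_id,      # column 11: unique interaction ID
            interaction_type,    # column 12: interaction type (graphics only)
        ]

    return [
        row(source, first_mol_num, [source], "input"),
        row(targets[0], first_mol_num + 1, targets, "output"),
    ]


def to_tsv(rows):
    """Serialize rows as tab-delimited text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

For example, `group_targets(parse_sif(lines), keep_relations={"activation"})` on the seven KRAS lines of step 6 returns one group keyed by `("KRAS", "activation")`. Passing that group to `interaction_rows` and then `to_tsv` gives two tab-delimited lines that can be pasted into the table or saved directly.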
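
Step 6 says that PathWeigh takes the maximum value within a column 6 group, and the product over the groups of an interaction. Column 3 adds that compounds always get probability 1. The sketch below is one reading of that arithmetic, not PathWeigh's actual code. The function names and the `probabilities` and `mol_types` dictionaries are hypothetical. The sketch also does not model inhibitors, because the guide does not say numerically how they are combined.

```python
from math import prod


def group_value(members, probabilities, mol_types):
    """Value of one column 6 group: the maximum over its members.

    Compounds count as probability 1, as described for column 3.
    Raises KeyError if a non-compound member has no probability.
    """
    return max(
        1.0 if mol_types.get(name) == "compound" else probabilities[name]
        for name in members
    )


def interaction_value(groups, probabilities, mol_types):
    """Value of an interaction: the product of the values of its groups."""
    return prod(group_value(group, probabilities, mol_types) for group in groups)
```

For example, suppose KRAS has probability 0.9 and the group ARAF, BRAF has member probabilities 0.2 and 0.5. Then `interaction_value([["KRAS"], ["ARAF", "BRAF"]], probs, {})` computes 0.9 × max(0.2, 0.5) = 0.9 × 0.5 = 0.45.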
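
Before using the file from step 7, a quick consistency check can catch common spreadsheet export mistakes. The sketch below is a hypothetical checker, not a tool shipped with PathWeigh. It tests only the rules stated in step 6:

- exactly 12 tab-separated columns;
- column 7 empty or 'active';
- column 10 one of input, output or inhibitor.

It also assumes that every interaction ID in column 11 needs at least one input row and one output row, and that the file has no header line.

```python
from collections import defaultdict

N_COLUMNS = 12
VALID_ROLES = {"input", "output", "inhibitor"}


def validate_db_lines(lines):
    """Return human-readable problems found in pathologist.db.txt-style lines."""
    problems = []
    roles_by_interaction = defaultdict(set)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")  # tolerate Windows line endings from Excel
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != N_COLUMNS:
            problems.append(f"line {lineno}: expected {N_COLUMNS} columns, found {len(fields)}")
            continue
        role = fields[9].strip()  # column 10: molecule role
        if role not in VALID_ROLES:
            problems.append(f"line {lineno}: unknown role {role!r} in column 10")
        if fields[6].strip() not in ("", "active"):  # column 7: optional 'active'
            problems.append(f"line {lineno}: column 7 must be empty or 'active'")
        roles_by_interaction[fields[10].strip()].add(role)  # column 11: interaction ID
    # Assumption: each interaction needs at least one input and one output row.
    for interaction_id, roles in sorted(roles_by_interaction.items()):
        missing = {"input", "output"} - roles
        if missing:
            problems.append(
                f"interaction {interaction_id!r}: no {', '.join(sorted(missing))} row"
            )
    return problems
```

For example, `with open("data/pathologist.db.txt", encoding="utf-8") as fh: print(validate_db_lines(fh))` would print an empty list for a file that passes these checks. The relative path assumes the script runs from the repository root.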