
Computational approach to predict interactome from Large Scale Affinity Purification Mass Spectrometry Datasets

CRAN Version Downloads from the RStudio CRAN mirror License: GPL (>= 3)

Mass Spectrometry interaction Prediction (MSiP)

MSiP is a computational approach to predict protein-protein interactions (PPIs) from large-scale affinity purification mass spectrometry (AP-MS) data. It includes both spoke and matrix models for interpreting AP-MS data in a network context. The 'spoke' model considers only bait-prey interactions, whereas the 'matrix' model assumes that every protein identified in a given AP-MS experiment (bait and preys) interacts with every other one. The spoke model has a high false-negative rate, whereas the matrix model has a high false-positive rate. Although both statistical models have merits, combining them has been shown to improve the ability of machine learning classifiers to discriminate between true and false positive interactions (Drew et al., 2017).
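
To make the difference between the two models concrete, the sketch below enumerates the candidate pairs each model would consider for a single toy experiment. It uses base R only; the protein names and the `spoke_pairs` / `matrix_pairs` objects are hypothetical and are not part of the package.

```r
# Toy AP-MS experiment (hypothetical names): one bait and the preys it pulled down.
bait  <- "A"
preys <- c("B", "C", "D")

# Spoke model: only bait-prey pairs are candidate interactions.
spoke_pairs <- data.frame(protein1 = bait, protein2 = preys)

# Matrix model: every pair among all identified proteins is a candidate.
proteins     <- c(bait, preys)
matrix_pairs <- as.data.frame(t(combn(proteins, 2)))
names(matrix_pairs) <- c("protein1", "protein2")

nrow(spoke_pairs)   # 3 bait-prey pairs
nrow(matrix_pairs)  # 6 pairs, i.e. choose(4, 2)
```

The matrix model adds the prey-prey pairs (B-C, B-D, C-D) that the spoke model never scores, which is where both its extra coverage and its extra false positives come from.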

Installation from CRAN:

install.packages('MSiP')
library(MSiP)

To install the development version in R, run:

if(!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools") 
}
devtools::install_github("mrbakhsh/MSiP")
library(MSiP)

Sample Data Description:

A demo AP-MS proteomics dataset is included in the package to illustrate the expected input data structure.

data("SampleDatInput")
head(SampleDatInput)

Scoring based on "spoke-model":

The Comparative Proteomic Analysis Software Suite (CompPASS) is a robust statistical scoring scheme for assigning confidence scores to bait-prey interactions (Sowa et al., 2009). The output of CompPASS scoring includes the Z-score, S-score, D-score, WD-score, and other features.

datScoring <- 
    cPASS(SampleDatInput)
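
As an illustration of one of these features, the sketch below computes a Z-score for each prey in each bait experiment, standardizing a prey's spectral count against that prey's mean and standard deviation across all baits. This is a base R sketch on a hypothetical count matrix; it is not the code used by `cPASS`, which may treat replicates and missing values differently.

```r
# Hypothetical spectral counts: rows are preys, columns are bait experiments.
counts <- matrix(
  c(10, 0, 1,
     2, 3, 2,
     0, 0, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(prey = c("P1", "P2", "P3"), bait = c("B1", "B2", "B3"))
)

# Per-prey mean and standard deviation across all baits.
prey_mean <- rowMeans(counts)
prey_sd   <- apply(counts, 1, sd)

# Z-score of every prey-bait count; the per-row vectors are recycled down columns.
z <- (counts - prey_mean) / prey_sd
round(z, 2)
```

A prey seen at a high count with one bait and rarely elsewhere (P1 with B1, P3 with B3) gets a high Z-score for that bait, whereas a prey seen at similar counts everywhere (P2) does not stand out.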

Scoring based on "matrix-model":

The Dice coefficient was first applied by Zhang et al., 2008 to score interactions between all identified proteins (baits and preys) in a given AP-MS experiment.

datScoring <- 
    diceCoefficient(SampleDatInput)

Alternatively, Jaccard, Simpson, and Overlap scores can be used to score the interaction between all the identified proteins in a given AP-MS experiment.

#Jaccard coefficient
datScoring <- 
    jaccardCoefficient(SampleDatInput)

#Simpson coefficient
datScoring <- 
    simpsonCoefficient(SampleDatInput)

#Overlap score
datScoring <- 
    overlapCoefficient(SampleDatInput)
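
For intuition about what these matrix-model scores measure, the sketch below computes the textbook Dice, Jaccard, and Simpson coefficients for two proteins from the sets of experiments in which each was observed. The `presence` matrix and the `similarity` helper are hypothetical and written in base R; the package functions may differ in detail. The overlap score is left out because its formula is not stated here.

```r
# Hypothetical presence/absence matrix: rows are proteins, columns are experiments.
presence <- matrix(
  c(1, 1, 1, 0,
    1, 1, 0, 0,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("exp", 1:4))
)

# Textbook set-similarity coefficients for two presence vectors.
similarity <- function(x, y) {
  shared <- sum(x & y)  # experiments containing both proteins
  c(dice    = 2 * shared / (sum(x) + sum(y)),
    jaccard = shared / sum(x | y),
    simpson = shared / min(sum(x), sum(y)))
}

ab <- similarity(presence["A", ], presence["B", ])  # dice 0.8, jaccard 2/3, simpson 1
ac <- similarity(presence["A", ], presence["C", ])  # dice 0.4, jaccard 0.25, simpson 0.5
```

All three coefficients grow with the number of shared experiments; they differ only in how they normalize by the number of experiments in which each protein appears.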

Finally, a weighted matrix model (Drew et al., 2017) can also be used to score interactions between the proteins identified in a given AP-MS experiment. Its output includes the number of experiments in which the pair of proteins is co-purified (k) and the negative log of the hypergeometric test P-value, $-\log(P)$ (logHG). The test is computed from the experimental overlap, each protein's total number of observed experiments, and the total number of experiments.

datScoring <- 
    Weighted.matrixModel(SampleDatInput)
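
The hypergeometric score can be sketched for a single protein pair with `phyper` from base R's stats package. The counts below are made up, and the use of the natural logarithm and of an upper-tail test are assumptions; `Weighted.matrixModel` may differ on both points.

```r
# Hypothetical counts for one protein pair.
N <- 100  # total number of experiments
m <- 10   # experiments in which protein A was observed
n <- 8    # experiments in which protein B was observed
k <- 5    # experiments in which both were observed (the overlap)

# P(overlap >= k) if the two proteins occurred independently of each other.
p_value <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

# Negative log of the P-value; the natural log is an assumption here.
logHG <- -log(p_value)
```

Passing `k - 1` with `lower.tail = FALSE` gives the probability of observing at least `k` shared experiments, so larger `logHG` values indicate co-purification that is harder to explain by chance.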

Assign a confidence score to each instance using classifiers:

The labeled feature matrix can be used as input for Support Vector Machine (SVM) or Random Forest (RF) classifiers. The classifier then assigns each bait-prey pair a confidence score indicating the level of support for an interaction between the two proteins. Hyperparameter optimization can also be performed to select the set of parameters that maximizes the model's performance. The RF and SVM functions provided in this package also compute the areas under the precision-recall (PR) and ROC curves to evaluate the performance of the classifier.

Import the demo data:

data("testdfClassifier")
head(testdfClassifier)

Run the RF classifier:

# Plot only the precision-recall curve
predicted_RF <- 
    rfTrain(testdfClassifier,impute = FALSE, p = 0.3, parameterTuning = FALSE,
        mtry  = seq(from = 1, to = 5, by = 1),
        min_node_size = seq(from = 1, to = 5, by = 1),
        splitrule =c("gini"),metric = "Accuracy",
        resampling.method = "repeatedcv",iter = 5,repeats = 5,
        pr.plot = TRUE, roc.plot = FALSE
    )

Run the SVM classifier:

# Plot only the ROC curve
predicted_SVM <- 
    svmTrain(testdfClassifier,impute = FALSE,p = 0.3,parameterTuning = TRUE,
        cost = seq(from = 2, to = 10, by = 2),
        gamma = seq(from = 0.01, to = 0.10, by = 0.02),
        kernel = "radial",ncross = 10,
        pr.plot = FALSE, roc.plot = TRUE
    ) 
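
To show what the two evaluation metrics summarize, the sketch below computes a ROC AUC and an average precision (one common estimate of the area under the PR curve) in base R. The `scores` and `labels` vectors are hypothetical, ties are not handled, and the estimators used inside `rfTrain` and `svmTrain` may differ.

```r
# Hypothetical classifier scores and true labels (1 = interacting pair).
scores <- c(0.95, 0.80, 0.70, 0.55, 0.40, 0.20)
labels <- c(1,    1,    0,    1,    0,    0)

# ROC AUC via the rank-sum (Mann-Whitney) identity: the fraction of
# positive-negative pairs in which the positive has the higher score.
n_pos   <- sum(labels == 1)
n_neg   <- sum(labels == 0)
roc_auc <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)

# Average precision: mean of the precision values at each true positive,
# walking down the list sorted by decreasing score.
ord           <- order(scores, decreasing = TRUE)
hits          <- labels[ord] == 1
precision     <- cumsum(hits) / seq_along(hits)
avg_precision <- mean(precision[hits])
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so `roc_auc` is 8/9; the precision at the three true positives is 1, 1, and 0.75, so `avg_precision` is their mean.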
