Mass Spectrometry interaction Prediction (MSiP)

MSiP is a computational approach to predict protein-protein interactions (PPIs) from large-scale affinity purification mass spectrometry (AP-MS) data. This approach includes both spoke and matrix models for interpreting AP-MS data in a network context. The 'spoke' model considers only bait-prey interactions, whereas the 'matrix' model assumes that each of the identified proteins (baits and prey) in a given AP-MS experiment interacts with each of the others. The spoke model has a high false-negative rate, whereas the matrix model has a high false-positive rate. Although both models have merits, combining them has been shown to improve the ability of machine learning classifiers to discriminate between true and false positive interactions (Drew et al., 2017).
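To make the difference between the two models concrete, here is a minimal base-R sketch on a toy table. The column names (`experiment`, `bait`, `prey`) are illustrative assumptions and not the package's input schema; the code does not call any MSiP function.

```r
# Toy AP-MS table; column names are hypothetical, chosen for illustration only.
apms <- data.frame(
  experiment = c("E1", "E1", "E1", "E2", "E2"),
  bait       = c("A",  "A",  "A",  "B",  "B"),
  prey       = c("B",  "C",  "D",  "A",  "C"),
  stringsAsFactors = FALSE
)

# Spoke model: keep only the observed bait-prey pairs.
spoke_pairs <- unique(apms[, c("bait", "prey")])

# Matrix model: pair every protein (bait or prey) seen in the same experiment.
matrix_pairs <- do.call(rbind, lapply(split(apms, apms$experiment), function(e) {
  proteins <- sort(unique(c(e$bait, e$prey)))
  as.data.frame(t(combn(proteins, 2)), stringsAsFactors = FALSE)
}))
names(matrix_pairs) <- c("protein1", "protein2")
matrix_pairs <- unique(matrix_pairs)
rownames(matrix_pairs) <- NULL

spoke_pairs
matrix_pairs
```

The matrix model adds prey-prey pairs such as C-D that the spoke model never proposes, which is where both its extra coverage and its extra false positives come from.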

Installation from CRAN:

install.packages('MSiP')
library(MSiP)

To install the development version in `R`, run:

if(!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools") 
}
devtools::install_github("mrbakhsh/MSiP")
library(MSiP)

Sample Data Description:

A demo AP-MS proteomics dataset is provided in this package to illustrate the expected input data structure.

data("SampleDatInput")
head(SampleDatInput)

Scoring based on "spoke-model":

Comparative Proteomic Analysis Software Suite (CompPASS) is a robust statistical scoring scheme for assigning confidence scores to bait-prey interactions (Sowa et al., 2009). The output from CompPASS scoring includes the Z-score, S-score, D-score, WD-score, and other features.

datScoring <- 
    cPASS(SampleDatInput)
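As a rough illustration of two of these scores, the sketch below computes an S-score and a Z-score on a toy bait-by-prey count matrix in base R. It follows the definitions as they are commonly stated for CompPASS (S-score as the square root of the count scaled by the number of baits over the number of baits detecting that prey; Z-score as the count standardized per prey). The toy matrix and variable names are assumptions, and `cPASS()` may differ in detail.

```r
# Toy spectral counts: rows are baits, columns are prey (values are made up).
counts <- matrix(
  c(10, 0, 2,
     0, 4, 3,
     0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("bait1", "bait2", "bait3"), c("preyA", "preyB", "preyC"))
)

k <- nrow(counts)          # total number of baits
f <- colSums(counts > 0)   # number of baits in which each prey was detected

# S-score: sqrt((k / f) * count); prey seen with few baits are up-weighted.
s_score <- sqrt(sweep(counts, 2, k / f, `*`))

# Z-score: each count standardized against the same prey across all baits.
z_score <- scale(counts)

round(s_score, 3)
round(z_score, 3)
```

In this toy case preyC appears with every bait, so its S-scores stay low even where its count is similar to a bait-specific prey.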

Scoring based on "matrix-model":

The Dice coefficient was first applied by Zhang et al., 2008 to score interactions between all identified proteins (baits and prey) in a given AP-MS experiment.

datScoring <- 
    diceCoefficient(SampleDatInput)
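For intuition, the Dice coefficient of two proteins can be computed by hand from the sets of experiments in which each was detected. This is a minimal base-R sketch using the textbook definition; the experiment IDs are made up and the code is independent of `diceCoefficient()`.

```r
# Hypothetical sets of experiment IDs in which each protein was detected.
exp_p1 <- c("E1", "E2", "E3", "E4")
exp_p2 <- c("E2", "E3", "E5")

# Number of experiments in which both proteins were co-purified (E2 and E3).
shared <- length(intersect(exp_p1, exp_p2))

# Dice coefficient: twice the shared count over the sum of the set sizes.
dice <- 2 * shared / (length(exp_p1) + length(exp_p2))
dice
```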

Alternatively, Jaccard, Simpson, and Overlap scores can be used to score the interaction between all the identified proteins in a given AP-MS experiment.

# Jaccard coefficient
datScoring <- 
    jaccardCoefficient(SampleDatInput)

# Simpson coefficient
datScoring <- 
    simpsonCoefficient(SampleDatInput)

# Overlap score
datScoring <- 
    overlapCoefficient(SampleDatInput)
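The same set-based view applies to these alternative scores. The sketch below shows the textbook Jaccard index and the textbook overlap (Szymkiewicz-Simpson) coefficient in base R. How the package distinguishes its Simpson and Overlap scores is not stated here, so treat this as a generic illustration with made-up experiment IDs rather than a description of those functions.

```r
# Hypothetical sets of experiment IDs in which each protein was detected.
exp_p1 <- c("E1", "E2", "E3", "E4")
exp_p2 <- c("E2", "E3", "E5")

shared <- length(intersect(exp_p1, exp_p2))

# Jaccard index: shared experiments over all experiments containing either protein.
jaccard <- shared / length(union(exp_p1, exp_p2))

# Overlap coefficient (textbook form): shared experiments over the smaller set.
overlap <- shared / min(length(exp_p1), length(exp_p2))

c(jaccard = jaccard, overlap = overlap)
```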

Finally, a weighted matrix model (Drew et al., 2017) can also be used to score interactions between the proteins identified in a given AP-MS experiment. Its output includes the number of experiments in which a pair of proteins is co-purified (k) and -1*log(P-value) of the hypergeometric test (logHG), computed from the experimental overlap value, each protein's total number of observed experiments, and the total number of experiments.

datScoring <- 
    Weighted.matrixModel(SampleDatInput)
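The hypergeometric term can be reproduced for a single protein pair with base R's `phyper()`. The sketch below assumes an upper-tail test (the probability of seeing at least k shared experiments by chance) and uses made-up counts; the exact tail and log base used by `Weighted.matrixModel()` are not stated above, so this is an illustration rather than the package's calculation.

```r
n_total <- 100  # total number of experiments (made-up value)
n_a     <- 10   # experiments in which protein A was observed
n_b     <- 8    # experiments in which protein B was observed
k       <- 4    # experiments in which both were co-purified

# P(X >= k): chance of at least k shared experiments when n_b experiments
# are drawn at random from n_total, of which n_a contain protein A.
p_value <- phyper(k - 1, n_a, n_total - n_a, n_b, lower.tail = FALSE)

# Negative natural log of the P-value; larger values mean stronger evidence.
log_hg <- -log(p_value)

c(k = k, p_value = p_value, logHG = log_hg)
```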

Assign a confidence score to each instance using classifiers:

The labeled feature matrix can be used as input for Support Vector Machine (SVM) or Random Forest (RF) classifiers. The classifier then assigns each bait-prey pair a confidence score, indicating the level of support for that pair of proteins to interact. Hyperparameter optimization can also be performed to select the set of parameters that maximizes the model's performance. The RF and SVM functions provided in this package also compute the areas under the precision-recall (PR) and ROC curves to evaluate the performance of the classifier.
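As a reminder of what the area under the ROC curve measures, the following base-R sketch computes it from confidence scores and true labels using the rank-sum (Mann-Whitney) identity. The labels and scores are made up, and this is not how the package necessarily computes its curves.

```r
# Made-up example: 1 = true interaction, 0 = non-interaction.
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)

n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)

# Rank-sum identity: the AUC equals the fraction of positive-negative pairs
# in which the positive instance receives the higher score.
r   <- rank(scores)
auc <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc
```

Here eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9.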

Import the demo data:

data("testdfClassifier")
head(testdfClassifier)

Run the RF classifier:

# Generate only the precision-recall (PR) curve
predicted_RF <- 
    rfTrain(testdfClassifier, impute = FALSE, p = 0.3, parameterTuning = FALSE,
        mtry = seq(from = 1, to = 5, by = 1),
        min_node_size = seq(from = 1, to = 5, by = 1),
        splitrule = c("gini"), metric = "Accuracy",
        resampling.method = "repeatedcv", iter = 5, repeats = 5,
        pr.plot = TRUE, roc.plot = FALSE
    )

Run the SVM classifier:

# Generate only the ROC curve
predicted_SVM <- 
    svmTrain(testdfClassifier, impute = FALSE, p = 0.3, parameterTuning = TRUE,
        cost = seq(from = 2, to = 10, by = 2),
        gamma = seq(from = 0.01, to = 0.10, by = 0.02),
        kernel = "radial", ncross = 10,
        pr.plot = FALSE, roc.plot = TRUE
    )
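To get a sense of the search space implied by the `cost` and `gamma` vectors above, the sketch below enumerates every combination with base R's `expand.grid()`. It assumes that tuning evaluates the full grid of combinations, which is an assumption about `svmTrain()` and not something stated above.

```r
# Candidate values copied from the svmTrain() call above.
cost  <- seq(from = 2, to = 10, by = 2)
gamma <- seq(from = 0.01, to = 0.10, by = 0.02)

# Every cost/gamma combination, assuming an exhaustive grid search.
svm_grid <- expand.grid(cost = cost, gamma = gamma)

nrow(svm_grid)   # number of candidate parameter settings
head(svm_grid)
```

With 5 values each, a full grid has 25 settings; under 10-fold cross-validation that would mean 250 model fits, which is worth keeping in mind before widening the ranges.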
