ABRF_iPRG_2015_SpC

Spectral Counting (SpC) Analysis with R and edgeR

Phil Wilmarth, OHSU

October 1, 2018

The data is from the ABRF iPRG 2015 study and is described in this publication:

Choi, M., Eren-Dogu, Z.F., Colangelo, C., Cottrell, J., Hoopmann, M.R., Kapp, E.A., Kim, S., Lam, H., Neubert, T.A., Palmblad, M. and Phinney, B.S., 2017. ABRF Proteome Informatics Research Group (iPRG) 2015 Study: Detection of Differentially Abundant Proteins in Label-Free Quantitative LC–MS/MS Experiments. Journal of Proteome Research, 16(2), pp.945-957.

Abstract: Detection of differentially abundant proteins in label-free quantitative shotgun liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments requires a series of computational steps that identify and quantify LC-MS features. It also requires statistical analyses that distinguish systematic changes in abundance between conditions from artifacts of biological and technical variation. The 2015 study of the Proteome Informatics Research Group (iPRG) of the Association of Biomolecular Resource Facilities (ABRF) aimed to evaluate the effects of the statistical analysis on the accuracy of the results. The study used LC-tandem mass spectra acquired from a controlled mixture, and made the data available to anonymous volunteer participants. The participants used methods of their choice to detect differentially abundant proteins, estimate the associated fold changes, and characterize the uncertainty of the results. The study found that multiple strategies (including the use of spectral counts versus peak intensities, and various software tools) could lead to accurate results, and that the performance was primarily determined by the analysts' expertise. This manuscript summarizes the outcome of the study, and provides representative examples of good computational and statistical practice. The data set generated as part of this study is publicly available.

Six proteins were prepared in 4 different abundance mixes and spiked into a yeast cell lysate background. Each of the 4 spike-in mixes was analyzed in triplicate on a Q-Exactive instrument, giving 12 samples in total. Each sample was analyzed using a single 2-hour LC run.

The RAW files were downloaded and analyzed with Comet and the PAW pipeline. Although the study was really designed for feature intensity analysis using Skyline, spectral counting was done by a few groups in the study cohort. Spectral counting was also done here to demonstrate some considerations needed for that type of data. Differential expression statistical analysis was done in R using edgeR.
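As a rough illustration of what spectral counting means, the sketch below tallies accepted PSMs per protein per sample and lays the tallies out as a protein-by-sample table. It is not the PAW pipeline or the notebook's R code; the record layout, the sample names, and the function names `spectral_counts` and `to_matrix` are all hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical input: one (sample, protein) pair per accepted PSM.
# Real PSM files carry many more columns; only these two matter for counting.
psms = [
    ("sample1_A", "P1"), ("sample1_A", "P1"), ("sample1_A", "P2"),
    ("sample1_B", "P1"), ("sample1_B", "P2"), ("sample1_B", "P2"),
]

def spectral_counts(psm_records):
    """Return {protein: Counter({sample: count})} from (sample, protein) records."""
    counts = {}
    for sample, protein in psm_records:
        counts.setdefault(protein, Counter())[sample] += 1
    return counts

def to_matrix(counts, samples):
    """Return {protein: [count per sample]} with zeros for missing samples."""
    return {
        protein: [by_sample.get(sample, 0) for sample in samples]
        for protein, by_sample in counts.items()
    }

samples = ["sample1_A", "sample1_B"]
table = to_matrix(spectral_counts(psms), samples)
for protein, row in table.items():
    print(protein, row)
```

A table of this shape (proteins in rows, samples in columns, integer counts in cells) is the kind of input that count-based statistical tools expect.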

Successful proteomics data analyses often need more than good training, domain knowledge, and intelligence. They need common sense and a flair for the practical. Preparing the data for analysis from the (all too often) horrendous summary files takes skill and practice. There may be data that needs to be excluded on the basis of measurement limitations. This is conceptually very different from excluding a specific sample. The analysis presented here touches on some of these aspects, which may also apply to other types of proteomics studies.
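One common example of excluding data because of measurement limitations is dropping proteins whose spectral counts are too low to quantify reliably. The sketch below shows the idea under stated assumptions: the cutoff value, the toy counts, and the function name `filter_low_counts` are hypothetical and are not taken from the notebook.

```python
def filter_low_counts(table, min_mean=2.0):
    """Split {protein: [counts]} into (kept, excluded) by mean count per sample.

    min_mean is an illustrative cutoff, not a recommended value.
    """
    kept, excluded = {}, {}
    for protein, row in table.items():
        mean_count = sum(row) / len(row)
        target = kept if mean_count >= min_mean else excluded
        target[protein] = row
    return kept, excluded

# Toy counts for three proteins across three samples (made up for illustration).
toy_table = {
    "A": [10, 12, 9],
    "B": [0, 1, 0],
    "C": [2, 2, 2],
}
kept, excluded = filter_low_counts(toy_table)
print("kept:", sorted(kept))
print("excluded:", sorted(excluded))
```

Note that the filter acts on proteins (rows) and leaves every sample in place, which is the distinction drawn above between excluding unreliable measurements and excluding a sample.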

A direct link to the rendered notebook is here

File Key:

ABRF_2015_edgeR.ipynb - main Jupyter notebook
ABRF_2015_grouped_protein_summary_8.xlsx - spreadsheet of grouped protein results
ABRF_2015_pipeline.log - consolidated log file of all PAW steps
JD_sample1-A_peptide_results_8.txt - first of 12 detailed PSM files
...
JD_sample4_C_peptide_results_8.txt - last of 12 detailed PSM files
PAW_grouped_proteins_with_stats.txt - output file from notebook
PAW_protein_grouper.log - log file from protein grouping step
PAW_results.log - log file from protein inference step
README.md - this file
extras_iPRG2015_both.fasta - FASTA database used in Comet searches
grouped_peptide_summary_8.txt - peptide summary for grouped proteins
grouped_protein_summary_8.txt - protein summary after protein grouping
peptide_summary_8.txt - peptide summary for inferred proteins
protein_summary_8.txt - list of inferred proteins (group members are explicit)
