ROC data when parsing pepXML #7

Open
Owen-Duncan opened this issue Oct 25, 2017 · 3 comments
@Owen-Duncan

Hi, MSFTBX has been great; I've started using it extensively in an analysis pipeline. When parsing pepXML, I'd like to retrieve the roc_data_point entries to determine FDRs at given probabilities. When I parse pepXML into an msmsPipelineAnalysis type, the ROC data doesn't seem to be present, even though RocErrorData types exist in the library. I'm using interprophet analysis on TPP 5.0.

@chhh
Owner

chhh commented Oct 26, 2017

@Owen-Duncan I've looked into this, and here's what I've found.
The pepXML schema doesn't specify where elements such as peptideprophet_summary should go, i.e. which elements can contain them. It does, however, describe what peptideprophet_summary is, which is why you see RocData... and friends in MSFTBX.

This means the automatic parser has no way of knowing where to expect peptideprophet_summary, so it never parses it on its own. But you can still point a parser manually at that block of XML and parse it. Below is a code snippet for that; the complete example further down prints all ROC info from a file.

// prepare the input stream
final XMLStreamReader xsr = JaxbUtils.createXmlStreamReader(p, false);
// advance the input stream to the beginning of <peptideprophet_summary>
final boolean foundPepProphSummary = XmlUtils.advanceReaderToNext(xsr, "peptideprophet_summary");
if (!foundPepProphSummary)
    throw new IllegalStateException("Could not advance the reader to the beginning of a peptideprophet_summary tag.");

// unmarshal
final PeptideprophetSummary ps = JaxbUtils.unmarshal(PeptideprophetSummary.class, xsr);

Make sure you're using MSFTBX v1.6.1 (it's on Maven Central now); a few fixes were introduced there.
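To make the "advance, then parse" idea concrete without depending on MSFTBX, here is a minimal sketch that uses only the JDK's StAX API. It assumes that `advanceReaderToNext` behaves roughly like the `advanceTo` helper below (that is a guess, not a description of the library's implementation), and it reads the `min_prob` and `error` attributes by hand instead of unmarshalling with JAXB. The class name, the helper names, and the tiny XML fragment are hypothetical and hand-written for illustration; they are not real TPP output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch class, not part of MSFTBX.
final class StaxAdvanceSketch {

    /** Moves the reader forward until it sits on a start tag named {@code tag}. */
    static boolean advanceTo(XMLStreamReader xsr, String tag) throws XMLStreamException {
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.START_ELEMENT && tag.equals(xsr.getLocalName())) {
                return true;
            }
        }
        return false; // reached the end of the document without finding the tag
    }

    /** Collects {min_prob, error} pairs until the closing tag of the summary element. */
    static List<double[]> readRocPoints(XMLStreamReader xsr, String summaryTag) throws XMLStreamException {
        List<double[]> points = new ArrayList<>();
        while (xsr.hasNext()) {
            int event = xsr.next();
            if (event == XMLStreamConstants.START_ELEMENT && "roc_data_point".equals(xsr.getLocalName())) {
                double minProb = Double.parseDouble(xsr.getAttributeValue(null, "min_prob"));
                double error = Double.parseDouble(xsr.getAttributeValue(null, "error"));
                points.add(new double[] {minProb, error});
            } else if (event == XMLStreamConstants.END_ELEMENT && summaryTag.equals(xsr.getLocalName())) {
                break; // left the summary block, stop reading
            }
        }
        return points;
    }

    /** Convenience wrapper: open a reader on the XML text, advance, then read the points. */
    static List<double[]> parse(String xml, String summaryTag) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            if (!advanceTo(xsr, summaryTag)) {
                throw new IllegalStateException("No <" + summaryTag + "> element found.");
            }
            return readRocPoints(xsr, summaryTag);
        } finally {
            xsr.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hand-written fragment; only the element and attribute names follow the thread.
        String xml = "<msms_pipeline_analysis><interprophet_summary><roc_error_data charge=\"all\">"
                + "<roc_data_point min_prob=\"0.99\" sensitivity=\"0.80\" error=\"0.002\"/>"
                + "<roc_data_point min_prob=\"0.90\" sensitivity=\"0.92\" error=\"0.020\"/>"
                + "</roc_error_data></interprophet_summary></msms_pipeline_analysis>";
        for (double[] pt : parse(xml, "interprophet_summary")) {
            System.out.printf("min_prob=%.3f error=%.3f%n", pt[0], pt[1]);
        }
    }
}
```

The JAXB route in the snippet above is still the better choice when you want the full typed object; this hand-rolled loop only shows why positioning the stream reader first is enough to get at the block.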

I know this is waaay suboptimal, but I never noticed the issue because nobody had needed to access that portion of the file. Too bad that the pepXML XSD schema is flawed. Here's a complete example:

public static void main(String[] args) throws Exception {

        // input file
        String pathIn = args[0];
        Path p = Paths.get(pathIn).toAbsolutePath();
        if (!Files.exists(p))
            throw new IllegalArgumentException("File doesn't exist: " + p.toString());

        //////////////////////////////////
        //
        //      Relevant part start
        //
        //////////////////////////////////

        // prepare the input stream
        final XMLStreamReader xsr = JaxbUtils.createXmlStreamReader(p, false);
        // advance the input stream to the beginning of <peptideprophet_summary>
        final boolean foundPepProphSummary = XmlUtils.advanceReaderToNext(xsr, "peptideprophet_summary");
        if (!foundPepProphSummary)
            throw new IllegalStateException("Could not advance the reader to the beginning of a peptideprophet_summary tag.");

        // unmarshal
        final PeptideprophetSummary ps = JaxbUtils.unmarshal(PeptideprophetSummary.class, xsr);

        //////////////////////////////////
        //
        //      Relevant part end
        //
        //////////////////////////////////

        // use the unmarshalled object
        StringBuilder sb = new StringBuilder();
        sb.append("Input files:");
        for (InputFileType inputFile : ps.getInputfile()) {
            sb.append("\n\t").append(inputFile.getName());
            if (!StringUtils.isNullOrWhitespace(inputFile.getDirectory()))
                sb.append(" @ ").append(inputFile.getDirectory());
        }
        for (RocErrorDataType rocErrorData : ps.getRocErrorData()) {
            sb.append("\n");
            sb.append(String.format("ROC Error data (charge '%s'): \n", rocErrorData.getCharge()));
            // roc_data_points
            for (RocDataPoint rocDataPoint : rocErrorData.getRocDataPoint()) {
                sb.append(String.format("ROC min_prob=\"%.3f\" sensitivity=\"%.3f\" error=\"%.3f\" " +
                                "num_corr=\"%d\" num_incorr=\"%d\"\n",
                        rocDataPoint.getMinProb(), rocDataPoint.getSensitivity(), rocDataPoint.getError(),
                        rocDataPoint.getNumCorr(), rocDataPoint.getNumIncorr()));
            }
            // error_points
            for (ErrorPoint errorPoint : rocErrorData.getErrorPoint()) {
                sb.append(String.format("ERR error=\"%.3f\" min_prob=\"%.3f\" num_corr=\"%d\" num_incorr=\"%d\"\n",
                        errorPoint.getError(), errorPoint.getMinProb(), errorPoint.getNumCorr(), errorPoint.getNumIncorr()));
            }
        }

        System.out.println(sb.toString());
    }

@Owen-Duncan
Author

Owen-Duncan commented Oct 31, 2017

Thank you! That worked perfectly.

For anyone following, I needed to make two modifications to the code:

XmlUtils.advanceReaderToNextRunSummary

and

JaxbUtils.unmarshall

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.stream.XMLStreamReader;


public class JAXBPEPXMLFDR {
    public static void main(String[] args) throws Exception{
        // input file
        String pathIn = args[0];
        Path p = Paths.get(pathIn).toAbsolutePath();
        if (!Files.exists(p))
            throw new IllegalArgumentException("File doesn't exist: " + p.toString());
        // prepare the input stream
        final XMLStreamReader xsr = JaxbUtils.createXmlStreamReader(p, false);
        // advance the reader to the beginning of <interprophet_summary>
        final boolean foundIProphetSummary = XmlUtils.advanceReaderToNextRunSummary(xsr, "interprophet_summary");
        if (!foundIProphetSummary)
            throw new IllegalStateException("Could not advance the reader to the beginning of an interprophet_summary tag.");
        final InterprophetSummary ps = JaxbUtils.unmarshall(InterprophetSummary.class, xsr);
        // use the unmarshalled object
        StringBuilder sb = new StringBuilder();
        sb.append("Input files:");
        for (InputFileType inputFile : ps.getInputfile()) {
            sb.append("\n\t").append(inputFile.getName());
            if (!StringUtils.isNullOrWhitespace(inputFile.getDirectory()))
                sb.append(" @ ").append(inputFile.getDirectory());
        }
        for (RocErrorDataType rocErrorData : ps.getRocErrorData()) {
            sb.append("\n");
            sb.append(String.format("ROC Error data (charge '%s'): \n", rocErrorData.getCharge()));
            // roc_data_points
            for (RocDataPoint rocDataPoint : rocErrorData.getRocDataPoint()) {
                sb.append(String.format("ROC min_prob=\"%.3f\" sensitivity=\"%.3f\" error=\"%.3f\" " +
                                "num_corr=\"%d\" num_incorr=\"%d\"\n",
                        rocDataPoint.getMinProb(), rocDataPoint.getSensitivity(), rocDataPoint.getError(),
                        rocDataPoint.getNumCorr(), rocDataPoint.getNumIncorr()));
            }
            // error_points
            for (ErrorPoint errorPoint : rocErrorData.getErrorPoint()) {
                sb.append(String.format("ERR error=\"%.3f\" min_prob=\"%.3f\" num_corr=\"%d\" num_incorr=\"%d\"\n",
                        errorPoint.getError(), errorPoint.getMinProb(), errorPoint.getNumCorr(), errorPoint.getNumIncorr()));
            }
        }
        System.out.println(sb.toString());
    }
}
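Once the ROC points are available, looking up an error rate at a given probability cutoff (the original goal of this thread) is a small step. Here is a minimal sketch, assuming that the `error` attribute of a `roc_data_point` is the estimated error rate for all identifications with probability at or above `min_prob`, and that it does not increase as `min_prob` goes up. The `RocFdrLookup` class and its `Point` holder are hypothetical stand-ins for the unmarshalled objects; in real code you would copy the values out of `RocDataPoint` via the getters used above. The numbers in `main` are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helper, not part of MSFTBX.
final class RocFdrLookup {

    /** Plain holder for the two roc_data_point attributes used here. */
    static final class Point {
        final double minProb;
        final double error;

        Point(double minProb, double error) {
            this.minProb = minProb;
            this.error = error;
        }
    }

    /**
     * Error at the closest tabulated cutoff that is not stricter than {@code prob}.
     * Under the monotonicity assumption this is a conservative (upper) estimate.
     */
    static OptionalDouble errorAt(List<Point> points, double prob) {
        Point best = null;
        for (Point pt : points) {
            if (pt.minProb <= prob && (best == null || pt.minProb > best.minProb)) {
                best = pt;
            }
        }
        return best == null ? OptionalDouble.empty() : OptionalDouble.of(best.error);
    }

    /** Lowest tabulated min_prob whose error stays at or below {@code targetFdr}. */
    static OptionalDouble minProbForFdr(List<Point> points, double targetFdr) {
        return points.stream()
                .filter(pt -> pt.error <= targetFdr)
                .mapToDouble(pt -> pt.minProb)
                .min();
    }

    public static void main(String[] args) {
        // Invented values, only to exercise the two lookups.
        List<Point> points = Arrays.asList(
                new Point(0.99, 0.002),
                new Point(0.95, 0.010),
                new Point(0.90, 0.020),
                new Point(0.50, 0.080));
        System.out.println("error at p >= 0.97: " + errorAt(points, 0.97));
        System.out.println("min_prob for 1% FDR: " + minProbForFdr(points, 0.01));
    }
}
```

Both lookups only pick tabulated points; interpolating between them would be a separate design decision and is not attempted here.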

@chhh
Owner

chhh commented Oct 31, 2017

@Owen-Duncan In 1.6.1 I changed the names of those methods to better reflect what they do. Glad it's working for you.

@chhh chhh added the wiki Informative questions with answers that might help with lib usage label Jul 29, 2019