Feature request: Add retention time information to ion.tsv table #1072

Open
hollenstein opened this issue Apr 6, 2023 · 5 comments

@hollenstein

Dear FragPipe Team,

Is your feature request related to a problem?
The request is not directly related to a problem but it would allow using the FragPipe outputs for additional workflows / use cases. Here are some examples.

In the daily operations of our MS facility, we sometimes need to manually validate the identification or quantification of a particular PTM or protein. To do so it is often helpful to be able to look up the retention time (that was used by the software) of a particular ion (LC-MS feature).

Since FragPipe is quite fast compared to other pipelines, it is also a good choice for QC applications. Here it would be useful to have information regarding retention time apex and FWHM (or retention span), in order to monitor LC performance.

There are retention time prediction tools, for example for generating time-scheduled PRM target lists, that allow calibrating the RT prediction with previous LC-MS data from your system. Having an ion table with the peak retention time would make it straightforward to use the FragPipe output for this calibration.

Describe the solution you'd like
Adding retention time columns to the ion.tsv table. I think it would be most useful to have a peak retention time column and something like a start/end time or FWHM column.

I think it is not necessary to have this information in the combined_ion.tsv table, as it is quite simple to either concatenate all ion.tsv tables, or create a pivoted table from them, containing the retention time columns.
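As a rough illustration of the concatenation mentioned above, here is a minimal sketch using only the Python standard library. The column names in the toy data (`Modified Sequence`, `Charge`, `Intensity`) and the added `Sample` column are assumptions for the example, not a guaranteed description of the real ion.tsv layout.

```python
import csv
import io


def concat_ion_tables(tables):
    """Stack per-sample ion tables into one long list of rows.

    `tables` maps a sample name to an open text file (or any iterable of
    lines) holding tab-separated data with a header row.
    """
    rows = []
    for sample, handle in tables.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            row = dict(row)
            row["Sample"] = sample  # hypothetical column: remember the source file
            rows.append(row)
    return rows


# Tiny made-up tables standing in for two ion.tsv files.
sample_a = io.StringIO("Modified Sequence\tCharge\tIntensity\nPEPTIDEK\t2\t1000\n")
sample_b = io.StringIO("Modified Sequence\tCharge\tIntensity\nPEPTIDEK\t2\t1500\n")
long_table = concat_ion_tables({"A": sample_a, "B": sample_b})
```

With real data, the `io.StringIO` objects would be replaced by files opened from each sample folder.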

Thanks for your consideration,
David

@fcyu fcyu self-assigned this Apr 6, 2023
@fcyu
Member

fcyu commented Apr 6, 2023

Same as #923. We already have a pre-release version that writes the apex retention time to the psm.tsv file. If you want to test that version, please send an email to yufe AT umich.edu.

Best,

Fengchao

@fcyu fcyu closed this as completed Apr 6, 2023
@hollenstein
Author

Hi, many thanks for letting me have a look at the pre-release version.

I assume that the retention time in the psm.tsv file is the retention time of the MS2 scan. I've also seen that you've added "SAMPLE Apex Retention Time" columns to the combined_ion.tsv report, which is really useful. However, in the ion.tsv there is still no retention time information. We think of the ion.tsv (when you concatenate all of them) as the counterpart to the MaxQuant evidence file, which contains detailed information about quantified ion features as a reference. The ion.tsv table contains slightly more detailed information than the combined_ion.tsv, information which you typically don't need in a standard analysis. Hence I think it would be a good place to pack additional columns, without cluttering the combined_ion file. Would it be possible to also include the "Apex Retention Time" column in the individual ion.tsv files? And, especially for QC applications, it might be useful to also have "Retention Time Start" and "Retention Time End" columns.

Best,
David

@fcyu
Member

fcyu commented Apr 7, 2023

Hi David,

Thanks for your feedback.

> However, in the ion.tsv there is still no retention time information. We think of the ion.tsv (when you concatenate all of them) as the counterpart to the MaxQuant evidence file, which contains detailed information about quantified ion features as a reference.

I think the combined_ion.tsv is more like the evidence.txt in MaxQuant because the intensities in the combined file are normalized. I also suggest you use the combined_ion.tsv file rather than the individual ion.tsv file.

> The ion.tsv table contains slightly more detailed information than the combined_ion.tsv, information which you typically don't need in a standard analysis.

We can add that information to the combined_ion.tsv if it is useful. Could you please tell me which columns you mean?

> And, especially for QC applications, it might be useful to also have "Retention Time Start" and "Retention Time End" columns.

Yes, we can add "Retention Time Start" and "Retention Time End" (also "Ion Mobility Start" and "Ion Mobility End" if applicable) in the future.

Best,

Fengchao

@fcyu fcyu reopened this Apr 7, 2023
@hollenstein
Author

Hi Fengchao,

The reason why I thought about using the ion.tsv files instead of the combined_ion.tsv file for our purposes is that we often use it "manually" to quickly look something up with Excel. The issue with a wide table like the combined_ion is that if you have more than a few samples and several columns that you show for each sample, the number of columns becomes so large that it is not comfortable to use anymore. E.g. with 9 samples and 4 columns (RT, spectral counts, intensity, cv), it's already 36 columns you need to look through instead of 4, if you want to find something specific. Of course the combined_ion table is much more useful if we want to compare the quantification between multiple samples.

However, I am not sure how many people use it like that, so if you prefer to only add these things to the combined_ion table, it's no big issue for us. I could just convert the combined_ion table into a flat table for our purposes.
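For reference, the wide-to-flat conversion could look roughly like the sketch below. It assumes that per-sample columns are named `<sample> <measure>` (as in the "SAMPLE Apex Retention Time" columns mentioned earlier in this thread) and that `Modified Sequence` and `Charge` identify an ion; both the key columns and the list of measures are assumptions to adapt to the actual file.

```python
import csv
import io

ID_COLUMNS = ["Modified Sequence", "Charge"]     # assumed key columns
MEASURES = ["Apex Retention Time", "Intensity"]  # assumed per-sample suffixes


def flatten_combined(handle, id_columns=ID_COLUMNS, measures=MEASURES):
    """Turn one wide row per ion into one row per ion and sample."""
    flat = []
    for row in csv.DictReader(handle, delimiter="\t"):
        per_sample = {}
        for column, value in row.items():
            for measure in measures:
                suffix = " " + measure
                # Naive suffix match: any column ending in the measure name
                # is treated as "<sample> <measure>".
                if column.endswith(suffix):
                    sample = column[: -len(suffix)]
                    per_sample.setdefault(sample, {})[measure] = value
        for sample, values in per_sample.items():
            record = {key: row[key] for key in id_columns}
            record["Sample"] = sample
            record.update(values)
            flat.append(record)
    return flat


# Made-up wide table with two samples, A and B.
wide = io.StringIO(
    "Modified Sequence\tCharge\tA Apex Retention Time\tA Intensity\t"
    "B Apex Retention Time\tB Intensity\n"
    "PEPTIDEK\t2\t1200.5\t1000\t1210.0\t1500\n"
)
flat_table = flatten_combined(wide)
```

The result has a fixed, small set of columns regardless of how many samples are in the experiment, which is easier to filter in a spreadsheet.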

Regarding potential additional columns:
When we use the ion / evidence table for manual validation it is useful to have some quality score for the ID, which could be the best PSM score (I think probability and expectation are present in the ion.tsv file).

In some cases when validating PTMs, we also look at the deviation of observed m/z from the calculated one (for this observed m/z and calculated m/z would also be fine).
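To make the point about m/z deviation concrete: if observed and calculated m/z are both reported, the relative deviation in ppm follows from the usual definition. The function name is just illustrative.

```python
def mz_error_ppm(observed_mz, calculated_mz):
    """Relative deviation of the observed m/z from the calculated m/z, in ppm."""
    return (observed_mz - calculated_mz) / calculated_mz * 1e6


# An ion calculated at m/z 500.0000 and observed at 500.0005
# deviates by roughly 1 ppm.
error = mz_error_ppm(500.0005, 500.0)
```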

In addition, the MQ evidence table contains a list of all PSM scan numbers that were matched to a particular ion feature. However, I just realized this might become very complicated in certain situations. Maybe one should rather get this by parsing the PSM tables themselves, I guess you simply match all PSMs that have the same peptide+charge+cv and not match PSMs with the feature boundaries, right? FragPipe allows specifying multiple rawfiles with the same experiment and replicate, and afaik the results are then somehow condensed and reported together. So just reporting scan numbers wouldn't make much sense in this case, one would need filename+scannumber. I wonder what do you actually do in this situation? Do you take the average parameters (rt, intensity, ...) of the same ion feature from multiple rawfiles or do you select a "best" feature that is then reported?

Best,

David

@fcyu
Member

fcyu commented Apr 12, 2023

Hi David,

Thank you very much for explaining why you use ion.tsv rather than combined_ion.tsv. For now, I would stick to the combined_ion.tsv due to limited bandwidth.

> When we use the ion / evidence table for manual validation it is useful to have some quality score for the ID, which could be the best PSM score (I think probability and expectation are present in the ion.tsv file)

I can write the best probability to the combined_ion.tsv.

> I guess you simply match all PSMs that have the same peptide+charge+cv and not match PSMs with the feature boundaries, right? FragPipe allows specifying multiple rawfiles with the same experiment and replicate, and afaik the results are then somehow condensed and reported together. So just reporting scan numbers wouldn't make much sense in this case, one would need filename+scannumber. I wonder what do you actually do in this situation? Do you take the average parameters (rt, intensity, ...) of the same ion feature from multiple rawfiles or do you select a "best" feature that is then reported?

FragPipe (it is actually IonQuant that does all the quantification-related jobs) picks the most intense one if there are multiple "ions" with the same peptidoform+charge. Therefore, linking PSMs to the ions is feasible but not very informative.
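The selection rule described above can be pictured with the following sketch. This is not IonQuant's code, only an illustration of "keep the most intense ion per peptidoform+charge"; the dictionary keys (`peptidoform`, `charge`, `intensity`, `file`) are hypothetical.

```python
def pick_most_intense(ions):
    """Keep, for each (peptidoform, charge) pair, the ion with the highest intensity."""
    best = {}
    for ion in ions:
        key = (ion["peptidoform"], ion["charge"])
        if key not in best or ion["intensity"] > best[key]["intensity"]:
            best[key] = ion
    return list(best.values())


# The same peptidoform and charge quantified in two raw files of one experiment.
candidates = [
    {"peptidoform": "PEPTIDEK", "charge": 2, "intensity": 1000.0, "file": "run1"},
    {"peptidoform": "PEPTIDEK", "charge": 2, "intensity": 1500.0, "file": "run2"},
    {"peptidoform": "PEPTIDEK", "charge": 3, "intensity": 300.0, "file": "run1"},
]
selected = pick_most_intense(candidates)
```

In this toy example the charge 2 ion is taken from `run2`, and the charge 3 ion is kept as is because it has no competitor.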

Best,

Fengchao
