Enable specification of matching range #255

JuliaS92 · 2021-06-09T09:51:21Z

Is your feature request related to a problem? Please describe.
With biological samples we want to avoid matching identifications between files where one of the samples might genuinely not contain the protein (e.g. KO samples, IPs, fractionation, ...), while still being able to transfer IDs between exact biological replicates to boost the numbers. The same goes for peptide fractions, where you would not want to match e.g. between the 1st and 3rd SDB-RPS fraction, but only 1<->2 and 2<->3. Currently it's all or nothing.

Describe the solution you'd like
I personally very much like the solution in MQ, where you only match neighboring fraction numbers. It could become even better if you offered two matching 'dimensions', so you can e.g. match the same peptide fractions of neighboring biological samples as well as neighboring peptide fractions of the same biological sample. In the example below, the grid is biological sample x peptide fraction, and sample *2x2* would receive IDs from _4 other raw files_.

-------------------
| 1x1 |_1x2_| 1x3 |
-------------------
|_2x1_|*2x2*|_2x3_|
-------------------
| 3x1 |_3x2_| 3x3 |
-------------------
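
A minimal sketch of the neighbor rule in the grid above, assuming samples and fractions are numbered from 1 and that a cell only receives IDs from cells one step away in exactly one dimension. The function name `matching_partners` and its parameters are hypothetical and only serve as illustration; this is not how AlphaPept or MQ implement matching.

```python
def matching_partners(sample, fraction, n_samples, n_fractions):
    """Return the (sample, fraction) cells allowed to donate IDs to one cell.

    Assumption: a partner differs by exactly one step in exactly one
    dimension, so diagonal cells such as 1x1 for 2x2 are excluded.
    """
    candidates = [
        (sample - 1, fraction),  # same fraction, previous biological sample
        (sample + 1, fraction),  # same fraction, next biological sample
        (sample, fraction - 1),  # same sample, previous peptide fraction
        (sample, fraction + 1),  # same sample, next peptide fraction
    ]
    # Drop candidates that fall outside the grid.
    return [
        (s, f)
        for s, f in candidates
        if 1 <= s <= n_samples and 1 <= f <= n_fractions
    ]


print(matching_partners(2, 2, 3, 3))  # [(1, 2), (3, 2), (2, 1), (2, 3)]
```

For the center cell 2x2 of a 3x3 grid this yields the four marked neighbors; a corner cell such as 1x1 would only have two partners.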


straussmaximilian · 2021-07-04T21:54:58Z

Hi,
Unfortunately, Streamlit has no native editable table option to conveniently select things such as fractions and matching groups. I have now found a workaround for an editable table (screenshot attached) that could potentially work.

The idea would be to have a multiselect above the table to exclude runs and then manually enter data for each file. There is also now the column Shortname that has the filename w/o extension and will be checked for duplicates - so we could use this for a cleaner protein group column name.

For automated annotation, I thought about including a regex function that would automatically fill the cells based on the filename.

@ammarcsj @JuliaS92
What do you think about this? More specifically:

Should we include any more columns that you would find useful (e.g., something like file creation date)? The table is sortable, so one could easily find outliers (e.g., small file size).
I am not familiar with how people typically describe their fractions. Do you have some example filenames so that I could get a feeling for the regex needed? Or do you happen to have something?

There are a couple of limitations with this table layout (e.g., can't select multiple cells and change them at once), but this could be a start.

ammarcsj · 2021-07-06T09:39:26Z

Hi Max, this looks cool, just some questions/thoughts:

Is it possible to manually change the contents of the table? Or is it only possible to include/exclude things with this?
From my experience, naming conventions can be very heterogeneous for fractions and raw files in general, so the regex would either have to be flexibly adaptable, or we specify a naming scheme to be used. @JuliaS92 correct me if I'm wrong.
One general thing about the fraction notation. Let's say I have the following four experiments (fractions are noted in the name): [exp1_f1.raw, exp1_f2.raw, exp2_f1.raw, exp2_f2.raw]. In the scheme shown above, this would correspond to fraction annotation [1, 2, 1, 2]. We know that 1 and 2 belong to the same experiment, however it is not clear from the vector alone which 1 belongs to which 2, so we have to rely on the sorting. I would think it would be more clear to have the fraction annotation like this: [exp1, exp1, exp2, exp2]. The matching groups could then replace/be equal to the fraction numbers.
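
To make the annotation question concrete, here is a rough sketch of how a regex with named capture groups could fill both the experiment and the fraction column for the four example filenames above. The pattern and the helper `annotate` are assumptions for illustration; a real default would have to be adjustable, since naming conventions differ between labs.

```python
import re

# Hypothetical default pattern for names like "exp1_f1.raw":
# everything before "_f" is the experiment, the digits after it the fraction.
PATTERN = re.compile(r"^(?P<experiment>.+)_f(?P<fraction>\d+)\.raw$", re.IGNORECASE)


def annotate(filenames, pattern=PATTERN):
    """Return one dict per file; fields stay None if the pattern does not match."""
    rows = []
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            # Leave the cells empty so they can be filled in manually.
            rows.append({"file": name, "experiment": None, "fraction": None})
        else:
            rows.append(
                {
                    "file": name,
                    "experiment": match.group("experiment"),
                    "fraction": int(match.group("fraction")),
                }
            )
    return rows


files = ["exp1_f1.raw", "exp1_f2.raw", "exp2_f1.raw", "exp2_f2.raw"]
for row in annotate(files):
    print(row)
```

Keeping both columns means the pair (experiment, fraction) identifies each run without relying on the sort order of the table.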

JuliaS92 · 2021-07-08T07:24:34Z

Hi @straussmaximilian @ammarcsj,

I like the interface, but I agree that it will be close to impossible to cover fractions and matching groups for all kinds of naming conventions with a single regex. Allowing people to enter a regex with capture groups and providing some reasonable defaults would be a good middle ground. The table should also be editable, in case it is not yet, since many people just number their raw files.
If both groups are numbers one could add a selector for each of the two dimensions, how 'far' to match. This could be a slider for the peptide fractions, so one can e.g. say for 24 fractions to only match with two fractions up and down. For the matching group it could be 'keep separate' (e.g. KO vs wt) or 'adjacent' (e.g. subcellular fractionation or time course). Within identical tuples of (fraction, matching group) matching could only be turned off by disabling it completely. For example I would use it like this (name, peptide fraction, matching group): (Rep1_F1_1, 1, 1), (Rep1_F1_2, 2, 1), (Rep1_F2_1, 1, 2), (Rep1_F2_2, 2, 2), (Rep1_F3_1, 1, 3), (Rep1_F3_2, 2, 3), (Rep2_F1_1, 1, 1), (Rep2_F1_2, 2, 1), (Rep2_F2_1, 1, 2), (Rep2_F2_2, 2, 2), (Rep2_F3_1, 1, 3), (Rep2_F3_2, 2, 3), and then go +/-1 for peptide fractions and adjacent matching for the matching groups.
Regarding extra columns, I like the idea of having the creation time. Long term it would be nice to be able to define groups for PTMs, so you can e.g. analyse full proteomes together with enriched samples. This could be done analogously to the MQ parameter groups.
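
As a sketch of how the two selectors could translate into a pairwise rule, the snippet below uses the (name, peptide fraction, matching group) tuples from this comment. All names (`Run`, `may_match`, `fraction_range`, `group_mode`) are hypothetical. It also assumes that only one dimension may differ at a time, in line with the grid in the original request; whether diagonal matches should be allowed is not settled in this thread.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    fraction: int  # peptide fraction
    group: int     # matching group


def may_match(a, b, fraction_range=1, group_mode="adjacent"):
    """Decide whether IDs may be transferred between two runs.

    fraction_range: how many peptide fractions up and down to match.
    group_mode: "separate" (e.g. KO vs wt) or "adjacent" (e.g. time course).
    """
    if group_mode not in ("separate", "adjacent"):
        raise ValueError(f"unknown group_mode: {group_mode}")
    fraction_dist = abs(a.fraction - b.fraction)
    group_dist = abs(a.group - b.group)
    if group_dist == 0:
        # Same matching group: neighboring fractions and identical tuples match.
        return fraction_dist <= fraction_range
    if fraction_dist == 0 and group_mode == "adjacent":
        # Same fraction: only directly neighboring matching groups match.
        return group_dist == 1
    return False


target = Run("Rep1_F1_1", 1, 1)
others = [
    Run("Rep2_F1_1", 1, 1),
    Run("Rep1_F1_2", 2, 1),
    Run("Rep1_F2_1", 1, 2),
    Run("Rep1_F3_1", 1, 3),
]
print([r.name for r in others if may_match(target, r)])
```

With +/-1 for peptide fractions and adjacent matching groups, `Rep1_F1_1` would exchange IDs with its replicate `Rep2_F1_1`, with `Rep1_F1_2`, and with `Rep1_F2_1`, but not with `Rep1_F3_1`.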

ibludau · 2021-07-08T07:45:14Z

Not sure if this is already integrated, but an option to just upload a design table could be a good addition/alternative. Some people might simply be more comfortable creating the table beforehand.
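
A minimal sketch of what reading such a design table could look like, assuming a CSV file with the hypothetical columns `file`, `fraction`, and `matching_group`. Neither the column names nor the function `read_design_table` come from AlphaPept; they only illustrate the idea.

```python
import csv
import io

# Hypothetical column names for the uploaded design table.
REQUIRED = ("file", "fraction", "matching_group")


def read_design_table(handle):
    """Parse a CSV design table and run basic sanity checks."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"design table is missing columns: {missing}")
    rows, seen = [], set()
    for row in reader:
        # Each raw file should be listed only once.
        if row["file"] in seen:
            raise ValueError(f"duplicate file entry: {row['file']}")
        seen.add(row["file"])
        rows.append(
            {
                "file": row["file"],
                "fraction": int(row["fraction"]),
                "matching_group": int(row["matching_group"]),
            }
        )
    return rows


# In-memory stand-in for an uploaded file.
example = io.StringIO(
    "file,fraction,matching_group\n"
    "Rep1_F1_1.raw,1,1\n"
    "Rep1_F1_2.raw,2,1\n"
)
design = read_design_table(example)
print(design)
```

The parsed rows could then pre-fill the editable table, so both entry routes end up in the same place.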

straussmaximilian added the enhancement New feature or request label Jun 9, 2021

straussmaximilian added a commit that referenced this issue Jul 9, 2021

editable table for adding fraction information #255

e2f6cda

straussmaximilian added a commit that referenced this issue Mar 3, 2022

FEAT mathcing groups, matching bug #255, #401

0f2e120
