Enable specification of matching range #255

Open
JuliaS92 opened this issue Jun 9, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@JuliaS92

JuliaS92 commented Jun 9, 2021

Is your feature request related to a problem? Please describe.
With biological samples, we want to avoid matching identifications between files where the protein might genuinely be absent from one of the samples (e.g. KO samples, IPs, fractionation, ...), while still being able to transfer IDs between exact biological replicates to boost the numbers. The same goes for peptide fractions: you would not want to match, e.g., the 1st and 3rd SDB-RPS fraction, but only 1<->2 and 2<->3. Currently it's all or nothing.

Describe the solution you'd like
I very much like the MQ solution of only matching neighboring fraction numbers. It could become even better with two matching 'dimensions', so that one could, e.g., match the same peptide fraction across neighboring biological samples as well as neighboring peptide fractions of the same biological sample. In the example below (biological sample x peptide fraction), sample *2x2* would receive IDs from _4 other raw files_.

-------------------
| 1x1 |_1x2_| 1x3 |
-------------------
|_2x1_|*2x2*|_2x3_|
-------------------
| 3x1 |_3x2_| 3x3 |
-------------------
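The two-dimensional neighbor rule in the grid above can be sketched in Python. This assumes each run is described by a `(biological_sample, peptide_fraction)` pair and that diagonal neighbors (e.g. 1x1 for 2x2) are excluded, as in the grid; `match_donors`, `sample_dist` and `fraction_dist` are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# A run is identified by (biological_sample, peptide_fraction), e.g. (2, 2) for "2x2".
Run = Tuple[int, int]


def match_donors(target: Run, runs: List[Run],
                 sample_dist: int = 1, fraction_dist: int = 1) -> List[Run]:
    """Return the runs that may transfer IDs to `target`.

    A donor either shares the peptide fraction and comes from a nearby
    biological sample, or shares the biological sample and comes from a
    nearby peptide fraction. Diagonal neighbors are not matched.
    """
    t_sample, t_fraction = target
    donors = []
    for sample, fraction in runs:
        if (sample, fraction) == target:
            continue  # a run does not donate to itself
        same_fraction = fraction == t_fraction and abs(sample - t_sample) <= sample_dist
        same_sample = sample == t_sample and abs(fraction - t_fraction) <= fraction_dist
        if same_fraction or same_sample:
            donors.append((sample, fraction))
    return donors


# The 3 x 3 layout from the example: samples 1-3, fractions 1-3.
grid = [(s, f) for s in range(1, 4) for f in range(1, 4)]
print(match_donors((2, 2), grid))  # [(1, 2), (2, 1), (2, 3), (3, 2)]
```

With both distances set to 1, a corner run such as 1x1 would only receive IDs from 1x2 and 2x1.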
@straussmaximilian straussmaximilian added the enhancement New feature or request label Jun 9, 2021
@straussmaximilian
Member

Hi,
Unfortunately, Streamlit has no native editable table that would make it convenient to select things such as fractions and matching groups. I have now found a workaround for an editable table (screenshot attached) that could potentially work.

[Screenshot 2021-07-04 at 23 45 05: editable table workaround]

The idea would be to have a multiselect above the table to exclude runs and then manually enter the data for each file. There is now also a Shortname column that holds the filename without extension and is checked for duplicates, so we could use it for cleaner protein group column names.
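A minimal sketch of how the Shortname column and its duplicate check could work, assuming plain Python and `pathlib`; the function name `add_shortnames` is hypothetical and not taken from the project's code.

```python
from collections import Counter
from pathlib import PurePath
from typing import Dict, List


def add_shortnames(raw_files: List[str]) -> Dict[str, str]:
    """Map each raw file path to its filename without extension.

    Raises ValueError if two files would get the same short name, because
    the short name is meant to serve as a protein group column name.
    """
    shortnames = {path: PurePath(path).stem for path in raw_files}
    counts = Counter(shortnames.values())
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Duplicate short names: {duplicates}")
    return shortnames
```

Files with the same name in different folders (e.g. `a/run1.raw` and `b/run1.raw`) would be rejected rather than silently producing clashing column names.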

For automated annotation, I thought about including a regex function that would automatically fill the cells based on the filename.
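One way such a regex function could look is a user-supplied pattern with named capture groups, where files that do not match keep empty cells for manual entry. This is only a sketch: the group names `fraction` and `group`, and the functions `annotate` and `_convert`, are assumptions.

```python
import re
from typing import Dict, List, Optional, Union

Cell = Optional[Union[int, str]]


def _convert(value: Optional[str]) -> Cell:
    """Turn captured digits into an int; leave other captures as text."""
    if value is None:
        return None
    return int(value) if value.isdecimal() else value


def annotate(shortnames: List[str], pattern: str) -> Dict[str, Dict[str, Cell]]:
    """Pre-fill fraction / matching-group cells from file names.

    `pattern` may define the named groups `fraction` and/or `group`;
    files that do not match keep empty (None) cells for manual editing.
    """
    regex = re.compile(pattern)
    table = {}
    for name in shortnames:
        match = regex.search(name)
        captured = match.groupdict() if match else {}
        table[name] = {
            "fraction": _convert(captured.get("fraction")),
            "group": _convert(captured.get("group")),
        }
    return table
```

For example, `annotate(["Rep1_F2_1", "blank_01"], r"Rep\d+_F(?P<group>\d+)_(?P<fraction>\d+)")` would fill fraction 1 and group 2 for the first name and leave both cells empty for the second.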

@ammarcsj @JuliaS92
What do you think about this? More specifically:

  • Should we include any other columns that you would find useful (e.g., file creation date)? The table is sortable, so outliers (e.g., unusually small files) would be easy to spot.
  • I am not familiar with how people typically describe their fractions. Do you have some example filenames so that I could get a feeling for the regex needed? Or do you perhaps already have a regex in use?

There are a couple of limitations with this table layout (e.g., can't select multiple cells and change them at once), but this could be a start.
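The extra columns mentioned in the first bullet (file size, creation date) could be collected with the standard library. A rough sketch with hypothetical column names; note that a true creation time is not portable: `st_birthtime` only exists on some platforms (e.g. macOS and FreeBSD), so the sketch falls back to `None` elsewhere. For directory-based formats such as Bruker `.d` folders, `st_size` would describe the directory entry, not its contents.

```python
import os
from datetime import datetime, timezone
from typing import Dict, List, Optional


def _to_datetime(timestamp: Optional[float]) -> Optional[datetime]:
    """Convert a POSIX timestamp to a UTC datetime, passing None through."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc) if timestamp is not None else None


def file_info(paths: List[str]) -> List[Dict[str, object]]:
    """Collect per-file metadata that could become extra table columns."""
    rows = []
    for path in paths:
        st = os.stat(path)
        rows.append({
            "file": os.path.basename(path),
            "size_bytes": st.st_size,
            "modified": _to_datetime(st.st_mtime),
            # True creation time is platform-dependent; None where unavailable.
            "created": _to_datetime(getattr(st, "st_birthtime", None)),
        })
    return rows


# Sorting by size makes unusually small files easy to spot:
# sorted(file_info(paths), key=lambda row: row["size_bytes"])
```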

@ammarcsj
Member

ammarcsj commented Jul 6, 2021

Hi Max, this looks cool, just some questions/thoughts:

  • Is it possible to manually change the contents of the table, or can it only be used to include/exclude things?
  • From my experience, naming conventions for fractions, and for raw files in general, can be very heterogeneous, so either the regex would have to be flexibly adaptable, or we would have to specify a naming scheme to be used. @JuliaS92 correct me if I'm wrong.
  • One general point about the fraction notation: let's say I have the following four raw files (fractions are noted in the name): [exp1_f1.raw, exp1_f2.raw, exp2_f1.raw, exp2_f2.raw]. In the scheme shown above, this would correspond to the fraction annotation [1, 2, 1, 2]. We know that each 1 and 2 pair belongs to the same experiment, but the vector alone does not tell us which 1 belongs to which 2, so we have to rely on the sorting. I think it would be clearer to annotate the fractions like this: [exp1, exp1, exp2, exp2]. The matching groups could then replace (or be equal to) the fraction numbers.
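The ambiguity described in the last bullet can be illustrated with a small sketch. With only fraction numbers, pairing depends on row order, whereas explicit experiment labels group the files regardless of order. `group_by_experiment` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

raw_files = ["exp1_f1.raw", "exp1_f2.raw", "exp2_f1.raw", "exp2_f2.raw"]

# Fraction numbers alone ([1, 2, 1, 2]) only tell us which "1" belongs
# with which "2" through the current sort order of the table.
fractions = [1, 2, 1, 2]

# Experiment labels make the grouping explicit and order-independent.
experiments = ["exp1", "exp1", "exp2", "exp2"]


def group_by_experiment(files: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Collect raw files under their experiment label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, label in zip(files, labels):
        groups[label].append(name)
    return dict(groups)
```

Shuffling the rows (together with their labels) changes only the order within each group, not which files belong together.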

@JuliaS92
Author

JuliaS92 commented Jul 8, 2021

Hi @straussmaximilian @ammarcsj,

  • I like the interface, but I agree that it will be close to impossible to cover fractions and matching groups for all kinds of naming conventions with a single regex. Allowing people to enter a regex with capture groups, and providing some reasonable defaults, would be a good middle ground. Plus, the table should be editable (in case it is not yet), since many people simply number their raw files.
  • If both annotations are numbers, one could add a selector for each of the two dimensions that controls how 'far' to match. For the peptide fractions this could be a slider, so that with 24 fractions one could, e.g., match only with the two fractions above and below. For the matching group it could be 'keep separate' (e.g. KO vs wt) or 'adjacent' (e.g. subcellular fractionation or time course). Between runs with identical (fraction, matching group) tuples, matching could only be turned off by disabling it completely. For example, I would use it like this (name, peptide fraction, matching group): (Rep1_F1_1, 1, 1), (Rep1_F1_2, 2, 1), (Rep1_F2_1, 1, 2), (Rep1_F2_2, 2, 2), (Rep1_F3_1, 1, 3), (Rep1_F3_2, 2, 3), (Rep2_F1_1, 1, 1), (Rep2_F1_2, 2, 1), (Rep2_F2_1, 1, 2), (Rep2_F2_2, 2, 2), (Rep2_F3_1, 1, 3), (Rep2_F3_2, 2, 3), and then use +/-1 for peptide fractions and adjacent matching for the matching groups.
  • Regarding extra columns, I like the idea of having the creation time. Long-term, it would be nice to be able to define groups for PTMs, so that you can e.g. analyse full proteomes together with enriched samples. This could be done analogously to the MQ parameter groups.
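The two-selector proposal in the second bullet could be expressed as a pairwise rule. A minimal sketch, assuming (as in the grid from the original request) that runs match either within the same matching group across a window of peptide fractions, or at the same peptide fraction across adjacent matching groups, but not diagonally. `Run`, `can_match`, `donors`, `fraction_window` and `group_mode` are hypothetical names. Identical (fraction, matching group) tuples always match.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    name: str
    fraction: int  # peptide fraction
    group: int     # matching group


def can_match(a: Run, b: Run, fraction_window: int = 1, group_mode: str = "adjacent") -> bool:
    """Decide whether IDs may be transferred between two runs."""
    if group_mode not in ("separate", "adjacent"):
        raise ValueError(f"unknown group_mode: {group_mode!r}")
    d_fraction = abs(a.fraction - b.fraction)
    d_group = abs(a.group - b.group)
    # Same matching group: match peptide fractions within the window
    # (identical tuples always match, since both distances are 0).
    if d_group == 0 and d_fraction <= fraction_window:
        return True
    # Same peptide fraction in a neighboring matching group, if allowed.
    return group_mode == "adjacent" and d_fraction == 0 and d_group == 1


def donors(target: Run, runs: List[Run], **options) -> List[str]:
    """Names of all other runs that may transfer IDs to `target`."""
    return [r.name for r in runs if r != target and can_match(target, r, **options)]


# The example layout from the comment: (name, peptide fraction, matching group).
runs = [
    Run(f"Rep{rep}_F{group}_{fraction}", fraction, group)
    for rep in (1, 2)
    for group in (1, 2, 3)
    for fraction in (1, 2)
]
target = next(r for r in runs if r.name == "Rep1_F2_1")
```

For `Rep1_F2_1` with +/-1 fractions and adjacent groups, this rule yields seven donor runs; with `group_mode="separate"`, only the three other runs of matching group 2 remain.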

@ibludau
Contributor

ibludau commented Jul 8, 2021

Not sure if this is already integrated, but an option to just upload a design table could be a good addition/alternative. Some people might simply be more comfortable creating the table beforehand.
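Uploading a design table could be as simple as parsing a CSV file. A sketch assuming hypothetical column names `filename`, `fraction` and `matching_group`; in a Streamlit app the file would presumably come from `st.file_uploader`, whose bytes would first need to be decoded to text (e.g. wrapped in `io.StringIO`).

```python
import csv
import io
from typing import Dict, List, TextIO

REQUIRED_COLUMNS = {"filename", "fraction", "matching_group"}


def read_design_table(handle: TextIO) -> List[Dict[str, object]]:
    """Parse a CSV design table into one dict per raw file."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Design table is missing columns: {sorted(missing)}")
    return [
        {
            "filename": row["filename"].strip(),
            "fraction": int(row["fraction"]),
            "matching_group": int(row["matching_group"]),
        }
        for row in reader
    ]


example = io.StringIO(
    "filename,fraction,matching_group\n"
    "Rep1_F1_1.raw,1,1\n"
    "Rep1_F1_2.raw,2,1\n"
)
design = read_design_table(example)
```

A table prepared this way could also pre-fill the editable table, so that both workflows end up in the same place.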

4 participants