Chronological/production classification of papers #55

Open
particleist opened this issue Feb 7, 2021 · 5 comments

Comments

@particleist

First of all a huge thank you @matthewfeickert @bnachman for setting this up, I suspect it will become a rather invaluable resource for us all in the coming years.

I understand the logic of classifying papers by subject or method domain, but I am wondering if there would be some way to highlight the chronological order in which the subject developed, and also which of the listed papers ended up getting used in production (and when). I see that you have a section at the end with experimental results where you've explicitly made the choice to focus on papers "that use deep learning in a critical way for the final analysis sensitivity", which goes somewhat in this direction but is a fairly restrictive criterion. Particularly in environments where the signal is abundant (e.g. LHCb) the use of ML in flavour tagging and triggering often improves the overall analysis sensitivity by a comparable or greater amount than its use in the final offline analysis. In such environments DNNs also tend to gain less compared to BDTs and other more basic ML methods, so the focus on deep learning also seems a bit restrictive.

Obviously I have something of a personal interest in highlighting chronology and use in production given the Bonsai BDT paper in particular, but I don't think my request is purely parochial. Someone new to the subject would, I suppose, find these questions of at least some interest. And identifying which methods ended up getting used in production has the secondary benefit of highlighting the often rather invisible work needed to go from proof-of-concept to production-grade code.

Thank you for your consideration.

@bnachman
Collaborator

bnachman commented Feb 8, 2021

Hi @particleist, thank you for your contribution! Yes, what you propose sounds like a great idea. The last section is clearly underdeveloped and it is mostly a placeholder for when I or someone else has time to flesh out a more detailed listing of "applications" (which goes in the direction of "methods that found their way into physics results"). The deep learning criterion is arbitrary, but clearly we need to draw the line somewhere (otherwise, every HEP paper would be listed (!)). As for the chronology, I also think that would indeed be helpful. Do you have a proposal for how to do this?

@particleist
Author

Thanks for the positive reply @bnachman !

For the chronology, I guess it depends on the output format. For the markdown version the papers could be ordered by the year field in the bibtex, and then the year could be added after the paper title in parentheses. That's a bit of a faff but it's visually simple for the reader.
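A minimal sketch of the markdown ordering described above, assuming each BibTeX entry sits on one line with brace-delimited `title` and `year` fields. The entry keys, titles, and years below are placeholders, and the helper names `parse_entries` and `to_markdown` are hypothetical, not part of the repository's tooling.

```python
import re

# Matches simple `name = {value}` fields without nested braces (an assumption).
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")
# Matches the entry header, e.g. `@article{someKey,`.
KEY_RE = re.compile(r"@\w+\{([^,\s]+),")

# Placeholder entries for illustration only; not real references.
BIBTEX = """
@article{exampleB, title = {Second example paper}, year = {2019}}
@article{exampleA, title = {First example paper}, year = {2013}}
@article{exampleC, title = {Third example paper}, year = {2016}}
"""


def parse_entries(bibtex):
    """Parse one-line BibTeX entries into dicts with key, title, and year."""
    entries = []
    for line in bibtex.splitlines():
        key_match = KEY_RE.match(line.strip())
        if key_match is None:
            continue  # skip blank lines and anything that is not an entry
        fields = dict(FIELD_RE.findall(line))
        year = int(fields["year"]) if "year" in fields else None
        entries.append(
            {"key": key_match.group(1), "title": fields.get("title", ""), "year": year}
        )
    return entries


def to_markdown(entries):
    """Return markdown bullets ordered by year, with the year in parentheses."""
    # Entries without a year sort last; ties are broken by title.
    ordered = sorted(
        entries, key=lambda e: (e["year"] is None, e["year"] or 0, e["title"])
    )
    lines = []
    for entry in ordered:
        suffix = f" ({entry['year']})" if entry["year"] is not None else ""
        lines.append(f"* {entry['title']}{suffix}")
    return lines


for line in to_markdown(parse_entries(BIBTEX)):
    print(line)
```

A real `.bib` file with multi-line entries or nested braces would need a proper BibTeX parser rather than these regular expressions.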

For the PDF version it's a bit different because you get a lot of additional useful information about the sections (I've seen the issue about porting this to markdown) but the references themselves are compressed. I guess you have discussed expanding the papers in PDF format similarly to the markdown version and you are worried about making the PDF overly long and difficult to traverse? I don't immediately see how to show the chronology in the PDF without becoming more verbose, because papers can appear in multiple categories.

Coming back to the applications point, I guess the easiest way to collect this information would be for the various collaboration IML contacts to help maintain it? I agree one has to draw a line somewhere, of course, for exactly the reason you give. I think having a section which highlights those of the previously mentioned papers which were used in production makes that easier in some sense, as it spares you having to list every physics result which used them -- instead you could just list the years during which the method was relevant in production which is maybe a more useful snapshot of the field's history?

@matthewfeickert
Member

Hi @particleist. :)

> For the markdown version the papers could be ordered by the year field in the bibtex, and then the year could be added after the paper title in parentheses. That's a bit of a faff but it's visually simple for the reader.
> ...
> I guess you have discussed expanding the papers in PDF format similarly to the markdown version and you are worried about making the PDF overly long and difficult to traverse? I don't immediately see how to show the chronology in the PDF without becoming more verbose, because papers can appear in multiple categories.

We haven't really discussed this in depth, but the points that you raise are good ones. Having the years next to each paper in the README sounds nice and, as you point out, getting the chronology correct is a difficult problem to deal with as well. For the way the PDF is currently set up, the best one could hope for would be adding the citation reference in the correct order for each section the paper appears in, but that gets manually tedious and unmaintainable fast. :/ Suggestions welcome here.

> Coming back to the applications point, I guess the easiest way to collect this information would be for the various collaboration IML contacts to help maintain it?

The more people who are willing to contribute maintenance time the better 👍, but I think we'll need to let the IML contacts speak for themselves (and their limited volunteer time) here. Certainly wouldn't hurt to put out a call for contributors at a meeting though.

> I think having a section which highlights those of the previously mentioned papers which were used in production makes that easier in some sense, as it spares you having to list every physics result which used them -- instead you could just list the years during which the method was relevant in production which is maybe a more useful snapshot of the field's history?

Can you give a small mock example here of what you're envisioning? Maybe I'm misunderstanding the idea, but depending on the breadth of topics it would be difficult to monitor what is in production, and for how long, without explicit volunteer contributors from the experiments, like you mentioned earlier.

@bnachman
Collaborator

Thank you @particleist ! For the sorting by year, I think that can be done automatically:

https://tex.stackexchange.com/questions/4461/how-to-sort-bibtex-references-in-reverse-chronological-order

(see the biblatex solution - that seems rather elegant)

The years appear in the markdown, so that should not be a problem. For the PDF, we could have chunks by year (e.g. blah blah (2015 [], 2016 [], ...)), which would not be too hard - what do you think about that?
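One way the "chunks by year" idea for the PDF could be generated, as a rough sketch: group citation keys by year and emit a single LaTeX fragment per topic. The function name `cite_chunks_by_year` and the citation keys are hypothetical, and the exact output layout is an assumption rather than the format the document uses.

```python
from collections import defaultdict


def cite_chunks_by_year(papers):
    """Build a string like '(2015~\\cite{a,b}, 2016~\\cite{c})'.

    `papers` is an iterable of (citation_key, year) pairs.
    """
    by_year = defaultdict(list)
    for key, year in papers:
        by_year[year].append(key)

    chunks = []
    for year in sorted(by_year):
        keys = ",".join(sorted(by_year[year]))
        # `~` is a LaTeX non-breaking space between the year and its citations.
        chunks.append(f"{year}~\\cite{{{keys}}}")
    return "(" + ", ".join(chunks) + ")"


# Placeholder citation keys for illustration only.
example = [("keyC", 2016), ("keyA", 2015), ("keyB", 2015)]
print(cite_chunks_by_year(example))
```

Because the grouping is done per topic, a paper that appears in several categories is simply passed in once for each category it belongs to.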

@particleist
Author

Thanks for your thoughts! Yes, on the solution for years, what you say seems completely sensible.

For the contributions from experiments I think this would anyway be a good thing, and the IML umbrella should hopefully make that reasonably feasible, no? Instead of a separate section, this could also be implemented, at least in the README format, as a kind of symbol (asterisk, dagger, etc.) attached to a paper with a brief sentence on when it was used in production and for what purposes. In the readme it would presumably go into some extraInfo field in the bibtex and be listed in the reference. I see that this might get difficult to maintain, but my hope is that since it is in the collaborations' own interests to recognise the work to transform these ideas into production code, they would be supportive of helping to maintain this part. I'm discussing with people inside LHCb fwiw.
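A rough sketch of how the symbol idea might look when the README is generated, assuming each paper record carries an optional `extraInfo` string (the field name is taken from the comment above; no such field is known to exist in the repository). The titles, years, and note text are placeholders, and `render_with_production_notes` is a hypothetical helper.

```python
DAGGER = "\u2020"  # the dagger symbol used to mark production use


def render_with_production_notes(papers):
    """Return markdown bullets, appending a dagger and note where extraInfo is set."""
    lines = []
    for paper in papers:
        line = f"* {paper['title']} ({paper['year']})"
        note = paper.get("extraInfo")
        if note:
            # The note is the brief sentence on when and why it was used in production.
            line += f" {DAGGER} {note}"
        lines.append(line)
    return lines


# Placeholder records for illustration only.
papers = [
    {"title": "Example method paper", "year": 2013,
     "extraInfo": "Placeholder note on production use."},
    {"title": "Example proof-of-concept paper", "year": 2016},
]
for line in render_with_production_notes(papers):
    print(line)
```

Keeping the note in the same record as the reference means an experiment contact only has to edit one field per paper.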
