Paper subcategories? #1

Open
dguest opened this issue Jun 12, 2017 · 7 comments
@dguest
Contributor

dguest commented Jun 12, 2017

I'm wondering if you'd want some kind of sub-categories for the papers, since there are a few dozen papers on ML for HEP and the list may get confusing. Either that or you can vet the list for the "important" ones, but that's going to be more subjective.

I'm not sure about the ideal categorization though: my personal bias would be to have a category for each experiment (with ATLAS and CMS lumped together, they do the same thing), and maybe then have sub-categories that split between the more "theoretical" papers (using Delphes or just Pythia) and the things from real experiments (there are a few results from ATLAS, CMS, and some neutrino experiments).

@matthewfeickert
Member

@dguest I think this is an excellent idea, especially as the paper list is growing fast. If it gets too large we might even need to split it up into different .md files.

What do you think about splitting the papers into subsections by ML topic (similar to how the talks at DS@HEP 2017 were split by topic)? To name a few, that would be Computer Vision/Jet Images, Anomaly/Outlier Detection, Adversarial Networks, ...

@dguest
Contributor Author

dguest commented Jun 27, 2017

Yeah I'm trying to figure out if it makes more sense to split by the physics signature or by the algorithm. I think I agree that grouping by the type of algorithm is probably most useful.

@matthewfeickert
Member

matthewfeickert commented Jul 2, 2017

Okay. Can we then have a discussion on which algorithm subsections we should use? @dguest @mickypaganini @makagan @SergeiML @Marie89 if you have insight on where it makes sense to draw meaningful distinctions, that would be helpful. For example, I don't know where to meaningfully distinguish between neural networks in general and deep learning.

Some preliminary paper subcategory suggestions:

  • Boosted Decision Trees (BDT)
  • Computer Vision
  • Anomaly/Outlier Detection
  • Neural Networks
    • Convolutional Neural Networks (CNNs)
    • Recurrent Neural Networks (RNNs)
  • Deep Learning
    • Generative Adversarial Networks (GANs)

@bstienen
Contributor

bstienen commented Jul 3, 2017

I am not sure I agree with categorizing the papers by algorithm type; it depends on what the goal of the list is. If we want to provide an overview of how specific ML algorithms are used, then categorization by algorithm is most useful, but this draws attention away from the physics. If, however, we want to approach it with "let's see how machine learning can be used in HEP", it feels more natural to start from the physics topics and categorize by those. I personally prefer the latter approach, which would yield a list like the following (not exhaustive, quickly made based on the papers currently present in the repo):

  • Event generation
  • Jet tagging
  • Particle identification
  • Triggering
  • Searches and analyses
  • Recasting

If, however, we end up deciding to group by algorithm, may I suggest replacing Boosted Decision Trees with Ensemble Methods? That way, algorithms like Random Forest and the somewhat more general AdaBoost algorithms can also be categorized. This would then yield:

  • Ensemble Methods
  • Computer Vision
  • Anomaly/Outlier Detection
  • Neural Networks
    • Convolutional Neural Networks (CNNs)
    • Recurrent Neural Networks (RNNs)
  • Deep Learning
    • Generative Adversarial Networks (GANs)

@matthewfeickert
Member

matthewfeickert commented Jul 4, 2017

@bstienen I see what you're saying. I was originally more thinking of "How are the types of machine learning applied to HEP?" but I think for our community the idea of "In what areas of HEP is machine learning applied?" is maybe the better way to phrase things, so I like your proposed classification style.

Looking at your quick list:

  • Event generation
  • Jet tagging
  • Particle identification
  • Triggering
  • Searches and analyses
  • Recasting

this seems good, but maybe we might want to refine "Searches and analyses" a bit? Also, going off of @dguest's original comment in the issue we should have sections for theory work as well. Thoughts?

At the risk of things getting too busy, we can even tag papers with badges indicating the type of machine learning used.

Example:

  • M. Paganini, L. de Oliveira, and B. Nachman, "CaloGAN: Simulating 3D High Energy Particle Showers in Multi-Layer Electromagnetic Calorimeters with Generative Adversarial Networks," arXiv:1705.02355 [hep-ex]. (May 5, 2017) ML type
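A minimal sketch of how such a badge could be generated for a Markdown paper list. Everything here is an assumption for illustration: the thread does not name a badge service, so this uses shields.io static badge URLs, and the helper names `shields_escape`, `badge_markdown`, and `paper_entry` are hypothetical.

```python
from urllib.parse import quote


def shields_escape(text: str) -> str:
    """Escape text for one segment of a shields.io static badge path."""
    # Assumed shields.io convention: "--" is a literal dash,
    # "__" is a literal underscore, and "_" renders as a space.
    escaped = text.replace("-", "--").replace("_", "__").replace(" ", "_")
    # Percent-encode anything else that is not safe in a URL path segment.
    return quote(escaped, safe="")


def badge_markdown(label: str, message: str, color: str = "blue") -> str:
    """Return Markdown image syntax for a static badge."""
    url = (
        "https://img.shields.io/badge/"
        f"{shields_escape(label)}-{shields_escape(message)}-{color}"
    )
    return f"![{label}: {message}]({url})"


def paper_entry(citation: str, ml_types: list[str]) -> str:
    """Format one list item: the citation followed by one badge per ML type."""
    badges = " ".join(badge_markdown("ML type", t) for t in ml_types)
    return f"- {citation} {badges}"


if __name__ == "__main__":
    # The citation string is shortened to the arXiv identifier for brevity.
    print(paper_entry("arXiv:1705.02355 [hep-ex]. (May 5, 2017)", ["GAN"]))
```

Keeping the ML type as a list per paper means a paper that combines several techniques gets several badges, without needing a second copy of the entry under another heading.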

@bstienen
Contributor

bstienen commented Jul 5, 2017

@matthewfeickert I agree that "searches and analyses" is quite a broad category and I am totally in favour of narrowing it down into multiple smaller ones. However, given the papers currently listed in the repository, I was not able to come up with a split that I was happy with; maybe somebody else can help with this.

About theory papers: I am not sure how this could best be done. Event generation is of course a purely theoretical category, but something like searches and analyses is more of a hybrid category. Maybe following @dguest's suggestion and making subcategories is a way to go, but then what do we do with hybrid papers, I wonder... My suggestion would therefore be to not make subcategories for theory and experiment (it would only create problems in the long run) and to let the category names speak for themselves as to whether the papers in them are purely theoretical, purely experimental, or hybrid.

I do like the idea of the badges 😃!

@matthewfeickert
Member

This Issue is obviously very old, but given that it is not closed, I am noting that PR #50 will somewhat affect it.
