
Project Goals and Running Notes #4

Open · matthewfeickert opened this issue Jun 8, 2020 · 3 comments
Labels: documentation, question

@matthewfeickert (Member)

This issue can serve as a running list of discussion on project goals and needs, as well as a place to hold discussions for future reference.

@matthewfeickert (Member, Author)

> What is the test dataset?

The dataset is arbitrary and should be defined by the user at runtime. We want to be able both to benchmark the performance of the different backends on published ATLAS likelihoods, as is done in the example, and to create arbitrarily large/difficult workspaces to fit in order to find edge cases.
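For the published-likelihood route, a hedged sketch using the download helper from pyhf's contrib extras (requires installing pyhf with the contrib extra; the HEPData URL below is a placeholder, not a specific record):

```python
from pyhf.contrib.utils import download

# Hypothetical example: fetch a published ATLAS likelihood archive from
# HEPData into a local directory. The URL is a placeholder; substitute the
# resource URL of an actual published-likelihood record.
download(
    "https://www.hepdata.net/record/resource/<resource-id>?view=true",
    "published-likelihood",
)
```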

> What computation is needed?

In general, we need to be able to perform a hypothesis test (pyhf.infer.hypotest).
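A minimal sketch of that computation, using a simple one-channel model from pyhf.simplemodels (the yields are illustrative; recent pyhf releases name this helper uncorrelated_background, older ones hepdata_like):

```python
import pyhf

# Build a toy one-channel model: two bins of signal over background with
# uncorrelated background uncertainties (all numbers are made up).
model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
# Observed main-channel counts plus the model's auxiliary data.
data = [51.0, 48.0] + model.config.auxdata

# Observed CLs for the signal-strength hypothesis mu = 1.
cls_observed = pyhf.infer.hypotest(1.0, data, model)
print(f"Observed CLs: {cls_observed}")
```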

> For the benchmark, are we just expected to record the CPU or GPU time used for the computation? What else?

We care primarily about benchmarking the fit time, but if there are other quantities that are important for benchmarking then those should be added too.
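A rough sketch of what a timing harness could look like, assuming we wall-clock pyhf.infer.hypotest per backend (the function name time_hypotest and the best-of-n choice are ours for illustration, not anything in pyhf):

```python
import time

import pyhf

def time_hypotest(model, data, backend="numpy", n_repeats=5):
    """Best-of-n wall-clock time for one hypothesis test on a given backend."""
    pyhf.set_backend(backend)  # "numpy" always works; others need extras installed
    times = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        pyhf.infer.hypotest(1.0, data, model)
        times.append(time.perf_counter() - start)
    return min(times)  # min reduces noise from OS scheduling and warm-up
```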

@coolalexzb (Contributor)

From example.py and the tutorial I read, I found two ways to generate data:

1. Generate data from a workspace (the workspace file might be local or downloaded online)
2. Generate data using Python scripts

Question: Is there any other approach to generating data? (This relates to the first part of the data pipeline.)

Question: When I read the pyhf documentation I am still confused by some terms, such as channel, modifier, signal, and background. These all seem related to physics; can you help me understand them? I noticed that some variables in the Python scripts might correspond to these terms. If I understand them, I think I can work more efficiently.

@matthewfeickert (Member, Author) commented Jun 9, 2020

> From example.py and the tutorial I read, I found two ways to generate data:
> 1. Generate data from a workspace (the workspace file might be local or downloaded online)
> 2. Generate data using Python scripts
>
> Question: Is there any other approach to generating data? (This relates to the first part of the data pipeline.)

If "data" here means the workspace, then realistically no. No one is really going to make them by hand, and while you can create them using the pyhf xml2json CLI tool from a binary file type called .root we aren't going to be doing that here. So it is safe to assume that we will either be using pre-existing files or generating pathological cases with Python.

If you want a model that has many bins, that is quite easy to do with Python (see the [pyhf] Stack Overflow tag for examples). If you want something that is difficult to fit, that will depend on what sorts of systematics you put in the model spec. We can discuss this more later.
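For scale, a sketch of the many-bins direction (yields are placeholders; the simplemodels helper is called uncorrelated_background in recent releases, hepdata_like in older ones):

```python
import pyhf

# Grow n_bins to generate arbitrarily large workspaces for stress testing.
n_bins = 1000
model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0] * n_bins,
    bkg=[50.0] * n_bins,
    bkg_uncertainty=[5.0] * n_bins,
)
data = [52.0] * n_bins + model.config.auxdata
```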

> Question: When I read the pyhf documentation I am still confused by some terms, such as channel, modifier, signal, and background. These all seem related to physics; can you help me understand them? I noticed that some variables in the Python scripts might correspond to these terms. If I understand them, I think I can work more efficiently.

Everything should be covered in the Intro documentation. Can you tell us what parts of that section aren't clear (this would be good to know in general)? Background and Signal just mean "the processes we already know should happen due to known physics" (background) and "processes that are predicted to happen that we are looking for statistical evidence for" (signal) — they are just different components of a statistical model.
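To make the terms concrete, here is roughly the single-channel "hello world" spec from the pyhf documentation, annotated with where channel, sample, and modifier appear (yields are illustrative):

```python
import pyhf

spec = {
    "channels": [  # a "channel" is one region of binned observations
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",  # the process we seek evidence for
                    "data": [12.0, 11.0],
                    "modifiers": [
                        # a "modifier" parameterizes how a sample can vary;
                        # "mu" is the signal-strength normalization factor
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",  # processes known physics predicts
                    "data": [50.0, 52.0],
                    "modifiers": [
                        {
                            "name": "uncorr_bkguncrt",
                            "type": "shapesys",  # uncorrelated per-bin uncertainty
                            "data": [3.0, 7.0],
                        }
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)
```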
