
class: middle, center, title-slide
count: false

pyhf Roadmap for IRIS-HEP Execution Phase


.huge.blue[Matthew Feickert]
.huge[(University of Illinois at Urbana-Champaign)] .center.width-5[illinois_logo]

matthew.feickert@cern.ch

2020 IRIS-HEP Institute Retreat

May 27th, 2020


pyhf core dev team


.grid[ .kol-1-3.center[ .circle.width-80[Lukas]

Lukas Heinrich

CERN ] .kol-1-3.center[ .circle.width-80[Matthew]

Matthew Feickert

Illinois .center.bold.blue[IRIS-HEP] ] .kol-1-3.center[ .circle.width-75[Giordon]

Giordon Stark

UCSC SCIPP ] ]


Goals of physics analysis at the LHC

.kol-1-1[
.kol-1-3.center[
.width-100[ATLAS_Higgs_discovery]

Search for new physics
]
.kol-1-3.center[
.width-100[CMS-PAS-HIG-19-004]

Make precision measurements
]
.kol-1-3.center[
.width-110[[![SUSY-2018-31_limit](figures/SUSY-2018-31_limit.png)](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-31/)]

Provide constraints on models through setting best limits
]
]

  • All require .bold[building statistical models] and .bold[fitting models] to data to perform statistical inference
  • Model complexity can be huge for complicated searches
  • Problem: Time to fit can be .bold[many hours]
  • .blue[Goal:] Empower analysts with fast fits and expressive models (minimal example below)
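To make "fast fits" concrete, a full hypothesis test in pyhf is only a few lines. A minimal sketch (the model and numbers are illustrative, not from any analysis):

```python
# Minimal illustrative pyhf fit: one-bin signal + background model
import pyhf

# Simple model with one uncorrelated background uncertainty per bin
model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0], bkg_data=[50.0], bkg_uncerts=[3.0]
)
# Observed counts plus auxiliary data for the constraint terms
data = [55.0] + model.config.auxdata

# Observed CLs for signal strength mu = 1
cls_obs = pyhf.infer.hypotest(1.0, data, model)
print(f"Observed CLs: {cls_obs}")
```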

Analysis Systems through the lens of pyhf

.center[ .width-75[analysis-systems-scope] ]

  • .large[Accelerating fitting (reducing time to .bold[insight] (statistical inference)!)]
  • .large[Flexible schema great for open likelihood .bold[preservation]]
    • .normal[Likelihood serves as high information-density summary of analysis]
  • .large[An enabling technology for .bold[reinterpretation]]

class: middle

.center.huge[Accomplishments in Year 2]


Full likelihoods (3) preserved on HEPData

  • Background-only model JSON stored
  • Signal models stored as JSON Patch files (sketch below)
  • Together they fully preserve the model (with its own DOI! .width-20[DOI])
  • cf. Matthew's CHEP 2019 talk and Lukas's LHCP 2020 talk

.center.width-65[HEPData_likelihoods]
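How the preserved pieces combine, as a sketch (file names are illustrative; actual HEPData records use analysis-specific names): apply a signal JSON Patch to the background-only workspace to recover the full model.

```python
# Sketch: reconstruct a full signal + background workspace from the
# published background-only JSON and one signal JSON Patch file
import json

import jsonpatch
import pyhf

with open("BkgOnly.json") as spec_file:
    bkg_only = json.load(spec_file)
with open("signal_patch.json") as patch_file:
    signal_patch = jsonpatch.JsonPatch(json.load(patch_file))

# Applying the patch yields the full model, ready for inference
workspace = pyhf.Workspace(signal_patch.apply(bkg_only))
model = workspace.model()
data = workspace.data(model)
```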


Publications using pyhf

.kol-1-2.center.width-95[ .center.width-100[ATLAS_PUB_Note_title]

.center.width-100[overlay_multiplex_contour] ] .kol-1-2.center.width-100[ .center.width-100[CERN_news_story] ]


Rapid adoption in ATLAS...

.kol-1-2[

  • Impressive appetite for pyhf in ATLAS analyses
  • Much of SUSY and $HH \to 4b$ limit setting
    • Giordon: SUSY Run-2 Summaries subconvener
    • Lukas: ATLAS Modeling Group convener
  • Upcoming: ATLAS Stats Forum recommendation ] .kol-1-2[

.italic.smaller[Thanks for making a tool super easy to use! When I got some [Jupyter] notebooks with this code up and shared with students a lot more of us started including limits in our talks. Before this was a pretty painful step!]
.center.smaller[— Nicole Hartman (SLAC), ATLAS Ph.D. Student]
]
.kol-1-1[
.kol-1-1[
.kol-1-2[
.center.width-70[SUSY_EWK_3L_validation]
]
.kol-1-2[
.center.width-70[SUSY_EWK_3L_validation]
]
]
.center.smaller[SUSY EWK 3L RPV analysis (ATLAS-CONF-2020-009): Exclusion curves as a function of mass and branching fraction to $Z$ bosons]
]


...and by theory

.kol-2-5[

  • SModelS team has implemented a SModelS/pyhf interface
    • tool for interpreting simplified-model results from the LHC
    • designed to be used by theorists
  • Have produced a comparison for the published likelihood of .italic[Search for direct stau production in events with two hadronic tau leptons in $\sqrt{s} = 13\,\textrm{TeV}$ $pp$ collisions with the ATLAS detector] (ATLAS-SUSY-2018-04)
    • Compare simplified likelihood (SModelS)
    • to full likelihood (pyhf) ] .kol-3-5[ .center.width-100[SModels-plot]

.italic.smaller[ So here is one of our first reasonable validation plots. It's preliminary, the black line is ATLAS-SUSY-2018-04 official exclusion curve. The grey line is SModelS using pyhf, running over the published data. — Wolfgang Waltenberger, CMS/SModelS ] ]


Broader Impact: Upstream contributions

.kol-1-2[

  • In working to add the percentile method across all backends (as part of toys in v0.5.0), discovered a discrepancy between the NumPy implementation and TensorFlow Probability (TFP)
    • Through comparing the NumPy and TFP source code, found a bug in TFP!
    • Confirmed by the dev team in discussion on a GitHub Issue
    • Agreed with the dev team that I would write a PR, which was reviewed and merged in a timely manner
  • Along with Henry and Jim, now have upstream contributions to open source .bold[directly originating] from IRIS-HEP work
  • pyhf will .bold[need this bug fix] in the next TFP release, and .bold[thousands] of other projects will benefit
  • Bonus: Continuing goodwill development ] .kol-1-2[ .center.width-100[csuter_issue_okay]

.center.width-100[[![tfp_PRs_merged](figures/tfp_PRs_merged.png)](https://github.com/tensorflow/probability/pulls?q=is%3Apr+author%3Amatthewfeickert+is%3Aclosed)]
.center.width-100[[![csuter_compliment](figures/csuter_compliment.png)](tensorflow/probability#912 (comment))] ]

class: middle

.center.huge[Roadmap for Year 3 Execution]


In a word: Stability

.center.width-60[carbon_diff]


Adoption by analyses

  • Any analysis that wants to use pyhf for full Run 2 should be able to

.kol-1-2[ .bold[Requirements]:

  • pyhf becomes mature in its feature set
    • Stat Config
    • Non-asymptotic calculators (toys in v0.5.0; sketch below)
    • Norm factor expressions
  • Validation across all backends against HistFactory
    • pyhf GitHub org setup to help streamline process
    • Reproduction of published analyses on HEPData
  • Documented examples
    • Case studies
    • Public knowledge base (pyhf Stack Overflow)
    • Rosetta stone (and what can't be done) between ROOT HistFactory and pyhf ] .kol-1-2[ .center.width-100[pyhf_GitHub_org.png] ]
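A sketch of the toy-based (non-asymptotic) calculator flagged above, assuming the v0.5.0 `calctype`/`ntoys` keyword arguments and an illustrative stand-in model:

```python
# Sketch: hypothesis test with pseudo-experiments ("toys") instead of
# the asymptotic approximation; ntoys kept small for illustration
import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0], bkg_data=[50.0], bkg_uncerts=[3.0]
)
data = [55.0] + model.config.auxdata

cls_toys = pyhf.infer.hypotest(
    1.0, data, model, calctype="toybased", ntoys=500
)
```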

Benchmarking hardware acceleration

.kol-1-2[ .width-60.center[scaling_hardware]

  • Preliminary results (old) show hardware acceleration giving .bold[order of magnitude speedup] for some models!
  • Improvements over traditional approaches (benchmark sketch below)
    • 10 hrs to 30 min; 20 min to 10 sec
  • Hardware acceleration benchmarking important to find edges ] .kol-1-2[ .center.width-60[BoZheng_fellow.png] .center.width-60[chris_tunnell_fellow_tweet.png] ]
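A rough sketch of the benchmarking loop involved (the backend names are real pyhf options, but the model size is illustrative and the optional backends must be installed):

```python
# Sketch: time the same maximum likelihood fit across pyhf backends
import time

import pyhf

# Illustrative 100-bin model to make the fit non-trivial
model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0] * 100, bkg_data=[50.0] * 100, bkg_uncerts=[3.0] * 100
)
data = [52.0] * 100 + model.config.auxdata

for backend in ["numpy", "tensorflow", "pytorch", "jax"]:
    pyhf.set_backend(backend)
    start = time.time()
    pyhf.infer.mle.fit(data, model)  # maximum likelihood fit
    print(f"{backend}: {time.time() - start:.2f} s")
```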

Integration into Analysis Ecosystems Pipeline

.center[ .width-55[analysis-systems-scope] ]



class: middle

.center.huge[Successful Application: Years 4/5]


Reducing time to insight: Fitting as a service

pyhf HistFactory model spec is pure JSON: Very natural to use a .blue[REST web API] for remote fitting!

.kol-1-3[

  1. pyhf installed on different clusters with GPUs around the world
  2. User hits a REST API with JSON pyhf workspace as a request
  3. pyhf fits the workspace on the cluster on demand
  4. Returns fit results over the REST API to the user (hypothetical sketch below) ] .kol-2-3[ .center.width-90[carbon_fitting_as_a_service] ]
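A purely hypothetical sketch of steps 2-4 (the use of Flask, the route, and the response fields are all assumptions; no such service exists yet):

```python
# Hypothetical fitting service endpoint: accept a pyhf workspace as
# JSON over REST, fit it, and return the best-fit parameters
from flask import Flask, jsonify, request

import pyhf

app = Flask(__name__)


@app.route("/fit", methods=["POST"])
def fit():
    workspace = pyhf.Workspace(request.get_json())
    model = workspace.model()
    data = workspace.data(model)
    best_fit = pyhf.infer.mle.fit(data, model)
    return jsonify({"bestfit": [float(par) for par in best_fit]})
```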

Analysis Reuse: pyhf JSON native to HEPData


.kol-1-2[
- Growing number of analyses publishing full likelihoods to HEPData
- At the moment each likelihood is the collection of many individual signal patch files
- Introduce concept of "patchsets" to reduce all of this to two files (sketch below):
   - Background-only file
   - Signal patchset file
- Would use `hepdata-validator` to resolve all files to inline JSON
- Allows for entire likelihood to be natively supported in HEPData (no more tarballs required)
]
.kol-1-2[
.center.width-100[[![HEPData_native_support](figures/HEPData_native_support.png)](HEPData/hepdata#164)]
.center.width-100[[![HEPData_patchset](figures/carbon_patchset.png)](HEPData/hepdata#164 (comment))]
]
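A sketch of the proposed two-file workflow using `pyhf.PatchSet` (added in v0.4.1); the file and patch names are illustrative:

```python
# Sketch: select one signal hypothesis from a patchset by name
import json

import pyhf

with open("BkgOnly.json") as ws_file:
    workspace = pyhf.Workspace(json.load(ws_file))
with open("patchset.json") as patchset_file:
    patchset = pyhf.PatchSet(json.load(patchset_file))

# One signal patchset file holds all signal hypotheses of the analysis
signal_workspace = patchset.apply(workspace, "signal_600_280_150")
```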

class: middle

.center.huge[Grand Challenge Integration]


Analysis Systems Grand Challenge

Following up on Kyle's presentation yesterday

.center[ .width-80[grand_challenge_pyhf_highlight] ]


ServiceX to pyhf


.center[.bold[ServiceX to perform event selection and deliver histograms for the .blue[`pyhf` model]]]

- Should be relatively easy to translate from ServiceX output to a `pyhf` JSON model, but probably don't want to
- Moving the translation from `pyhf` to `cabinetry` seems like a more robust solution
   - `cabinetry` has the ability to be a powerful tool, but the translation to `pyhf` is the most interesting part
   - ServiceX to `cabinetry`: data delivery
   - `cabinetry` to `pyhf`: construction of the likelihood
   - If useful, Matthew could join contribution efforts
- Alex has pointed out this is even mostly doable now with `TRExFitter`
   - ServiceX feeding histograms to `TRExFitter`
   - Convert XML to JSON with [`pyhf xml2json`](https://scikit-hep.org/pyhf/cli.html#pyhf-xml2json) (sketch below)
   - Fit with `pyhf`
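A sketch of the XML-to-JSON step from the Python API (the equivalent of the `pyhf xml2json` CLI); the config file and directory paths are illustrative:

```python
# Sketch: convert a ROOT HistFactory XML workspace to pyhf JSON
import json

import pyhf.readxml

# Parse the top-level HistFactory XML config relative to its root dir
spec = pyhf.readxml.parse("config/FitConfig.xml", "workspace_dir")
with open("workspace.json", "w") as out_file:
    json.dump(spec, out_file, indent=2)
```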

pyhf: Fitting as a service


.center[ .bold[ Optimize analysis by using automatic differentiation to compute $d(\textrm{Expected limit})/d(\textrm{analysis parameters})$, which are back-propagated from the output of the stats tool, .blue[through `pyhf` running in fitting service], back to ServiceX running at the analysis facility, and through the event selection & histogramming code ] ]

- As already covered, fitting with `pyhf` can be scaled up on demand and run almost anywhere
   - Local machine, cluster, AWS
- `pyhf` being built on frameworks that automatically handle gradients allows for this to happen naturally (sketch below)
   - Should get taken care of as a natural part of `pyhf` development
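A minimal sketch of what "automatically handled gradients" means in practice with the JAX backend (the model and data are illustrative stand-ins):

```python
# Sketch: differentiate the model log-likelihood with jax.grad
import jax

import pyhf

pyhf.set_backend("jax")

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0], bkg_data=[50.0], bkg_uncerts=[3.0]
)
data = pyhf.tensorlib.astensor([55.0] + model.config.auxdata)


def nll(pars):
    # Negative log-likelihood of the model at the given parameters
    return -model.logpdf(pars, data)[0]


init = pyhf.tensorlib.astensor(model.config.suggested_init())
gradient = jax.grad(nll)(init)  # d(NLL)/d(parameters)
```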

Summary

.kol-2-3[

  • .bold[Accomplishments]
    • Published and preserved full likelihoods
    • Became hugely popular and widely adopted inside ATLAS
    • Established connections for growth with SModelS and HEPData
  • .bold[Year 3 Execution]
    • Reach stable API and v1.0.0 release
    • Provide analysis support
    • Benchmark and profile hardware acceleration benefits
  • .bold[Vision for Year 4/5]
    • Globally deployed and scalable "fitting as a service" using REST web API
    • Have native support in HEPData for analysis preservation
  • .bold[Grand Challenge]
    • Integrate with cabinetry for ServiceX translation
    • Exploit fitting as a service + gradients for differentiable AS pipeline ] .kol-1-3[


      .center.width-100[pyhf-logo] ]

class: end-slide, center

Backup


HistFactory Template


$$\begin{aligned} &\mathcal{P}\left(n_{c}, x_{e}, a_{p} \middle| \phi_{p}, \alpha_{p}, \gamma_{b}\right) = \\ &{\color{blue}{\prod_{c \,\in\, \textrm{channels}} \left[\textrm{Pois}\left(n_{c} \middle| \nu_{c}\right) \prod_{e=1}^{n_{c}} f_{c}\left(x_{e} \middle| \vec{\alpha}\right)\right]}} \, {\color{red}{G\left(L_{0} \middle| \lambda, \Delta_{L}\right) \prod_{p \,\in\, \mathbb{S}+\Gamma} f_{p}\left(a_{p} \middle| \alpha_{p}\right)}} \end{aligned}$$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates

.bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for bins observed in all channels]
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
    • encoding systematic uncertainties (normalization, shape, etc.); minimal spec example below
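A minimal single-channel realization of this template as a pyhf JSON spec (names and numbers are illustrative): the `normfactor` provides the free signal strength $\mu$ and the `shapesys` modifier generates the .red[constraint] term.

```python
# Sketch: the HistFactory template above as a one-channel pyhf model
import pyhf

spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [12.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0],
                    "modifiers": [
                        {
                            "name": "uncorr_bkguncrt",
                            "type": "shapesys",
                            "data": [3.0],
                        }
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)  # Poisson main terms + constraint terms
```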

References

  1. ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
  2. L. Heinrich, H. Schulz, J. Turner and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.

class: end-slide, center
count: false

The end.