
class: middle, center, title-slide
count: false

pyhf Roadmap for IRIS-HEP Execution Phase


.huge.blue[Matthew Feickert]
.huge[(University of Illinois at Urbana-Champaign)] .center.width-5[illinois_logo]

matthew.feickert@cern.ch

2020 IRIS-HEP Institute Retreat

May 27th, 2020


pyhf core dev team


.grid[ .kol-1-3.center[ .circle.width-80[Lukas]

Lukas Heinrich

CERN ] .kol-1-3.center[ .circle.width-80[Matthew]

Matthew Feickert

Illinois .center.bold.blue[IRIS-HEP] ] .kol-1-3.center[ .circle.width-75[Giordon]

Giordon Stark

UCSC SCIPP ] ]


Goals of physics analysis at the LHC

.kol-1-1[
.kol-1-3.center[
.width-100[ATLAS_Higgs_discovery]

Search for new physics
]
.kol-1-3.center[
.width-100[CMS-PAS-HIG-19-004]

Make precision measurements
]
.kol-1-3.center[
.width-110[[![SUSY-2018-31_limit](figures/SUSY-2018-31_limit.png)](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-31/)]

Provide constraints on models through setting best limits
]
]

  • All require .bold[building statistical models] and .bold[fitting models] to data to perform statistical inference
  • Model complexity can be huge for complicated searches
  • Problem: Time to fit can be .bold[many hours]
  • .blue[Goal:] Empower analysts with fast fits and expressive models (minimal example below)
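To make "fast fits" concrete, a full hypothesis test in pyhf is only a few lines. A minimal sketch (the model and numbers are illustrative, not from any analysis):

```python
# Minimal illustrative pyhf fit: one-bin signal + background model
import pyhf

# Simple model with one uncorrelated background uncertainty per bin
model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0], bkg_data=[50.0], bkg_uncerts=[3.0]
)
# Observed counts plus auxiliary data for the constraint terms
data = [55.0] + model.config.auxdata

# Observed CLs for signal strength mu = 1
cls_obs = pyhf.infer.hypotest(1.0, data, model)
print(f"Observed CLs: {cls_obs}")
```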

Analysis Systems through the lens of pyhf

.center[ .width-75[analysis-systems-scope] ]

  • .large[Accelerating fitting (reducing time to .bold[insight] (statistical inference)!)]
  • .large[Flexible schema great for open likelihood .bold[preservation]]
    • .normal[Likelihood serves as high information-density summary of analysis]
  • .large[An enabling technology for .bold[reinterpretation]]

class: middle

.center.huge[Accomplishments in Year 2]


Full likelihoods (3) preserved on HEPData

  • Background-only model JSON stored
  • Signal models stored as JSON Patch files (sketch below)
  • Together they fully preserve the model (with its own DOI! .width-20[DOI])
  • cf. Matthew's CHEP 2019 talk and Lukas's LHCP 2020 talk

.center.width-65[HEPData_likelihoods]
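How the preserved pieces combine, as a sketch (file names are illustrative; actual HEPData records use analysis-specific names): apply a signal JSON Patch to the background-only workspace to recover the full model.

```python
# Sketch: reconstruct a full signal + background workspace from the
# published background-only JSON and one signal JSON Patch file
import json

import jsonpatch
import pyhf

with open("BkgOnly.json") as spec_file:
    bkg_only = json.load(spec_file)
with open("signal_patch.json") as patch_file:
    signal_patch = jsonpatch.JsonPatch(json.load(patch_file))

# Applying the patch yields the full model, ready for inference
workspace = pyhf.Workspace(signal_patch.apply(bkg_only))
model = workspace.model()
data = workspace.data(model)
```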


Publications using pyhf

.kol-1-2.center.width-95[ .center.width-100[ATLAS_PUB_Note_title]

.center.width-100[overlay_multiplex_contour] ] .kol-1-2.center.width-100[ .center.width-100[CERN_news_story] ]


Rapid adoption in ATLAS...

.kol-1-2[

  • Impressive appetite for pyhf in ATLAS analyses
  • Much of SUSY and $HH \to 4b$ limit setting
    • Giordon: SUSY Run-2 Summaries subconvener
    • Lukas: ATLAS Modeling Group convener
  • Upcoming: ATLAS Stats Forum recommendation ] .kol-1-2[

.italic.smaller[Thanks for making a tool super easy to use! When I got some [Jupyter] notebooks with this code up and shared with students a lot more of us started including limits in our talks. Before this was a pretty painful step!]
.center.smaller[— Nicole Hartman (SLAC), ATLAS Ph.D. Student]
]
.kol-1-1[
.kol-1-1[
.kol-1-2[
.center.width-70[SUSY_EWK_3L_validation]
]
.kol-1-2[
.center.width-70[SUSY_EWK_3L_validation]
]
]
.center.smaller[SUSY EWK 3L RPV analysis (ATLAS-CONF-2020-009): Exclusion curves as a function of mass and branching fraction to $Z$ bosons]
]


...and by theory

.kol-2-5[

  • SModelS team has implemented a SModelS/pyhf interface
    • tool for interpreting simplified-model results from the LHC
    • designed to be used by theorists
  • Have produced a comparison for the published likelihood of .italic[Search for direct stau production in events with two hadronic tau leptons in $\sqrt{s} = 13\,\textrm{TeV}$ $pp$ collisions with the ATLAS detector] (ATLAS-SUSY-2018-04)
    • Compare simplified likelihood (SModelS)
    • to full likelihood (pyhf) ] .kol-3-5[ .center.width-100[SModels-plot]

.italic.smaller[ So here is one of our first reasonable validation plots. It's preliminary, the black line is ATLAS-SUSY-2018-04 official exclusion curve. The grey line is SModelS using pyhf, running over the published data. — Wolfgang Waltenberger, CMS/SModelS ] ]


Broader Impact: Upstream contributions

.kol-1-2[

  • In working to add the percentile method across all backends (as part of toys in v0.5.0), discovered a discrepancy between the NumPy implementation and TensorFlow Probability (TFP)
    • Through comparing the NumPy and TFP source code, found a bug in TFP!
    • Confirmed by the dev team in discussion on a GitHub Issue
    • Agreed with the dev team that I would write a PR, which was reviewed and merged in a timely manner
  • Along with Henry and Jim, now have upstream contributions to open source .bold[directly originating] from IRIS-HEP work
  • pyhf will .bold[need this bug fix] in the next TFP release, and .bold[thousands] of other projects will benefit
  • Bonus: Continuing goodwill development ] .kol-1-2[ .center.width-100[csuter_issue_okay]

.center.width-100[[![tfp_PRs_merged](figures/tfp_PRs_merged.png)](https://github.com/tensorflow/probability/pulls?q=is%3Apr+author%3Amatthewfeickert+is%3Aclosed)]
.center.width-100[[![csuter_compliment](figures/csuter_compliment.png)](tensorflow/probability#912 (comment))] ]

class: middle

.center.huge[Roadmap for Year 3 Execution]


In a word: Stability

.center.width-60[carbon_diff]


Adoption by analyses

  • Any analysis that wants to use pyhf for full Run 2 should be able to

.kol-1-2[ .bold[Requirements]:

  • pyhf becomes mature in its feature set
    • Stat Config
    • Non-asymptotic calculators (toys in v0.5.0; sketch below)
    • Norm factor expressions
  • Validation across all backends against HistFactory
    • pyhf GitHub org setup to help streamline process
    • Reproduction of published analyses on HEPData
  • Documented examples
    • Case studies
    • Public knowledge base (pyhf Stack Overflow)
    • Rosetta stone (and what can't be done) between ROOT HistFactory and pyhf ] .kol-1-2[ .center.width-100[pyhf_GitHub_org.png] ]
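A sketch of the toy-based (non-asymptotic) calculator flagged above, assuming the v0.5.0 `calctype`/`ntoys` keyword arguments and an illustrative stand-in model:

```python
# Sketch: hypothesis test with pseudo-experiments ("toys") instead of
# the asymptotic approximation; ntoys kept small for illustration
import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0], bkg_data=[50.0], bkg_uncerts=[3.0]
)
data = [55.0] + model.config.auxdata

cls_toys = pyhf.infer.hypotest(
    1.0, data, model, calctype="toybased", ntoys=500
)
```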

Benchmarking hardware acceleration

.kol-1-2[ .width-60.center[scaling_hardware]

  • Preliminary results (old) show hardware acceleration giving .bold[order of magnitude speedup] for some models!
  • Improvements over traditional approaches (benchmark sketch below)
    • 10 hrs to 30 min; 20 min to 10 sec
  • Hardware acceleration benchmarking important to find edges ] .kol-1-2[ .center.width-60[BoZheng_fellow.png] .center.width-60[chris_tunnell_fellow_tweet.png] ]
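A rough sketch of the benchmarking loop involved (the backend names are real pyhf options, but the model size is illustrative and the optional backends must be installed):

```python
# Sketch: time the same maximum likelihood fit across pyhf backends
import time

import pyhf

# Illustrative 100-bin model to make the fit non-trivial
model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0] * 100, bkg_data=[50.0] * 100, bkg_uncerts=[3.0] * 100
)
data = [52.0] * 100 + model.config.auxdata

for backend in ["numpy", "tensorflow", "pytorch", "jax"]:
    pyhf.set_backend(backend)
    start = time.time()
    pyhf.infer.mle.fit(data, model)  # maximum likelihood fit
    print(f"{backend}: {time.time() - start:.2f} s")
```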

Integration into Analysis Ecosystems Pipeline

.center[ .width-55[analysis-systems-scope] ]



class: middle

.center.huge[Successful Application: Years 4/5]


Reducing time to insight: Fitting as a service

pyhf HistFactory model spec is pure JSON: Very natural to use a .blue[REST web API] for remote fitting!

.kol-1-3[

  1. pyhf installed on different clusters with GPUs around the world
  2. User hits a REST API with JSON pyhf workspace as a request
  3. pyhf fits the workspace on the cluster on demand
  4. Returns fit results over the REST API to the user (hypothetical sketch below) ] .kol-2-3[ .center.width-90[carbon_fitting_as_a_service] ]
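A purely hypothetical sketch of steps 2-4 (the use of Flask, the route, and the response fields are all assumptions; no such service exists yet):

```python
# Hypothetical fitting service endpoint: accept a pyhf workspace as
# JSON over REST, fit it, and return the best-fit parameters
from flask import Flask, jsonify, request

import pyhf

app = Flask(__name__)


@app.route("/fit", methods=["POST"])
def fit():
    workspace = pyhf.Workspace(request.get_json())
    model = workspace.model()
    data = workspace.data(model)
    best_fit = pyhf.infer.mle.fit(data, model)
    return jsonify({"bestfit": [float(par) for par in best_fit]})
```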

Analysis Reuse: pyhf JSON native to HEPData


.kol-1-2[
- Growing number of analyses publishing full likelihoods to HEPData
- At the moment each likelihood is the collection of many individual signal patch files
- Introduce concept of "patchsets" to reduce all of this to two files (sketch below):
   - Background-only file
   - Signal patchset file
- Would use `hepdata-validator` to resolve all files to inline JSON
- Allows for entire likelihood to be natively supported in HEPData (no more tarballs required)
]
.kol-1-2[
.center.width-100[[![HEPData_native_support](figures/HEPData_native_support.png)](HEPData/hepdata#164)]
.center.width-100[[![HEPData_patchset](figures/carbon_patchset.png)](HEPData/hepdata#164 (comment))]
]
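A sketch of the proposed two-file workflow using `pyhf.PatchSet` (added in v0.4.1); the file and patch names are illustrative:

```python
# Sketch: select one signal hypothesis from a patchset by name
import json

import pyhf

with open("BkgOnly.json") as ws_file:
    workspace = pyhf.Workspace(json.load(ws_file))
with open("patchset.json") as patchset_file:
    patchset = pyhf.PatchSet(json.load(patchset_file))

# One signal patchset file holds all signal hypotheses of the analysis
signal_workspace = patchset.apply(workspace, "signal_600_280_150")
```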

class: middle

.center.huge[Grand Challenge Integration]


Analysis Systems Grand Challenge

Following up on Kyle's presentation yesterday

.center[ .width-80[grand_challenge_pyhf_highlight] ]


ServiceX to pyhf


.center[.bold[ServiceX to perform event selection and deliver histograms for the .blue[`pyhf` model]]]

- Should be relatively easy to translate from ServiceX output to a `pyhf` JSON model, but probably don't want to
- Moving the translation from `pyhf` to `cabinetry` seems like a more robust solution
   - `cabinetry` has the ability to be a powerful tool, but the translation to `pyhf` is the most interesting part
   - ServiceX to `cabinetry`: data delivery
   - `cabinetry` to `pyhf`: construction of the likelihood
   - If useful, Matthew could join contribution efforts
- Alex has pointed out this is even mostly doable now with `TRExFitter`
   - ServiceX feeding histograms to `TRExFitter`
   - Convert XML to JSON with [`pyhf xml2json`](https://scikit-hep.org/pyhf/cli.html#pyhf-xml2json) (sketch below)
   - Fit with `pyhf`
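A sketch of the XML-to-JSON step from the Python API (the equivalent of the `pyhf xml2json` CLI); the config file and directory paths are illustrative:

```python
# Sketch: convert a ROOT HistFactory XML workspace to pyhf JSON
import json

import pyhf.readxml

# Parse the top-level HistFactory XML config relative to its root dir
spec = pyhf.readxml.parse("config/FitConfig.xml", "workspace_dir")
with open("workspace.json", "w") as out_file:
    json.dump(spec, out_file, indent=2)
```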

pyhf: Fitting as a service


.center[ .bold[ Optimize analysis by using automatic differentiation to compute $d(\textrm{Expected limit})/d(\textrm{analysis parameters})$, which are back-propagated from the output of the stats tool, .blue[through `pyhf` running in fitting service], back to ServiceX running at the analysis facility, and through the event selection & histogramming code ] ]

- As already covered, fitting with `pyhf` can be scaled up on demand and run almost anywhere
   - Local machine, cluster, AWS
- `pyhf` being built on frameworks that automatically handle gradients allows for this to happen naturally (sketch below)
   - Should get taken care of as a natural part of `pyhf` development
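A minimal sketch of what "automatically handled gradients" means in practice with the JAX backend (the model and data are illustrative stand-ins):

```python
# Sketch: differentiate the model log-likelihood with jax.grad
import jax

import pyhf

pyhf.set_backend("jax")

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0], bkg_data=[50.0], bkg_uncerts=[3.0]
)
data = pyhf.tensorlib.astensor([55.0] + model.config.auxdata)


def nll(pars):
    # Negative log-likelihood of the model at the given parameters
    return -model.logpdf(pars, data)[0]


init = pyhf.tensorlib.astensor(model.config.suggested_init())
gradient = jax.grad(nll)(init)  # d(NLL)/d(parameters)
```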

Summary

.kol-2-3[

  • .bold[Accomplishments]
    • Published and preserved full likelihoods
    • Became hugely popular and widely adopted inside ATLAS
    • Established connections for growth with SModelS and HEPData
  • .bold[Year 3 Execution]
    • Reach stable API and v1.0.0 release
    • Provide analysis support
    • Benchmark and profile hardware acceleration benefits
  • .bold[Vision for Year 4/5]
    • Globally deployed and scalable "fitting as a service" using REST web API
    • Have native support in HEPData for analysis preservation
  • .bold[Grand Challenge]
    • Integrate with cabinetry for ServiceX translation
    • Exploit fitting as a service + gradients for differentiable AS pipeline ] .kol-1-3[


      .center.width-100[pyhf-logo] ]

class: end-slide, center

Backup


HistFactory Template


$$\begin{aligned} &\mathcal{P}\left(n_{c}, x_{e}, a_{p} \middle| \phi_{p}, \alpha_{p}, \gamma_{b}\right) = \\ &{\color{blue}{\prod_{c \,\in\, \textrm{channels}} \left[\textrm{Pois}\left(n_{c} \middle| \nu_{c}\right) \prod_{e=1}^{n_{c}} f_{c}\left(x_{e} \middle| \vec{\alpha}\right)\right]}} \, {\color{red}{G\left(L_{0} \middle| \lambda, \Delta_{L}\right) \prod_{p \,\in\, \mathbb{S}+\Gamma} f_{p}\left(a_{p} \middle| \alpha_{p}\right)}} \end{aligned}$$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates

.bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for bins observed in all channels]
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
    • encoding systematic uncertainties (normalization, shape, etc.); minimal spec example below
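A minimal single-channel realization of this template as a pyhf JSON spec (names and numbers are illustrative): the `normfactor` provides the free signal strength $\mu$ and the `shapesys` modifier generates the .red[constraint] term.

```python
# Sketch: the HistFactory template above as a one-channel pyhf model
import pyhf

spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [12.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0],
                    "modifiers": [
                        {
                            "name": "uncorr_bkguncrt",
                            "type": "shapesys",
                            "data": [3.0],
                        }
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)  # Poisson main terms + constraint terms
```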

References

  1. ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
  2. L. Heinrich, H. Schulz, J. Turner and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.

class: end-slide, center
count: false

The end.