
class: middle, center, title-slide
count: false

pyhf

.large[a pure Python statistical fitting library with tensors and autograd]

.center.width-30[pyhf_logo]

.huge.blue[Matthew Feickert]

.width-02[email] matthew.feickert@cern.ch
.width-02[Twitter] @HEPfeickert
.width-02[GitHub] matthewfeickert

SciPy 2020
July 7th, 2020


pyhf core dev team


.grid[ .kol-1-3.center[ .circle.width-80[Lukas]

Lukas Heinrich

CERN ] .kol-1-3.center[ .circle.width-80[Matthew]

Matthew Feickert

University of Illinois
Urbana-Champaign ] .kol-1-3.center[ .circle.width-75[Giordon]

Giordon Stark

UCSC SCIPP ] ]


We're high energy particle physicists

.kol-1-2.center[
.width-95[LHC] LHC ] .kol-1-2.center[ .width-95[ATLAS_TRex] ATLAS ] .kol-1-1[ .kol-1-2.center[ .width-45[SM_mug] ] .kol-1-2.center[ .kol-1-2.center[ .width-100[ParticleZoo_Higgs] ] .kol-1-2.center[ .width-85[ParticleZoo_DarkMatter] ] ] ]


So we want to know


.center.width-100[[![CERN_ribbon_of_truth](figures/CERN_ribbon_of_truth.png)](https://home.cern/)]

Goals of physics analysis at the LHC

.kol-1-1[ .kol-1-3.center[ .width-100[ATLAS_Higgs_discovery] Search for new physics ] .kol-1-3.center[
.width-100[CMS-PAS-HIG-19-004]


Make precision measurements ] .kol-1-3.center[ .width-110[[![SUSY-2018-31_limit](figures/SUSY-2018-31_limit.png)](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-31/)]

Provide constraints on models through setting best limits ] ]

  • All require .bold[building statistical models] and .bold[fitting models] to data to perform statistical inference
  • Model complexity can be huge for complicated searches
  • Problem: Time to fit can be .bold[many hours]
  • .blue[Goal:] Empower analysts with fast fits and expressive models

HistFactory Model

  • A flexible probability density function (p.d.f.) template to build statistical models in high energy physics (HEP)
  • Developed in 2011, during the work that led to the Higgs discovery [CERN-OPEN-2012-016]
  • Widely used by the HEP community for .bold[measurements of known physics] (Standard Model) and
    .bold[searches for new physics] (beyond the Standard Model)

.kol-2-5.center[ .width-90[HIGG-2016-25] .bold[Standard Model] ] .kol-3-5.center[ .width-100[SUSY-2016-16] .bold[Beyond the Standard Model] ]


HistFactory Template

$$ f\left(\mathrm{data}\middle|\mathrm{parameters}\right) = f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates

.kol-1-2[ .bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
  • .katex[Event rates] $\nu_{cb}$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
    • encode systematic uncertainties (e.g. normalization, shape)
  • $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars ] .kol-1-2[ .center.width-100[SUSY-2016-16_annotated] .center[Example: .bold[Each bin] is separate (1-bin) channel,
    each .bold[histogram] (color) is a sample and share
    a .bold[normalization systematic] uncertainty] ]

HistFactory Template

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

Mathematical grammar for a simultaneous fit with

  • .blue[multiple "channels"] (analysis regions, (stacks of) histograms)
  • each region can have .blue[multiple bins]
  • coupled to a set of .red[constraint terms]

.center[.bold[This is a mathematical representation!] Nowhere is any software spec defined] .center[.bold[Until 2018], the only implementation of HistFactory was a monolithic C++ library used in HEP (ROOT)]

.bold[pyhf: HistFactory in pure Python] .center.width-70[pyhf_PyPI]


Basic object of HistFactory is the statistical model

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right) \prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right) $$

.center[we care about the log likelihood since we are performing maximum likelihood fits]

$$ \ln L \left(\vec{\theta} \,\middle| \vec{x}\right) \Rightarrow \texttt{model.logpdf(parameters, data)} $$

.center.width-100[carbon_logpdf_example.png]
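
A minimal sketch of the call shown above (illustrative model and numbers built with the `pyhf.simplemodels` helper — not the exact code in the screenshot):

```python
import pyhf

# Illustrative single-channel, two-bin model with an uncorrelated
# per-bin background uncertainty (all numbers are made up)
model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0],
    bkg_data=[50.0, 52.0],
    bkg_uncerts=[3.0, 7.0],
)
data = [51.0, 48.0] + model.config.auxdata  # observed counts + auxiliary data

init_pars = model.config.suggested_init()
print(model.logpdf(init_pars, data))  # log likelihood at the suggested initial parameters
```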


Model is represented as a computational graph

.grid[ .kol-1-3[

  • Each node of the graph represents a vectorized
    $n$-dimensional array ("tensorized") operation
  • The graph (model) is largely factorized between the .pars_blue[parameter] graph and the
    .data_green[data] graph
  • The bottom node is then used for the final log likelihood .value_orange[value]


$$ \texttt{\textcolor{#ddc16c}{value} = model.logpdf(\textcolor{#73bbe6}{parameters}, \textcolor{#a5dc92}{data})} $$ ] .kol-2-3[ .center.width-90[![DAG](figures/computational_graph.png)] ] ]

Core task: Maximum likelihood fits

$$ \texttt{pyhf.infer.mle.fit(data, model)} $$

.center[minimizes the objective function $-2\ln L \left(\vec{\theta} \,\middle| \vec{x}\right)$ (maximizing the likelihood) with the backend's optimizer]

.center.width-100[carbon_mle_fit_example]
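
A minimal sketch of an unconstrained maximum likelihood fit (same illustrative model as above):

```python
import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

# Maximize the likelihood (minimize -2 ln L) with the active backend's optimizer
bestfit_pars = pyhf.infer.mle.fit(data, model)
print(bestfit_pars)  # MLEs for the parameter of interest and all nuisance parameters
```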


Use in profile likelihood fits to ask

.center.bold.blue["Given our model and the data, did we discover new physics?"] $$ -2\ln \Lambda (\mu) = - 2\ln\frac{L(\mu, \hat{\!\hat{\theta}})}{L(\hat{\mu}, \hat{\theta})} \quad \frac{\Leftarrow\textrm{constrained best fit}}{\Leftarrow\textrm{unconstrained best fit}} $$

.center[compute (.bold[as fast as possible!]) the modified $p$-value (the $\mathrm{CL}_{s}$) for a given parameter of interest $\mu$ — hypothesis testing!] $$ \texttt{pyhf.infer.hypotest(testpoi, data, model)} $$

.center.width-65[carbon_hypotest_example]
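
A minimal sketch of the hypothesis test call (illustrative model; the returned value is the observed $\mathrm{CL}_{s}$):

```python
import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

# Test the signal-strength hypothesis mu = 1.0
cls_obs = pyhf.infer.hypotest(1.0, data, model)
print(cls_obs)  # observed CLs value at mu = 1.0
```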


JSON spec fully describes the HistFactory model

.kol-1-4.width-100[

  • Human & machine readable .bold[declarative] statistical models
  • Industry standard
    • Will be with us forever
  • Parsable by every language
    • Highly portable
    • No lock in
  • Versionable and easily preserved

.center[JSON defining a single channel, two bin counting experiment with systematics] ]
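
A hedged sketch of what such a declarative spec looks like, written as a Python dict and validated into a model (field values are illustrative):

```python
import pyhf

# Single channel, two bins: a signal sample with a free normalization factor
# and a background sample with an uncorrelated per-bin uncertainty
spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [12.0, 11.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0, 52.0],
                    "modifiers": [
                        {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [3.0, 7.0]}
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)  # validated against the pyhf JSON schema
```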


JSON Patch for signal model (reinterpretation)

.center[JSON Patch gives ability to .bold[easily mutate model]]
.center[Think: test a .bold[new theory] with a .bold[new patch]!]

.kol-1-5[



.center.width-100[measurement_cartoon] .center[Signal model A] ] .kol-3-5[

.center.width-100[signal_reinterpretation] ] .kol-1-5[



.center.width-100[reinterpretation_cartoon] .center[Signal model B] ]
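
To make "new theory = new patch" concrete, a hedged sketch using the standard jsonpatch library (the patch path and values are illustrative, and assume the signal sample is listed first in the channel):

```python
import jsonpatch
import pyhf

# Illustrative background + signal-model-A spec from the simplemodels helper
model_A = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)

# JSON Patch that swaps in signal model B's predicted yields
patch = [
    {"op": "replace", "path": "/channels/0/samples/0/data", "value": [3.0, 7.0]}
]
model_B = pyhf.Model(jsonpatch.apply_patch(model_A.spec, patch))
```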


Likelihood serialization and reproduction/reuse

  • Background-only model JSON stored
  • Hundreds of signal model JSON Patches stored together as a "patch set" file
  • Together these make it possible to publish and fully preserve the full likelihood (with its own DOI! .width-20[DOI] )
  • Shown to reproduce the results, but faster! .bold[C++ (ROOT):] 10+ hours .bold[pyhf:] < 30 minutes .kol-3-5[ .center.width-100[HEPData_likelihoods] ] .kol-2-5[

.center.width-100[[![overlay_multiplex_contour](figures/overlay_multiplex_contour.png)](https://cds.cern.ch/record/2684863)] ]

Likelihood serialization and reproduction/reuse

  • Background-only model JSON stored
  • Hundreds of signal model JSON Patches stored together as a "patch set" file
  • Together these make it possible to publish and fully preserve the full likelihood (with its own DOI! .width-20[DOI] )
  • First .bold[ever] full likelihood of an LHC experiment published in 2019

.center[Solves the technical problem of distribution and makes good on a 19(!) year old agreement to publish likelihoods]

.center.width-95[ likelihood_publishing_agreement .center[(1st Workshop on Confidence Limits, CERN, 2000)] ]


Further speed up by parallelizing across cluster

.kol-1-3[

  • Running across 25 worker nodes on the cloud
  • Background and signal patches are sent to workers on demand
    • Possible as the signal patches don't need information from each other
    • Embarrassingly parallelizable
  • Results being plotted as they are streamed back
  • Fit of the same likelihood now takes .bold[3 minutes] for all signal points! ] .kol-2-3[

.center.width-70[plot_countour] .center.small[(GIF sped up by 8x)] ]


Machine Learning Frameworks for Computation

.grid[ .kol-2-3[

  • All numerical operations implemented in .bold[tensor backends] through an API of $n$-dimensional array operations
  • Using deep learning frameworks as computational backends allows for .bold[exploitation of auto differentiation (autograd) and GPU acceleration]
  • With huge buy-in from industry, we benefit for free as these frameworks are .bold[continually improved] by professional software engineers (which physicists are not)

.kol-1-2.center[ .width-90[scaling_hardware] ] .kol-1-2[

  • Hardware acceleration gives an .bold[order of magnitude speedup] for some models!
  • Improvements over the traditional approach:
    • 10 hrs to 30 min; 20 min to 10 sec ] ] .kol-1-4.center[ .width-85[NumPy] .width-85[PyTorch] .width-85[Tensorflow]

.width-50[![JAX](figures/logos/JAX_logo.png)] ] ]

Unified API through tensorlib shim

.kol-1-1[ .kol-1-2.center[ .width-90[carbon_normal_dist_numpy]

NumPy ] .kol-1-2.center[ .width-90[carbon_normal_dist_jax]
JAX ] ] .kol-1-1[ .kol-1-2.center[ .width-95[carbon_normal_dist_tensorflow]
Tensorflow ] .kol-1-2.center[ .width-95[carbon_normal_dist_pytorch]
PyTorch ] ]
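
The screenshots above call each library directly; a hedged sketch of the shim itself, one API for every backend (assumes the listed backends are installed):

```python
import pyhf

# Same tensorlib call, evaluated by whichever backend is currently active
for backend in ["numpy", "pytorch", "tensorflow", "jax"]:
    pyhf.set_backend(backend)
    tb = pyhf.tensorlib
    x = tb.astensor([0.5, 1.0, 2.0])
    mu = tb.astensor([1.0, 1.0, 1.0])
    sigma = tb.astensor([1.0, 1.0, 1.0])
    print(backend, tb.tolist(tb.normal_logpdf(x, mu, sigma)))
```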


Allows for transparently changing backend

.center.width-100[carbon_change_backend]
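
A minimal sketch of what "transparently changing backend" looks like in user code (illustrative model and data):

```python
import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

for backend in ["numpy", "jax"]:
    pyhf.set_backend(backend)                  # swap the computational backend...
    bestfit = pyhf.infer.mle.fit(data, model)  # ...the user-facing call is unchanged
    print(backend, pyhf.tensorlib.tolist(bestfit))
```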


Automatic differentiation

With tensor library backends we gain access to exact (higher order) derivatives — accuracy is only limited by floating point precision

$$ \frac{\partial L}{\partial \mu}, \frac{\partial L}{\partial \theta_{i}} $$

.grid[ .kol-1-2[ .large[Exploit the .bold[full gradient of the likelihood] with .bold[modern optimizers] to help speed up the fit!]



.large[Gain this through the frameworks creating computational directed acyclic graphs and then applying the chain rule (to the operations)] ] .kol-1-2[ .center.width-80[DAG] ] ]
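
A hedged sketch of taking exact gradients of $-2\ln L$ through the JAX backend (assumes jax is installed; illustrative model and data):

```python
import jax
import pyhf

pyhf.set_backend("jax")

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = pyhf.tensorlib.astensor([51.0, 48.0] + model.config.auxdata)

def twice_nll(pars):
    return -2 * model.logpdf(pars, data)[0]

init = pyhf.tensorlib.astensor(model.config.suggested_init())
print(jax.grad(twice_nll)(init))  # exact derivatives w.r.t. mu and every nuisance parameter
```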


Tensor backends offer a computational advantage

For visual comparison: the computational graph of the Higgs discovery analysis from the C++ framework. Image courtesy of Kyle Cranmer.


.center.width-100[![Higgs_HistFactory_graph](figures/Higgs_HistFactory_graph.png)]

Publications using pyhf

.kol-1-2.center.width-95[ .center.width-70[arxViv_header]

.center.width-40[arxViv_tweet_GIF] ] .kol-1-2.center.width-95[ .center.width-100[ATLAS_PUB_Note_title]

.center.width-100[CERN_news_story] ]


Use in analysis outside of particle physics

.kol-1-3[

  • Public data from Fermi Large Area Telescope (LAT) analyzed by L. Heinrich et al.
  • The LAT is a high-energy gamma-ray telescope — the gamma-ray photons come from extreme cosmological events
  • Can represent the photon counts in the LAT as a binned model
    • Here full-sky map visualized with healpy's Mollweide projection
    • Think: 2d histogram with special binning ] .kol-2-3[ .center.width-100[Fermi_LAT] ]

Summary

.kol-2-3[ .large[pyhf provides:]

  • .large[.bold[Accelerated] fitting library]
    • reducing time to insight/inference!
    • Hardware acceleration on GPUs and vectorized operations
    • Backend agnostic Python API and CLI
  • .large[Flexible .bold[declarative] schema]
    • JSON: ubiquitous, universal support, versionable
  • .large[Enabling technology for .bold[reinterpretation]]
    • JSON Patch files for efficient computation of new signal models
    • Unifying tool for theoretical and experimental physicists
  • .large[Project in growing .bold[Pythonic HEP ecosystem]]
    • c.f. Jim Pivarski and Henry Schreiner's talks in High Performance Python track
    • Ask us about Scikit-HEP and IRIS-HEP! ] .kol-1-3[



.center.width-100[[![pyhf_logo](https://iris-hep.org/assets/logos/pyhf-logo.png)](https://github.com/scikit-hep/pyhf)] ]

class: middle

.center[

Thanks for listening!

Come talk with us!

.large[www.scikit-hep.org/pyhf] ] .grid[ .kol-1-3.center[ .width-90[scikit-hep_logo] ] .kol-1-3.center[
.width-90[pyhf_logo] ] .kol-1-3.center[

.width-100[iris-hep_logo] ] ]


class: end-slide, center

Backup


HistFactory Template (in more detail)

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

$$ \nu_{cb}(\vec{\eta}, \vec{\chi}) = \sum_{s \,\in\, \textrm{samples}} \underbrace{\left(\sum_{\kappa \,\in\, \vec{\kappa}} \kappa_{scb}(\vec{\eta}, \vec{\chi})\right)}_{\textrm{multiplicative}} \Bigg(\nu_{scb}^{0}(\vec{\eta}, \vec{\chi}) + \underbrace{\sum_{\Delta \,\in\, \vec{\Delta}} \Delta_{scb}(\vec{\eta}, \vec{\chi})}_{\textrm{additive}}\Bigg) $$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates

.bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
  • .katex[Event rates] $\nu_{cb}$ from nominal rate $\nu_{scb}^{0}$ and rate modifiers $\kappa$ and $\Delta$
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
    • encoding systematic uncertainties (normalization, shape, etc)
  • $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars

Why is the likelihood important?


.kol-1-2.width-90[

  • High information-density summary of analysis
  • Almost everything we do in the analysis ultimately affects the likelihood and is encapsulated in it
    • Trigger
    • Detector
    • Systematic Uncertainties
    • Event Selection
  • Unique representation of the analysis to preserve ] .kol-1-2.width-90[


    likelihood_connections ]

Likelihood serialization and reproduction

  • ATLAS note on the JSON schema for serialization and reproduction of results [ATL-PHYS-PUB-2019-029]
    • Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML

.right.width-80[ flowchart ]


Likelihood serialization and reproduction

  • ATLAS note on the JSON schema for serialization and reproduction of results [ATL-PHYS-PUB-2019-029]
    • Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML
      • Overlay of expected limit contours (hatching) and observed lines gives a nice visualization of near-perfect agreement
    • Serialized likelihood and reproduced results of ATLAS Run-2 search for bottom-squarks [JHEP12(2019)060] and published to HEPData
    • Shown to reproduce results but faster! .bold[C++ (ROOT):] 10+ hours .bold[pyhf:] < 30 minutes

.kol-1-2.center.width-95[ overlay_multiplex_contour ] .kol-1-2.right.width-70[ discrepancy ]


How are gradients computed when using NumPy?




.large[ As the NumPy backend with the SciPy optimizer doesn't support automatic differentiation, we make use of scipy.optimize.minimize along with the Sequential Least Squares Programming (SLSQP) method for the fit. ]
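
A rough sketch of the kind of call this corresponds to (not pyhf's actual internal code; illustrative model and data):

```python
from scipy.optimize import minimize

import pyhf

pyhf.set_backend("numpy")
model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

# Minimize -2 ln L with SLSQP; no gradients supplied, so finite differences are used
result = minimize(
    lambda pars: -2 * model.logpdf(pars, data)[0],
    model.config.suggested_init(),
    method="SLSQP",
    bounds=model.config.suggested_bounds(),
)
print(result.x)
```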


How much of an effect does automatic differentiation have on fit speed?


.large[ This is hard to answer rigorously. It depends on the model complexity and on the analysis. For a single fit of a small to medium model the use of gradients might not have much effect. However, for larger models the speedups from automatic differentiation become more apparent, but may not matter as much as JIT compilation.

In general though, larger, more complex models derive greater benefit from automatic differentiation and JIT compilation. ]


What SciPy optimizers are being used for MLE?



.large[ We use scipy.optimize.minimize along with the Sequential Least Squares Programming (SLSQP) method. We further leverage this through a custom AutoDiffOptimizerMixin class to feed the gradients from all the backends into scipy.optimize.minimize for performant optimization. In future versions (v0.5.0 onwards), this mixin will be dropped in favor of tensor-backend shims. ]


Is the NumPy backend competitive against C++?




.large[ Not for all models, but for some yes. If the model isn't too large and doesn't have a huge number of systematics, we can take advantage of the way the computations are laid out differently between pyhf and the C++ implementation and still be quite performant compared to the C++ thanks to vectorization. ]


Can I use pyhf if I'm not a physicist?



.large[ pyhf itself is focused exclusively on the HistFactory statistical model, but if you are performing counting experiments with template fits then maybe.

We hope that pyhf serves as an example of how models can be expressed in a declarative manner and implemented in a backend agnostic manner. ]


Is this frequentist only?

.kol-1-2.width-95[
.large[ Not exactly. The inference machinery that pyhf comes equipped with is frequentist focused, but the likelihood is common to both frequentist and Bayesian statistics. You could use the pyhf model in conjunction with priors to do Bayesian inference.

One of our "investigative" research areas is looking at using emcee and PyMC3 to do Bayesian inference with pyhf models. ] ] .kol-1-2.center.width-95[ Lukas_emcee_tweet ]


References

  1. F. James, Y. Perrin, L. Lyons, .italic[Workshop on confidence limits: Proceedings], 2000.
  2. ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
  3. L. Heinrich, H. Schulz, J. Turner and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.
  4. A. Read, .italic[Modified frequentist analysis of search results (the $\mathrm{CL}_{s}$ method)], 2000.
  5. K. Cranmer, .italic[CERN Latin-American School of High-Energy Physics: Statistics for Particle Physicists], 2013.
  6. ATLAS collaboration, .italic[Search for bottom-squark pair production with the ATLAS detector in final states containing Higgs bosons, b-jets and missing transverse momentum], 2019.
  7. ATLAS collaboration, .italic[Reproducing searches for new physics with the ATLAS experiment through publication of full statistical likelihoods], 2019.
  8. ATLAS collaboration, .italic[Search for bottom-squark pair production with the ATLAS detector in final states containing Higgs bosons, b-jets and missing transverse momentum: HEPData entry], 2019.

class: end-slide, center
count: false

The end.