class: middle, center, title-slide
count: false
.large[a pure Python statistical fitting library with tensors and autograd]
.center.width-30[]
.huge.blue[Matthew Feickert]
.width-02[] matthew.feickert@cern.ch
.width-02[] @HEPfeickert
.width-02[] matthewfeickert
SciPy 2020
July 7th, 2020
.grid[ .kol-1-3.center[ .circle.width-80[]
CERN ] .kol-1-3.center[ .circle.width-80[]
University of Illinois
Urbana-Champaign
]
.kol-1-3.center[
.circle.width-75[]
UCSC SCIPP ] ]
.kol-1-2.center[
.width-95[]
LHC
]
.kol-1-2.center[
.width-95[]
ATLAS
]
.kol-1-1[
.kol-1-2.center[
.width-45[]
]
.kol-1-2.center[
.kol-1-2.center[
.width-100[]
]
.kol-1-2.center[
.width-85[]
]
]
]
.center.width-100[[![CERN_ribbon_of_truth](figures/CERN_ribbon_of_truth.png)](https://home.cern/)]
.kol-1-1[
.kol-1-3.center[
.width-100[]
Search for new physics
]
.kol-1-3.center[
.width-100[]
Make precision measurements ] .kol-1-3.center[ .width-110[[![SUSY-2018-31_limit](figures/SUSY-2018-31_limit.png)](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-31/)]
Provide constraints on models through setting best limits ] ]
- All require .bold[building statistical models] and .bold[fitting models] to data to perform statistical inference
- Model complexity can be huge for complicated searches
- Problem: Time to fit can be .bold[many hours]
- .blue[Goal:] Empower analysts with fast fits and expressive models
- A flexible probability density function (p.d.f.) template to build statistical models in high energy physics (HEP)
- Developed during work that led to the Higgs discovery in 2012 [CERN-OPEN-2012-016]
- Widely used by the HEP community for .bold[measurements of known physics] (Standard Model) and
.bold[searches for new physics] (beyond the Standard Model)
.kol-2-5.center[ .width-90[] .bold[Standard Model] ] .kol-3-5.center[ .width-100[] .bold[Beyond the Standard Model] ]
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each, with additional (possibly shared) systematics between sample estimates
.kol-1-2[ .bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
- encode systematic uncertainties (e.g. normalization, shape)
- $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars ] .kol-1-2[ .center.width-100[] .center[Example: .bold[Each bin] is a separate (1-bin) channel,
each .bold[histogram] (color) is a sample and share
a .bold[normalization systematic] uncertainty] ]
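The structure above can be sketched in a few lines of pure Python for a single channel with one Gaussian-constrained background normalization. This is an illustration of the math only, not pyhf's implementation, and all rates below are made-up numbers:

```python
import math

def poisson_logpmf(n, lam):
    """log of the Poisson probability mass function."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def logpdf(mu, gamma, n, a, s_nom, b_nom, sigma=0.1):
    """HistFactory-style log likelihood for one channel (toy sketch).

    mu: signal strength (unconstrained parameter)
    gamma: background normalization (constrained parameter)
    n: observed events per bin; a: auxiliary measurement of gamma
    s_nom, b_nom: nominal signal and background rates per bin
    """
    # Main Poisson p.d.f.: expected rate per bin is nu_b = mu * s_b + gamma * b_b
    main = sum(
        poisson_logpmf(n_b, mu * s_b + gamma * b_b)
        for n_b, s_b, b_b in zip(n, s_nom, b_nom)
    )
    # Gaussian constraint p.d.f. (+ auxiliary data) for the "auxiliary measurement"
    constraint = -0.5 * ((a - gamma) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return main + constraint

val = logpdf(mu=1.0, gamma=1.0, n=[55, 70], a=1.0, s_nom=[5, 10], b_nom=[50, 60])
```

Here the observed counts match the nominal expectation exactly, so this parameter point maximizes the likelihood.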
Mathematical grammar for a simultaneous fit with
- .blue[multiple "channels"] (analysis regions, (stacks of) histograms)
- each region can have .blue[multiple bins]
- coupled to a set of .red[constraint terms]
.center[.bold[This is a mathematical representation!] Nowhere is any software spec defined]
.center[.bold[Until now] (2018), the only implementation of HistFactory has been in a monolithic C++ library used in HEP (ROOT)]
.bold[pyhf: HistFactory in pure Python]
.center.width-70[]
.center[We care about the log likelihood, as we are performing maximum likelihood fits]
.grid[
.kol-1-3[
- Each node of the graph represents a vectorized $n$-dimensional array ("tensorized") operation
- The graph (model) is largely factorized between the .pars_blue[parameter] graph and the .data_green[data] graph
- The bottom node is then the final log likelihood .value_orange[value]
$$ \texttt{\textcolor{#ddc16c}{value} = model.logpdf(\textcolor{#73bbe6}{parameters}, \textcolor{#a5dc92}{data})} $$ ] .kol-2-3[ .center.width-90[![DAG](figures/computational_graph.png)] ] ]
.center[Minimize the objective function to answer:]
.center.bold.blue["Given our model and the data, did we discover new physics?"] $$ -2\ln \Lambda (\mu) = - 2\ln\frac{L(\mu, \hat{\!\hat{\theta}})}{L(\hat{\mu}, \hat{\theta})} \quad \frac{\Leftarrow\textrm{constrained best fit}}{\Leftarrow\textrm{unconstrained best fit}} $$
.center[and compute (as .bold[fast as possible!]) modified frequentist $\mathrm{CL}_{s}$ values]
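The profile likelihood ratio above can be sketched for a hypothetical one-bin model, assuming SciPy is available (pyhf provides the real test statistics through its inference API). The constrained fit profiles the nuisance parameter at fixed $\mu$; the unconstrained fit floats both parameters:

```python
import math
from scipy.optimize import minimize

def nll(params, n=130.0, a=1.0, s=30.0, b=100.0, sigma=0.1):
    """-ln L for a toy one-bin model: Poisson main term (constant dropped)
    plus a Gaussian constraint on the background normalization theta."""
    mu, theta = params
    lam = mu * s + theta * b
    main = lam - n * math.log(lam)
    constraint = 0.5 * ((a - theta) / sigma) ** 2
    return main + constraint

def q_mu(mu_test):
    """-2 ln Lambda(mu): constrained best fit over theta at fixed mu,
    divided by the unconstrained (global) best fit."""
    constrained = minimize(lambda t: nll([mu_test, t[0]]), x0=[1.0], method="SLSQP")
    unconstrained = minimize(nll, x0=[1.0, 1.0], method="SLSQP")
    return 2.0 * (constrained.fun - unconstrained.fun)
```

With these toy numbers the best fit sits at $\mu \approx 1$, so `q_mu(1.0)` is near zero while `q_mu(0.0)` is positive.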
.kol-1-4.width-100[
- Human & machine readable .bold[declarative] statistical models
- Industry standard
- Will be with us forever
- Parsable by every language
- Highly portable
- No lock in
- Versionable and easily preserved
- JSON Schema describing the HistFactory specification
- Attractive for analysis preservation
- Highly compressible ] .kol-3-4.center[ .width-105[]
.center[JSON defining a single-channel, two-bin counting experiment with systematics]
]
.center[JSON Patch gives ability to .bold[easily mutate model]]
.center[Think: test a .bold[new theory] with a .bold[new patch]!]
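A standard-library sketch of the mutation step; the workspace fragment and patch below are hypothetical, and a real workflow would use the `jsonpatch` package or pyhf's CLI rather than this minimal applier:

```python
import json

# Background-only workspace fragment (hypothetical minimal example)
background_only = {
    "channels": [
        {"name": "singlechannel",
         "samples": [
             {"name": "background", "data": [50.0, 60.0], "modifiers": []}
         ]}
    ]
}

# JSON Patch (RFC 6902): add a signal sample to test a new theory
signal_patch = [
    {"op": "add",
     "path": "/channels/0/samples/1",
     "value": {"name": "signal", "data": [5.0, 10.0],
               "modifiers": [{"name": "mu", "type": "normfactor", "data": None}]}}
]

def apply_add(doc, patch):
    """Minimal JSON Patch applier supporting only 'add' into a list."""
    doc = json.loads(json.dumps(doc))  # deep copy so the base model is untouched
    for op in patch:
        assert op["op"] == "add"
        *parents, index = op["path"].lstrip("/").split("/")
        target = doc
        for key in parents:
            target = target[int(key)] if key.isdigit() else target[key]
        target.insert(int(index), op["value"])
    return doc

signal_model = apply_add(background_only, signal_patch)
```

The background-only model stays untouched, so hundreds of signal hypotheses can be generated from one base workspace plus small patches.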
.kol-1-5[
.center.width-100[]
.center[Signal model A]
]
.kol-3-5[
.center.width-100[]
]
.kol-1-5[
.center.width-100[]
.center[Signal model B]
]
- Background-only model JSON stored
- Hundreds of signal model JSON Patches stored together as a "patch set" file
- Together these make it possible to publish and fully preserve the full likelihood (with its own DOI! .width-20[] )
- Shown to reproduce results but faster! .bold[C++ (ROOT):] 10+ hours .bold[pyhf:] < 30 minutes .kol-3-5[ .center.width-100[] ] .kol-2-5[
.center.width-100[[![overlay_multiplex_contour](figures/overlay_multiplex_contour.png)](https://cds.cern.ch/record/2684863)] ]
- First .bold[ever] full likelihood of an LHC experiment published in 2019
- ATLAS Run-2 search for bottom-squarks [JHEP12(2019)060]
.center[Solves the technical problem of distribution and makes good on a 19(!)-year-old agreement to publish likelihoods]
.center.width-95[ .center[(1st Workshop on Confidence Limits, CERN, 2000)] ]
.kol-1-3[
- Running across 25 worker nodes on the cloud
- Background and signal patches are sent to workers on demand
- Possible as the signal patches don't need information from each other
- Embarrassingly parallelizable
- Results are plotted as they are streamed back
- Fitting the same likelihood now takes .bold[3 minutes] for all signal points! ] .kol-2-3[
.center.width-70[] .center.small[(GIF sped up by 8x)] ]
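The scheduling pattern can be sketched with the standard library; `fit_patch` is a hypothetical stand-in for the real per-signal-point fit (which is CPU-bound and would run on separate cloud workers rather than threads):

```python
from concurrent.futures import ThreadPoolExecutor

def fit_patch(patch_id):
    """Hypothetical worker body: apply signal patch `patch_id` to the
    background-only model, run the fit, and return the result (stubbed here)."""
    return patch_id, 0.05  # placeholder: a real worker would run the fit here

patch_ids = range(8)  # the real analysis has hundreds of signal-point patches

# Each signal patch is independent of the others, so the map is
# embarrassingly parallel and results stream back as workers finish.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(fit_patch, patch_ids))
```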
.grid[ .kol-2-3[
- All numerical operations implemented in .bold[tensor backends] through an API of $n$-dimensional array operations
- Using deep learning frameworks as computational backends allows for .bold[exploitation of automatic differentiation (autograd) and GPU acceleration]
- Given the huge buy-in from industry, we benefit for free as these frameworks are .bold[continually improved] by professional software engineers (which physicists are not)
.kol-1-2.center[
.width-90[]
]
.kol-1-2[
- Hardware acceleration gives .bold[order of magnitude speedup] for some models!
- Improvements over the traditional approach
.width-50[![JAX](figures/logos/JAX_logo.png)] ] ]
.kol-1-1[ .kol-1-2.center[ .width-90[]
NumPy
]
.kol-1-2.center[
.width-90[]
JAX
]
]
.kol-1-1[
.kol-1-2.center[
.width-95[]
TensorFlow
]
.kol-1-2.center[
.width-95[]
PyTorch
]
]
With tensor library backends gain access to exact (higher order) derivatives — accuracy is only limited by floating point precision
.grid[ .kol-1-2[ .large[Exploit the .bold[full gradient of the likelihood] with .bold[modern optimizers] to help speed up the fit!]
.large[This comes from the frameworks building computational directed acyclic graphs and then applying the chain rule to the operations]
]
.kol-1-2[
.center.width-80[]
]
]
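Why the derivatives are exact: autograd applies the chain rule to each operation rather than taking finite differences. A toy forward-mode implementation with dual numbers, shown purely to illustrate the technique (this is not how the backends implement it):

```python
import math

class Dual:
    """Dual number a + b*eps (with eps**2 = 0): minimal forward-mode autodiff."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)
    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.grad - other.grad)
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)
    __rmul__ = __mul__
    def log(self):
        # d/dx log(x) = 1/x, propagated by the chain rule
        return Dual(math.log(self.val), self.grad / self.val)

def nll(lam, n=7.0):
    """Single-bin Poisson negative log likelihood, constant term dropped."""
    return lam - n * lam.log()

out = nll(Dual(5.0, 1.0))  # seed derivative: d(lam)/d(lam) = 1
# derivative is exact: d/d(lam) [lam - n log(lam)] = 1 - n/lam = -0.4 at lam = 5
```

The gradient comes out exact to floating point precision, with no step-size tuning as in finite differences.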
For visual comparison: the computational graph of the Higgs discovery analysis from the C++
framework. Image courtesy of Kyle Cranmer.
.center.width-100[![Higgs_HistFactory_graph](figures/Higgs_HistFactory_graph.png)]
.kol-1-2.center.width-95[ .center.width-70[]
.center.width-40[] ] .kol-1-2.center.width-95[ .center.width-100[]
.kol-1-3[
- Public data from Fermi Large Area Telescope (LAT) analyzed by L. Heinrich et al.
- The LAT is a high-energy gamma-ray telescope — the gamma-ray photons come from extreme cosmological events
- Can represent the photon counts in the LAT as a binned model
- Here the full-sky map is visualized with healpy's Mollweide projection
- Think: 2d histogram with special binning ] .kol-2-3[ .center.width-100[] ]
.kol-2-3[
.large[pyhf provides:]
- .large[.bold[Accelerated] fitting library]
- reducing time to insight/inference!
- Hardware acceleration on GPUs and vectorized operations
- Backend agnostic Python API and CLI
- .large[Flexible .bold[declarative] schema]
- JSON: ubiquitous, universal support, versionable
- .large[Enabling technology for .bold[reinterpretation]]
- JSON Patch files for efficient computation of new signal models
- Unifying tool for theoretical and experimental physicists
- .large[Project in growing .bold[Pythonic HEP ecosystem]]
- cf. Jim Pivarski's and Henry Schreiner's talks in the High Performance Python track
- Ask us about Scikit-HEP and IRIS-HEP! ] .kol-1-3[
.center.width-100[[![pyhf_logo](https://iris-hep.org/assets/logos/pyhf-logo.png)](https://github.com/scikit-hep/pyhf)] ]
class: middle
.center[
.large[www.scikit-hep.org/pyhf]
]
.grid[
.kol-1-3.center[
.width-90[]
]
.kol-1-3.center[
.width-90[]
]
.kol-1-3.center[
.width-100[]
]
]
class: end-slide, center
Backup
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each, with additional (possibly shared) systematics between sample estimates
.bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}$ from nominal rate $\nu_{scb}^{0}$ and rate modifiers $\kappa$ and $\Delta$
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
- encoding systematic uncertainties (normalization, shape, etc)
- $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars
.kol-1-2.width-90[
- High information-density summary of analysis
- Almost everything we do in the analysis ultimately affects the likelihood and is encapsulated in it
- Trigger
- Detector
- Systematic Uncertainties
- Event Selection
- Unique representation of the analysis to preserve
]
.kol-1-2.width-90[
]
- ATLAS note on the JSON schema for serialization and reproduction of results [ATL-PHYS-PUB-2019-029]
- Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML
- Overlay of expected limit contours (hatching) and observed lines is a nice visualization of the near-perfect agreement
- Serialized likelihood and reproduced results of ATLAS Run-2 search for bottom-squarks [JHEP12(2019)060] and published to HEPData
- Shown to reproduce results but faster! .bold[C++ (ROOT):] 10+ hours .bold[pyhf:] < 30 minutes
- Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML
.kol-1-2.center.width-95[ ] .kol-1-2.right.width-70[ ]
.large[
As the NumPy backend with the SciPy optimizer doesn't support automatic differentiation, we make use of scipy.optimize.minimize along with the Sequential Least Squares Programming (SLSQP) method for the fit.
]
.large[ This is hard to answer rigorously, as it depends on the model complexity and on the analysis. For a single fit of a small to medium model the use of gradients might not have much effect. For larger models the speedups from automatic differentiation become more apparent, though they may not matter as much as JIT compilation.
In general, larger, more complex models derive greater benefits from automatic differentiation and JIT compilation. ]
.large[
We use scipy.optimize.minimize along with the Sequential Least Squares Programming (SLSQP) method.
We further leverage this through a custom AutoDiffOptimizerMixin class to feed the gradients from all the backends into scipy.optimize.minimize for performant optimization.
In future versions (v0.5.0 onwards), this mixin will be dropped in favor of tensor-backend shims.
]
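A sketch of that pattern, assuming SciPy: the objective returns its value and exact gradient together (written analytically here, standing in for what a tensor backend's autograd would supply), and `jac=True` hands the gradient to the optimizer:

```python
import math
from scipy.optimize import minimize

def objective_and_grad(params, n=130.0, a=1.0, s=30.0, b=100.0, sigma=0.1):
    """-ln L and its exact gradient for a toy one-bin model,
    as an autograd-capable backend would return them together."""
    mu, theta = params
    lam = mu * s + theta * b
    nll = lam - n * math.log(lam) + 0.5 * ((theta - a) / sigma) ** 2
    dmu = s - n * s / lam
    dtheta = b - n * b / lam + (theta - a) / sigma ** 2
    return nll, [dmu, dtheta]

# jac=True tells SciPy the objective returns (value, gradient) as a pair
result = minimize(objective_and_grad, x0=[0.5, 1.0], jac=True, method="SLSQP")
```

With these toy numbers the minimum is at $(\mu, \theta) = (1, 1)$, which the gradient-fed SLSQP fit recovers.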
.large[
Not for all models, but for some yes.
If the model isn't too large and doesn't have a huge number of systematics, we can take advantage of the way the computations are laid out differently between pyhf and the C++ implementation and still be quite performant thanks to vectorization.
]
.large[
pyhf itself is focused exclusively on the HistFactory statistical model, but if you are performing counting experiments with template fits then maybe.
We hope that pyhf serves as an example of how models can be expressed declaratively and implemented in a backend agnostic manner.
]
.kol-1-2.width-95[
.large[
Not exactly.
The inference machinery that pyhf comes equipped with is frequentist focused, but the likelihood is common to both frequentist and Bayesian statistics.
You could use the pyhf model in conjunction with priors to do Bayesian inference.
One of our "investigative" research areas is looking at using emcee and PyMC3 to do Bayesian inference with pyhf models.
]
]
.kol-1-2.center.width-95[
]
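A minimal sketch of that direction: random-walk Metropolis sampling of the signal strength for a hypothetical one-bin Poisson model with a flat prior, using only the standard library (a real study would use emcee or PyMC3 with the pyhf log likelihood):

```python
import math
import random

def log_post(mu, n=130.0, s=30.0, b=100.0):
    """Log posterior for one bin: Poisson likelihood times a flat prior on mu >= 0.
    All numbers here are made up for illustration."""
    if mu < 0:
        return -math.inf
    lam = mu * s + b
    return n * math.log(lam) - lam  # flat prior contributes only a constant

random.seed(0)
mu, chain = 1.0, []
for _ in range(5000):
    proposal = mu + random.gauss(0.0, 0.3)  # random-walk Metropolis step
    if math.log(random.random()) < log_post(proposal) - log_post(mu):
        mu = proposal  # accept
    chain.append(mu)

posterior_mean = sum(chain[1000:]) / len(chain[1000:])  # drop burn-in
```

With these toy numbers the likelihood peaks at $\mu = 1$, so the posterior mean lands near 1.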
- F. James, Y. Perrin, L. Lyons, .italic[Workshop on confidence limits: Proceedings], 2000.
- ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
- L. Heinrich, H. Schulz, J. Turner and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.
- A. Read, .italic[Modified frequentist analysis of search results (the $\mathrm{CL}_{s}$ method)], 2000.
- K. Cranmer, .italic[CERN Latin-American School of High-Energy Physics: Statistics for Particle Physicists], 2013.
- ATLAS collaboration, .italic[Search for bottom-squark pair production with the ATLAS detector in final states containing Higgs bosons, b-jets and missing transverse momentum], 2019.
- ATLAS collaboration, .italic[Reproducing searches for new physics with the ATLAS experiment through publication of full statistical likelihoods], 2019.
- ATLAS collaboration, .italic[Search for bottom-squark pair production with the ATLAS detector in final states containing Higgs bosons, b-jets and missing transverse momentum: HEPData entry], 2019.
class: end-slide, center
count: false
The end.