
class: middle, center, title-slide
count: false

pyhf

.large[a pure Python statistical fitting library with tensors and autograd]

.center.width-30[pyhf_logo]

.huge.blue[Matthew Feickert]

.width-02[email] matthew.feickert@cern.ch
.width-02[Twitter] @HEPfeickert
.width-02[GitHub] matthewfeickert

SciPy 2020
July 7th, 2020


pyhf core dev team


.grid[ .kol-1-3.center[ .circle.width-80[Lukas]

Lukas Heinrich

CERN ] .kol-1-3.center[ .circle.width-80[Matthew]

Matthew Feickert

University of Illinois
Urbana-Champaign ] .kol-1-3.center[ .circle.width-75[Giordon]

Giordon Stark

UCSC SCIPP ] ]


We're high energy particle physicists

.kol-1-2.center[
.width-95[LHC] LHC ] .kol-1-2.center[ .width-95[ATLAS_TRex] ATLAS ] .kol-1-1[ .kol-1-2.center[ .width-45[SM_mug] ] .kol-1-2.center[ .kol-1-2.center[ .width-100[ParticleZoo_Higgs] ] .kol-1-2.center[ .width-85[ParticleZoo_DarkMatter] ] ] ]


So we want to know


.center.width-100[[![CERN_ribbon_of_truth](figures/CERN_ribbon_of_truth.png)](https://home.cern/)]

Goals of physics analysis at the LHC

.kol-1-1[ .kol-1-3.center[ .width-100[ATLAS_Higgs_discovery] Search for new physics ] .kol-1-3.center[
.width-100[CMS-PAS-HIG-19-004]


Make precision measurements ] .kol-1-3.center[ .width-110[[![SUSY-2018-31_limit](figures/SUSY-2018-31_limit.png)](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-31/)]

Provide constraints on models through setting best limits ] ]

  • All require .bold[building statistical models] and .bold[fitting models] to data to perform statistical inference
  • Model complexity can be huge for complicated searches
  • Problem: Time to fit can be .bold[many hours]
  • .blue[Goal:] Empower analysts with fast fits and expressive models

HistFactory Model

  • A flexible probability density function (p.d.f.) template to build statistical models in high energy physics (HEP)
  • Developed in 2011, during the work that led to the Higgs discovery [CERN-OPEN-2012-016]
  • Widely used by the HEP community for .bold[measurements of known physics] (Standard Model) and
    .bold[searches for new physics] (beyond the Standard Model)

.kol-2-5.center[ .width-90[HIGG-2016-25] .bold[Standard Model] ] .kol-3-5.center[ .width-100[SUSY-2016-16] .bold[Beyond the Standard Model] ]


HistFactory Template

$$ f\left(\mathrm{data}\middle|\mathrm{parameters}\right) = f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates

.kol-1-2[ .bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
  • .katex[Event rates] $\nu_{cb}$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
    • encode systematic uncertainties (e.g. normalization, shape)
  • $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars ] .kol-1-2[ .center.width-100[SUSY-2016-16_annotated] .center[Example: .bold[Each bin] is separate (1-bin) channel,
    each .bold[histogram] (color) is a sample and share
    a .bold[normalization systematic] uncertainty] ]

HistFactory Template

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

Mathematical grammar for a simultaneous fit with

  • .blue[multiple "channels"] (analysis regions, (stacks of) histograms)
  • each region can have .blue[multiple bins]
  • coupled to a set of .red[constraint terms]

.center[.bold[This is a mathematical representation!] Nowhere is any software spec defined] .center[.bold[Until 2018], the only implementation of HistFactory was a monolithic C++ library used in HEP (ROOT)]

.bold[pyhf: HistFactory in pure Python] .center.width-70[pyhf_PyPI]


Basic object of HistFactory is the statistical model

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right) \prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right) $$

.center[we care about the log likelihood since we are performing maximum likelihood fits]

$$ \ln L \left(\vec{\theta} \,\middle| \vec{x}\right) \Rightarrow \texttt{model.logpdf(parameters, data)} $$

.center.width-100[carbon_logpdf_example.png]
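
A minimal sketch of the call shown above (illustrative model and numbers built with the `pyhf.simplemodels` helper — not the exact code in the screenshot):

```python
import pyhf

# Illustrative single-channel, two-bin model with an uncorrelated
# per-bin background uncertainty (all numbers are made up)
model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0],
    bkg_data=[50.0, 52.0],
    bkg_uncerts=[3.0, 7.0],
)
data = [51.0, 48.0] + model.config.auxdata  # observed counts + auxiliary data

init_pars = model.config.suggested_init()
print(model.logpdf(init_pars, data))  # log likelihood at the suggested initial parameters
```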


Model is represented as a computational graph

.grid[ .kol-1-3[

  • Each node of the graph represents a vectorized
    $n$-dimensional array ("tensorized") operation
  • The graph (model) is largely factorized between the .pars_blue[parameter] graph and the
    .data_green[data] graph
  • The bottom node is then used for the final log likelihood .value_orange[value]


$$ \texttt{\textcolor{#ddc16c}{value} = model.logpdf(\textcolor{#73bbe6}{parameters}, \textcolor{#a5dc92}{data})} $$ ] .kol-2-3[ .center.width-90[![DAG](figures/computational_graph.png)] ] ]

Core task: Maximum likelihood fits

$$ \texttt{pyhf.infer.mle.fit(data, model)} $$

.center[minimizes the objective function $-2\ln L \left(\vec{\theta} \,\middle| \vec{x}\right)$ (maximizing the likelihood) with the backend's optimizer]

.center.width-100[carbon_mle_fit_example]
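
A minimal sketch of an unconstrained maximum likelihood fit (same illustrative model as above):

```python
import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

# Maximize the likelihood (minimize -2 ln L) with the active backend's optimizer
bestfit_pars = pyhf.infer.mle.fit(data, model)
print(bestfit_pars)  # MLEs for the parameter of interest and all nuisance parameters
```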


Use in profile likelihood fits to ask

.center.bold.blue["Given our model and the data, did we discover new physics?"] $$ -2\ln \Lambda (\mu) = - 2\ln\frac{L(\mu, \hat{\!\hat{\theta}})}{L(\hat{\mu}, \hat{\theta})} \quad \frac{\Leftarrow\textrm{constrained best fit}}{\Leftarrow\textrm{unconstrained best fit}} $$

.center[compute (.bold[as fast as possible!]) the modified $p$-value (the $\mathrm{CL}_{s}$) for a given parameter of interest $\mu$ — hypothesis testing!] $$ \texttt{pyhf.infer.hypotest(testpoi, data, model)} $$

.center.width-65[carbon_hypotest_example]
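
A minimal sketch of the hypothesis test call (illustrative model; the returned value is the observed $\mathrm{CL}_{s}$):

```python
import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

# Test the signal-strength hypothesis mu = 1.0
cls_obs = pyhf.infer.hypotest(1.0, data, model)
print(cls_obs)  # observed CLs value at mu = 1.0
```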


JSON spec fully describes the HistFactory model

.kol-1-4.width-100[

  • Human & machine readable .bold[declarative] statistical models
  • Industry standard
    • Will be with us forever
  • Parsable by every language
    • Highly portable
    • No lock in
  • Versionable and easily preserved

.center[JSON defining a single channel, two bin counting experiment with systematics] ]
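
A hedged sketch of what such a declarative spec looks like, written as a Python dict and validated into a model (field values are illustrative):

```python
import pyhf

# Single channel, two bins: a signal sample with a free normalization factor
# and a background sample with an uncorrelated per-bin uncertainty
spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [12.0, 11.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0, 52.0],
                    "modifiers": [
                        {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [3.0, 7.0]}
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)  # validated against the pyhf JSON schema
```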


JSON Patch for signal model (reinterpretation)

.center[JSON Patch gives ability to .bold[easily mutate model]]
.center[Think: test a .bold[new theory] with a .bold[new patch]!]

.kol-1-5[



.center.width-100[measurement_cartoon] .center[Signal model A] ] .kol-3-5[

.center.width-100[signal_reinterpretation] ] .kol-1-5[



.center.width-100[reinterpretation_cartoon] .center[Signal model B] ]
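
To make "new theory = new patch" concrete, a hedged sketch using the standard jsonpatch library (the patch path and values are illustrative, and assume the signal sample is listed first in the channel):

```python
import jsonpatch
import pyhf

# Illustrative background + signal-model-A spec from the simplemodels helper
model_A = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)

# JSON Patch that swaps in signal model B's predicted yields
patch = [
    {"op": "replace", "path": "/channels/0/samples/0/data", "value": [3.0, 7.0]}
]
model_B = pyhf.Model(jsonpatch.apply_patch(model_A.spec, patch))
```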


Likelihood serialization and reproduction/reuse

  • Background-only model JSON stored
  • Hundreds of signal model JSON Patches stored together as a "patch set" file
  • Together these make it possible to publish and fully preserve the full likelihood (with its own DOI! .width-20[DOI] )
  • Shown to reproduce the results, but faster! .bold[C++ (ROOT):] 10+ hours .bold[pyhf:] < 30 minutes .kol-3-5[ .center.width-100[HEPData_likelihoods] ] .kol-2-5[

.center.width-100[[![overlay_multiplex_contour](figures/overlay_multiplex_contour.png)](https://cds.cern.ch/record/2684863)] ]

Likelihood serialization and reproduction/reuse

  • Background-only model JSON stored
  • Hundreds of signal model JSON Patches stored together as a "patch set" file
  • Together these make it possible to publish and fully preserve the full likelihood (with its own DOI! .width-20[DOI] )
  • First .bold[ever] full likelihood of an LHC experiment published in 2019

.center[Solves the technical problem of distribution and makes good on a 19(!) year old agreement to publish likelihoods]

.center.width-95[ likelihood_publishing_agreement .center[(1st Workshop on Confidence Limits, CERN, 2000)] ]


Further speed up by parallelizing across cluster

.kol-1-3[

  • Running across 25 worker nodes on the cloud
  • Background and signal patches are sent to workers on demand
    • Possible as the signal patches don't need information from each other
    • Embarrassingly parallelizable
  • Results being plotted as they are streamed back
  • Fit of the same likelihood now takes .bold[3 minutes] for all signal points! ] .kol-2-3[

.center.width-70[plot_countour] .center.small[(GIF sped up by 8x)] ]


Machine Learning Frameworks for Computation

.grid[ .kol-2-3[

  • All numerical operations implemented in .bold[tensor backends] through an API of $n$-dimensional array operations
  • Using deep learning frameworks as computational backends allows for .bold[exploitation of auto differentiation (autograd) and GPU acceleration]
  • With huge buy-in from industry, we benefit for free as these frameworks are .bold[continually improved] by professional software engineers (which physicists are not)

.kol-1-2.center[ .width-90[scaling_hardware] ] .kol-1-2[

  • Hardware acceleration gives an .bold[order of magnitude speedup] for some models!
  • Improvements over the traditional approach:
    • 10 hrs to 30 min; 20 min to 10 sec ] ] .kol-1-4.center[ .width-85[NumPy] .width-85[PyTorch] .width-85[Tensorflow]

.width-50[![JAX](figures/logos/JAX_logo.png)] ] ]

Unified API through tensorlib shim

.kol-1-1[ .kol-1-2.center[ .width-90[carbon_normal_dist_numpy]

NumPy ] .kol-1-2.center[ .width-90[carbon_normal_dist_jax]
JAX ] ] .kol-1-1[ .kol-1-2.center[ .width-95[carbon_normal_dist_tensorflow]
Tensorflow ] .kol-1-2.center[ .width-95[carbon_normal_dist_pytorch]
PyTorch ] ]
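
The screenshots above call each library directly; a hedged sketch of the shim itself, one API for every backend (assumes the listed backends are installed):

```python
import pyhf

# Same tensorlib call, evaluated by whichever backend is currently active
for backend in ["numpy", "pytorch", "tensorflow", "jax"]:
    pyhf.set_backend(backend)
    tb = pyhf.tensorlib
    x = tb.astensor([0.5, 1.0, 2.0])
    mu = tb.astensor([1.0, 1.0, 1.0])
    sigma = tb.astensor([1.0, 1.0, 1.0])
    print(backend, tb.tolist(tb.normal_logpdf(x, mu, sigma)))
```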


Allows for transparently changing backend

.center.width-100[carbon_change_backend]
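
A minimal sketch of what "transparently changing backend" looks like in user code (illustrative model and data):

```python
import pyhf

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

for backend in ["numpy", "jax"]:
    pyhf.set_backend(backend)                  # swap the computational backend...
    bestfit = pyhf.infer.mle.fit(data, model)  # ...the user-facing call is unchanged
    print(backend, pyhf.tensorlib.tolist(bestfit))
```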


Automatic differentiation

With tensor library backends we gain access to exact (higher order) derivatives — accuracy is only limited by floating point precision

$$ \frac{\partial L}{\partial \mu}, \frac{\partial L}{\partial \theta_{i}} $$

.grid[ .kol-1-2[ .large[Exploit the .bold[full gradient of the likelihood] with .bold[modern optimizers] to help speed up the fit!]



.large[Gain this through the frameworks creating computational directed acyclic graphs and then applying the chain rule (to the operations)] ] .kol-1-2[ .center.width-80[DAG] ] ]
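
A hedged sketch of taking exact gradients of $-2\ln L$ through the JAX backend (assumes jax is installed; illustrative model and data):

```python
import jax
import pyhf

pyhf.set_backend("jax")

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = pyhf.tensorlib.astensor([51.0, 48.0] + model.config.auxdata)

def twice_nll(pars):
    return -2 * model.logpdf(pars, data)[0]

init = pyhf.tensorlib.astensor(model.config.suggested_init())
print(jax.grad(twice_nll)(init))  # exact derivatives w.r.t. mu and every nuisance parameter
```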


Tensor backends offer a computational advantage

For visual comparison: the computational graph of the Higgs discovery analysis from the C++ framework. Image courtesy of Kyle Cranmer.


.center.width-100[![Higgs_HistFactory_graph](figures/Higgs_HistFactory_graph.png)]

Publications using pyhf

.kol-1-2.center.width-95[ .center.width-70[arxViv_header]

.center.width-40[arxViv_tweet_GIF] ] .kol-1-2.center.width-95[ .center.width-100[ATLAS_PUB_Note_title]

.center.width-100[CERN_news_story] ]


Use in analysis outside of particle physics

.kol-1-3[

  • Public data from Fermi Large Area Telescope (LAT) analyzed by L. Heinrich et al.
  • The LAT is a high-energy gamma-ray telescope — the gamma-ray photons come from extreme cosmological events
  • Can represent the photon counts in the LAT as a binned model
    • Here full-sky map visualized with healpy's Mollweide projection
    • Think: 2d histogram with special binning ] .kol-2-3[ .center.width-100[Fermi_LAT] ]

Summary

.kol-2-3[ .large[pyhf provides:]

  • .large[.bold[Accelerated] fitting library]
    • reducing time to insight/inference!
    • Hardware acceleration on GPUs and vectorized operations
    • Backend agnostic Python API and CLI
  • .large[Flexible .bold[declarative] schema]
    • JSON: ubiquitous, universal support, versionable
  • .large[Enabling technology for .bold[reinterpretation]]
    • JSON Patch files for efficient computation of new signal models
    • Unifying tool for theoretical and experimental physicists
  • .large[Project in growing .bold[Pythonic HEP ecosystem]]
    • c.f. Jim Pivarski and Henry Schreiner's talks in High Performance Python track
    • Ask us about Scikit-HEP and IRIS-HEP! ] .kol-1-3[



.center.width-100[[![pyhf_logo](https://iris-hep.org/assets/logos/pyhf-logo.png)](https://github.com/scikit-hep/pyhf)] ]

class: middle

.center[

Thanks for listening!

Come talk with us!

.large[www.scikit-hep.org/pyhf] ] .grid[ .kol-1-3.center[ .width-90[scikit-hep_logo] ] .kol-1-3.center[
.width-90[pyhf_logo] ] .kol-1-3.center[

.width-100[iris-hep_logo] ] ]


class: end-slide, center

Backup


HistFactory Template (in more detail)

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

$$ \nu_{cb}(\vec{\eta}, \vec{\chi}) = \sum_{s \,\in\, \textrm{samples}} \underbrace{\left(\sum_{\kappa \,\in\, \vec{\kappa}} \kappa_{scb}(\vec{\eta}, \vec{\chi})\right)}_{\textrm{multiplicative}} \Bigg(\nu_{scb}^{0}(\vec{\eta}, \vec{\chi}) + \underbrace{\sum_{\Delta \,\in\, \vec{\Delta}} \Delta_{scb}(\vec{\eta}, \vec{\chi})}_{\textrm{additive}}\Bigg) $$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates

.bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
  • .katex[Event rates] $\nu_{cb}$ from nominal rate $\nu_{scb}^{0}$ and rate modifiers $\kappa$ and $\Delta$
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
    • encoding systematic uncertainties (normalization, shape, etc)
  • $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars

Why is the likelihood important?


.kol-1-2.width-90[

  • High information-density summary of analysis
  • Almost everything we do in the analysis ultimately affects the likelihood and is encapsulated in it
    • Trigger
    • Detector
    • Systematic Uncertainties
    • Event Selection
  • Unique representation of the analysis to preserve ] .kol-1-2.width-90[


    likelihood_connections ]

Likelihood serialization and reproduction

  • ATLAS note on the JSON schema for serialization and reproduction of results [ATL-PHYS-PUB-2019-029]
    • Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML

.right.width-80[ flowchart ]


Likelihood serialization and reproduction

  • ATLAS note on the JSON schema for serialization and reproduction of results [ATL-PHYS-PUB-2019-029]
    • Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML
      • Overlay of expected limit contours (hatching) and observed lines gives a nice visualization of near-perfect agreement
    • Serialized likelihood and reproduced results of ATLAS Run-2 search for bottom-squarks [JHEP12(2019)060] and published to HEPData
    • Shown to reproduce results but faster! .bold[C++ (ROOT):] 10+ hours .bold[pyhf:] < 30 minutes

.kol-1-2.center.width-95[ overlay_multiplex_contour ] .kol-1-2.right.width-70[ discrepancy ]


How are gradients computed when using NumPy?




.large[ As the NumPy backend with the SciPy optimizer doesn't support automatic differentiation, we make use of scipy.optimize.minimize along with the Sequential Least Squares Programming (SLSQP) method for the fit. ]
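
A rough sketch of the kind of call this corresponds to (not pyhf's actual internal code; illustrative model and data):

```python
from scipy.optimize import minimize

import pyhf

pyhf.set_backend("numpy")
model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
)
data = [51.0, 48.0] + model.config.auxdata

# Minimize -2 ln L with SLSQP; no gradients supplied, so finite differences are used
result = minimize(
    lambda pars: -2 * model.logpdf(pars, data)[0],
    model.config.suggested_init(),
    method="SLSQP",
    bounds=model.config.suggested_bounds(),
)
print(result.x)
```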


How much of an effect does automatic differentiation have on fit speed?


.large[ This is hard to answer rigorously. It depends on the model complexity and on the analysis. For a single fit of a small to medium model the use of gradients might not have much effect. However, for larger models the speedups from automatic differentiation become more apparent, but may not matter as much as JIT compilation.

In general though, larger, more complex models derive greater benefit from automatic differentiation and JIT compilation. ]


What SciPy optimizers are being used for MLE?



.large[ We use scipy.optimize.minimize along with the Sequential Least Squares Programming (SLSQP) method. We further leverage this through a custom AutoDiffOptimizerMixin class to feed the gradients from all the backends into scipy.optimize.minimize for performant optimization. In future versions (v0.5.0 onwards), this mixin will be dropped in favor of tensor-backend shims. ]


Is the NumPy backend competitive against C++?




.large[ Not for all models, but for some yes. If the model isn't too large and doesn't have a huge number of systematics, we can take advantage of the way the computations are laid out differently between pyhf and the C++ implementation and still be quite performant compared to the C++ thanks to vectorization. ]


Can I use pyhf if I'm not a physicist?



.large[ pyhf itself is focused exclusively on the HistFactory statistical model, but if you are performing counting experiments with template fits then maybe.

We hope that pyhf serves as an example of how models can be expressed in a declarative manner and implemented in a backend agnostic manner. ]


Is this frequentist only?

.kol-1-2.width-95[
.large[ Not exactly. The inference machinery that pyhf comes equipped with is frequentist focused, but the likelihood is common to both frequentist and Bayesian statistics. You could use the pyhf model in conjunction with priors to do Bayesian inference.

One of our "investigative" research areas is looking at using emcee and PyMC3 to do Bayesian inference with pyhf models. ] ] .kol-1-2.center.width-95[ Lukas_emcee_tweet ]


References

  1. F. James, Y. Perrin, L. Lyons, .italic[Workshop on confidence limits: Proceedings], 2000.
  2. ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
  3. L. Heinrich, H. Schulz, J. Turner and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.
  4. A. Read, .italic[Modified frequentist analysis of search results (the $\mathrm{CL}_{s}$ method)], 2000.
  5. K. Cranmer, .italic[CERN Latin-American School of High-Energy Physics: Statistics for Particle Physicists], 2013.
  6. ATLAS collaboration, .italic[Search for bottom-squark pair production with the ATLAS detector in final states containing Higgs bosons, b-jets and missing transverse momentum], 2019.
  7. ATLAS collaboration, .italic[Reproducing searches for new physics with the ATLAS experiment through publication of full statistical likelihoods], 2019.
  8. ATLAS collaboration, .italic[Search for bottom-squark pair production with the ATLAS detector in final states containing Higgs bosons, b-jets and missing transverse momentum: HEPData entry], 2019.

class: end-slide, center
count: false

The end.