class: middle, center, title-slide
count: false
.huge.blue[Matthew Feickert]
.huge[(University of Illinois at Urbana-Champaign)]
.center.width-5[]
matthew.feickert@cern.ch
2020 IRIS-HEP Institute Retreat
May 27th, 2020
.grid[ .kol-1-3.center[ .circle.width-80[]
CERN ] .kol-1-3.center[ .circle.width-80[]
Illinois .center.bold.blue[IRIS-HEP] ] .kol-1-3.center[ .circle.width-75[]
UCSC SCIPP ] ]
.kol-1-1[
.kol-1-3.center[
.width-100[]
Search for new physics
]
.kol-1-3.center[
.width-100[]
Make precision measurements ] .kol-1-3.center[ .width-110[[![SUSY-2018-31_limit](figures/SUSY-2018-31_limit.png)](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-31/)]
Provide constraints on models through setting best limits ] ]
- All require .bold[building statistical models] and .bold[fitting models] to data to perform statistical inference
- Model complexity can be huge for complicated searches
- Problem: Time to fit can be .bold[many hours]
- .blue[Goal:] Empower analysts with fast fits and expressive models
- .large[Accelerating fitting reduces time to .bold[insight] (statistical inference)!]
- .large[Flexible schema great for open likelihood .bold[preservation]]
- .normal[Likelihood serves as high information-density summary of analysis]
- .large[An enabling technology for .bold[reinterpretation]]
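For context (not on the original slide): the HistFactory probability model `pyhf` builds is specified entirely in declarative JSON, which is what makes it natural to preserve and reinterpret. A minimal single-channel workspace, with illustrative yields, looks roughly like:

```json
{
  "channels": [
    {
      "name": "singlechannel",
      "samples": [
        {
          "name": "signal",
          "data": [5.0],
          "modifiers": [{"name": "mu", "type": "normfactor", "data": null}]
        },
        {
          "name": "background",
          "data": [50.0],
          "modifiers": [{"name": "uncorr_bkguncrt", "type": "shapesys", "data": [5.0]}]
        }
      ]
    }
  ],
  "observations": [{"name": "singlechannel", "data": [53.0]}],
  "measurements": [{"name": "Measurement", "config": {"poi": "mu", "parameters": []}}],
  "version": "1.0.0"
}
```

This plain-text form is exactly what allows the full likelihood to serve as a high information-density summary of the analysis.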
class: middle
- Background-only model JSON stored
- Signal models stored as JSON Patch files
- Together they fully preserve the model (with its own DOI! .width-20[] )
- c.f. Matthew's CHEP 2019 talk, Lukas's LHCP 2020 talk
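To illustrate the mechanics (a sketch, not code from the slides): a signal patch is standard JSON Patch (RFC 6902), so recovering the full model is just applying the patch to the background-only JSON. The helper below implements only the `add`/`replace` operations needed here, and all names and numbers are made up; real use would go through the `jsonpatch` package or `pyhf` itself.

```python
import copy

def apply_patch(doc, patch):
    """Apply a minimal subset of RFC 6902 JSON Patch ("add"/"replace")
    to a JSON-like document. Illustrative only -- real tooling handles
    the full specification."""
    result = copy.deepcopy(doc)
    for op in patch:
        # Split the JSON Pointer path into keys, e.g. "/channels/0/samples/-"
        parts = [p for p in op["path"].split("/") if p]
        target = result
        for key in parts[:-1]:
            target = target[int(key)] if isinstance(target, list) else target[key]
        last = parts[-1]
        if op["op"] == "add" and isinstance(target, list):
            if last == "-":
                target.append(op["value"])  # "-" appends to the array
            else:
                target.insert(int(last), op["value"])
        elif isinstance(target, list):
            target[int(last)] = op["value"]  # "replace" in an array
        else:
            target[last] = op["value"]  # "add"/"replace" in an object
    return result

# Background-only model fragment (illustrative numbers)
background_only = {"channels": [{"name": "SR", "samples": [{"name": "bkg", "data": [50.0]}]}]}
# Signal patch: append a hypothetical signal sample to the channel's sample list
signal_patch = [{"op": "add", "path": "/channels/0/samples/-",
                 "value": {"name": "sig_1000_200", "data": [5.0]}}]

full_model = apply_patch(background_only, signal_patch)
```

The background-only document is left untouched, so one background file plus many small patches preserves every signal hypothesis.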
.kol-1-2.center.width-95[ .center.width-100[]
.center.width-100[] ] .kol-1-2.center.width-100[ .center.width-100[] ]
.kol-1-2[
- Impressive appetite for `pyhf` in ATLAS analyses
   - Much of SUSY, $HH \to 4b$ limit setting
- Giordon: SUSY Run-2 Summaries subconvener
- Lukas: ATLAS Modeling Group convener
- Upcoming: ATLAS Stats Forum recommendation ] .kol-1-2[
.italic.smaller[Thanks for making a tool super easy to use!
When I got some [Jupyter] notebooks with this code up and shared with students a lot more of us started including limits in our talks.
Before this was a pretty painful step!]
.center.smaller[— Nicole Hartman (SLAC), ATLAS Ph.D. Student]
]
.kol-1-1[
.kol-1-1[
.kol-1-2[
.center.width-70[]
]
.kol-1-2[
.center.width-70[]
]
]
.center.smaller[SUSY EWK 3L RPV analysis (ATLAS-CONF-2020-009): Exclusion curves as a function of mass and branching fraction to
.kol-2-5[
- SModelS team has implemented a SModelS/`pyhf` interface
   - Tool for interpreting simplified-model results from the LHC
   - Designed to be used by theorists
- Have produced a comparison for the published likelihood of .italic[Search for direct stau production in events with two hadronic tau leptons in √s = 13 TeV pp collisions with the ATLAS detector] (ATLAS-SUSY-2018-04)
.italic.smaller[
So here is one of our first reasonable validation plots.
It's preliminary, the black line is ATLAS-SUSY-2018-04 official exclusion curve.
The grey line is SModelS using `pyhf`, running over the published data. — Wolfgang Waltenberger, CMS/SModelS
]
]
.kol-1-2[
- While adding a `percentile` method across all backends (as part of toys in `v0.5.0`), discovered a discrepancy between the NumPy implementation and TensorFlow Probability (TFP)
- Through comparison of the NumPy and TFP source code, found a bug in TFP!
   - Confirmed by the dev team in discussion on a GitHub Issue
- Agreed with the dev team that I would write a PR, which was reviewed and merged in a timely manner
- Along with Henry and Jim, now have upstream contributions to open source .bold[directly originating] from IRIS-HEP work
- `pyhf` will .bold[need this bug fix] in the next TFP release, and .bold[thousands] of other projects will benefit
- Bonus: Continuing goodwill development ] .kol-1-2[ .center.width-100[]
.center.width-100[[![tfp_PRs_merged](figures/tfp_PRs_merged.png)](https://github.com/tensorflow/probability/pulls?q=is%3Apr+author%3Amatthewfeickert+is%3Aclosed)]
.center.width-100[[![csuter_compliment](figures/csuter_compliment.png)](tensorflow/probability#912 (comment))] ]
class: middle
- Any analysis that wants to use `pyhf` for full Run 2 should be able to
.kol-1-2[ .bold[Requirements]:
- `pyhf` becomes mature in its feature set
   - Stat Config
   - Non-asymptotic calculators (toys in `v0.5.0`)
   - Norm factor expressions
   - Validation across all backends against `HistFactory`
- `pyhf` GitHub org set up to help streamline the process
- Reproduction of published analyses on HEPData
   - Documented examples
   - Case studies
- Public knowledge base (`pyhf` Stack Overflow)
- Rosetta stone (and what can't be done) between `ROOT` `HistFactory` and `pyhf`
] .kol-1-2[ .center.width-100[] ]
- Preliminary results (old) show hardware acceleration giving .bold[order of magnitude speedup] for some models!
   - Improvements over traditional approaches
      - 10 hrs to 30 min; 20 min to 10 sec
- Hardware acceleration benchmarking important to find the edges ] .kol-1-2[ .center.width-60[] .center.width-60[] ]
.kol-1-3[
- Most obvious connections:
   - ServiceX: direct data transform and delivery
      - Illinois team dynamic between Ben and Matthew
   - `cabinetry`: general interfacing to other tools
      - c.f. Alex's poster from the 2020 Poster Session for more details ] .kol-2-3[ .center.width-90[] ]
class: middle
`pyhf` HistFactory model spec is pure JSON: very natural to use a .blue[REST web API] for remote fitting!
.kol-1-3[
- `pyhf` installed on different clusters with GPUs around the world
- User hits the REST API with a JSON `pyhf` workspace as the request
- `pyhf` fits the workspace on the cluster on demand
- Returns fit results over the REST API to the user ] .kol-2-3[ .center.width-90[] ]
.kol-1-2[
- Growing number of analyses publishing full likelihoods to HEPData
- At the moment each likelihood is a collection of many individual signal patch files
- Introduce the concept of "patchsets" to reduce all of this to two files:
   - Background-only file
   - Signal patchset file
- Would use `hepdata-validator` to resolve all files to inline JSON
- Allows for the entire likelihood to be natively supported in HEPData (no more tarballs required) ] .kol-1-2[ .center.width-100[[![HEPData_native_support](figures/HEPData_native_support.png)](HEPData/hepdata#164)]
.center.width-100[[![HEPData_patchset](figures/carbon_patchset.png)](HEPData/hepdata#164 (comment))] ]
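As an illustration of the two-file idea, a signal patchset file bundles every named signal patch with metadata for looking them up. The field names below approximate `pyhf`'s patchset schema and all values are made up:

```json
{
  "metadata": {
    "description": "signal patchset for an illustrative SUSY search",
    "digests": {"sha256": "..."},
    "labels": ["mass_1", "mass_2"],
    "references": {"hepdata": "ins1234567"}
  },
  "patches": [
    {
      "metadata": {"name": "sig_300_100", "values": [300, 100]},
      "patch": [
        {"op": "add", "path": "/channels/0/samples/-",
         "value": {"name": "signal", "data": [3.2]}}
      ]
    }
  ],
  "version": "1.0.0"
}
```

Selecting a patch by name and applying it to the background-only file reconstructs the full model for that signal point.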
class: middle
Following up on Kyle's presentation yesterday
.center[.bold[ServiceX to perform event selection and deliver histograms for .blue[`pyhf` model]]]
- Should be relatively easy to translate from ServiceX output to `pyhf` JSON model, but probably don't want to
   - Moving the translation from `pyhf` to `cabinetry` seems like a more robust solution
- `cabinetry` has the ability to be a powerful tool, but the translation to `pyhf` is most interesting
   - ServiceX to `cabinetry`: data delivery
   - `cabinetry` to `pyhf`: construction of the likelihood
- If useful, Matthew could join contribution efforts
- Alex has pointed out this is even mostly doable now with `TRExFitter`
   - ServiceX feeding histograms to `TRExFitter`
   - Convert XML to JSON with [`pyhf xml2json`](https://scikit-hep.org/pyhf/cli.html#pyhf-xml2json)
   - Fit with `pyhf`
.center[ .bold[ Optimize analysis by using automatic differentiation to compute $d(\textrm{Expected limit})/d(\textrm{analysis parameters})$, which are back-propagated from the output of the stats tool, .blue[through `pyhf` running in a fitting service], back to ServiceX running at the analysis facility, and through the event selection & histogramming code ] ]
- As already covered, fitting with `pyhf` can be scaled up on demand and run almost anywhere
   - Local machine, cluster, AWS
- `pyhf` being built on frameworks that automatically handle gradients allows for this to happen naturally
   - Should get taken care of as a natural part of `pyhf` development
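A toy sketch of the gradient-driven optimization idea, with entirely made-up yield models and a simple $s/\sqrt{b}$ figure of merit standing in for the expected limit. Real `pyhf` work would use the backend's automatic differentiation (e.g. the TensorFlow or JAX backends) rather than the finite-difference stand-in here:

```python
import math

def expected_significance(cut):
    # Hypothetical smooth models of signal/background yield vs. cut value
    s = 10.0 * math.exp(-0.5 * cut)   # signal efficiency falls slowly with a tighter cut
    b = 100.0 * math.exp(-2.0 * cut)  # background falls much faster
    return s / math.sqrt(b)           # simple s/sqrt(b) figure of merit

def grad(f, x, eps=1e-6):
    # Central finite difference -- stands in for true autodiff backprop
    return (f(x + eps) - f(x - eps)) / (2 * eps)

g = grad(expected_significance, 1.0)
# A positive gradient tells the optimizer that tightening the cut
# improves the expected significance at this working point.
```

In the envisioned pipeline this derivative would flow all the way back through the fitting service into the event selection code, steering the analysis parameters automatically.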
.kol-2-3[
- .bold[Accomplishments]
   - Published and preserved full likelihoods
   - Became hugely popular and adopted inside ATLAS
   - Establishing connections for growth with SModelS and HEPData
- .bold[Year 3 Execution]
   - Reach stable API and `v1.0.0` release
   - Provide analysis support
   - Benchmark and profile hardware acceleration benefits
- .bold[Vision for Year 4/5]
   - Globally deployed and scalable "fitting as a service" using a REST web API
   - Have native support in HEPData for analysis preservation
- .bold[Grand Challenge]
class: end-slide, center
Backup
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions, with multiple samples contributing to each, and with additional (possibly shared) systematics between sample estimates
.bold[Main pieces:]
- .blue[Main Poisson p.d.f. for bins observed in all channels]
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
- encoding systematic uncertainties (normalization, shape, etc)
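In symbols (following the notation of the HistFactory report cited below), the full p.d.f. is the product of the two pieces:

$$
f(\vec{n}, \vec{a} \,|\, \vec{\eta}, \vec{\chi}) =
\underbrace{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois}\left(n_{cb} \,|\, \nu_{cb}(\vec{\eta}, \vec{\chi})\right)}_{\textrm{Main Poisson p.d.f.}}
\;\underbrace{\prod_{\chi} c_{\chi}(a_{\chi} \,|\, \chi)}_{\textrm{Constraint p.d.f.}}
$$

where $\vec{n}$ are the observed bin counts, $\vec{a}$ the auxiliary measurements, $\vec{\eta}$ the unconstrained parameters (e.g. normalizations), and $\vec{\chi}$ the constrained nuisance parameters encoding the systematics.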
- ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
- L. Heinrich, H. Schulz, J. Turner and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.
class: end-slide, center
count: false
The end.