
OpenDP Changelog

This file documents the version history of OpenDP. The links on each version number will take you to a comparison showing the source changes from the previous version.

0.10.0

Added

  • Polars:
    • Polars: add make_private_quantile_expr #908
    • Polars: bounded-DP mean via postprocessing #890
    • Polars: add make_expr_laplace #829
    • Polars: add make_expr_sum #819
    • Polars: add make_expr_clip #868
    • Polars: add make_private_aggregate #847
    • Polars: initial LazyFrame and Expr parsers #1454
    • Polars: add ExprDomain #795
    • Polars: lazyframe_domain ffi #769
    • Polars: series_domain ffi #767
    • Polars: add FrameDomain #765
    • Polars: add SeriesDomain #763
    • Polars: add make_expr_col #797
  • Usability:
    • Steer users in the right direction if they try to call a domain descriptor #1512
    • Warn if large privacy loss #1457
    • xfail usability tests #1465
    • Measure __str__ -> __repr__ #1401
    • Runtime error if non-function is passed to new_function #1355
    • Error if missing arg param on measurement in R #1559
    • Specialized message for mismatched domain #1511
  • Python linting and typing:
    • Call mypy and flake8 as subprocesses from pytest #1359
    • More Python typing on context module #1472
    • Require explicit imports #1220
    • Use isinstance where appropriate #1221
    • Fix unneeded f-strings #1217
    • Leave only the ignores that we actually need #1219
    • No more bare except #1303
    • Fix masked mypy errors #1265
    • Fix the flake8 warnings that really need it #1261
    • Remove Any from generated python #1507
    • Python typing: float implies int #1486
    • Fix return signature on loss_of #1524
  • Sphinx docs and examples:
    • Grouping columns example #1508
    • Python measurement examples #1550
    • Use import opendp.prelude as dp in docs #1442
    • Add link to Reference page #1297
    • Python and R examples in tabs on quickstart (just CI) #1262
    • link to language specific docs #1516
    • Python example in API docs proposal #1439
    • links between user guide and api reference #1458
    • Update introductory paragraph, remove last outdated section #1446
    • API ToC just down to modules #1447
    • Enhance context docs #1386
    • parallel directory for ancillary doc files #1371
    • use sphinx-design; avoid raw html #1351
    • List production applications #1352
    • Update 404 template #1354
    • Documentation reorg #1177
    • Use appropriate shell syntax in notebook examples #1406
    • In Python API docs, use the same examples for make and then #1576 #1575
  • R docs and linting:
    • R examples in docs via stand-alone files #1494
    • Fix r-doc comment with missing closing tag #1493
    • Add R example to "typical workflow" #1466
    • One measurement example for R #1557
    • R favicons #1298
    • A concept on every R function, so the page is better organized #1299
    • R doc README #1247
    • Tidy up R docs header #1246
    • Make the dependency on r-docs explicit... at the cost of slowing down the docs build #1248
    • Subheadings in R docs #1245
    • Do not generate NEWS.md #1302
    • R linting #1344 #1408

Changed

  • Renaming:
    • In a handful of locations, change "udf" to "plugin" #1528
    • rename TotalOrd trait to ProductOrd #1362
  • Mechanisms:
    • Implement partition distance #1167
    • Refactor output perturbation mechanisms #1318
    • Break apart bernoulli sampler traits, add constant-time impl #1325
  • Developer docs and comments:
    • For Rust example in getting-started docs, use normal cargo run rather than trying to run as script with nightly #1612
    • Devs should install all optional dependencies #1522
    • Explain docs build in each target language #1499
    • Update LICENSE #1455
    • Update and fix typos on dev instructions #1231
    • Explain relationship between bindings and derive #1229
    • Consolidate tools requirements #1444
    • Add a badge for docs.rs #1435
    • Explain extra installs for R / Add "is" on homepage #1448
    • Explain that R install does not require pre-compiled code #1423
    • Just add a note to explain duplication #1484
  • CI and utilities:
    • util script for RST to NB #1483
    • replace list of "rm" with "git clean" #1301
    • manylinux2014 -> manylinux_2_24 #1268
    • Now if "stable" and "dry-run" are selected, will append "-dev" #1267
    • Upgrade github actions #1378
    • Minimal cargo test #1356
    • Add number_of_spaces param to indent #1368
    • LaTeX cache, temp, and output files #1361
    • Speed up smoke-test, mostly by no longer freeing disk space #1579
    • Package Python with cibuildwheel and setuptools-rust #1519
    • Move Rust tests to standalone files #1533 #1548
    • For consistency and simplicity, use --all-features #1526
      • Follow-up with separate builds in smoke-test to fix R/Polars CI #1515
  • Minimum Python version:
    • Use features of Python 3.9 #1558
    • 3.8 -> 3.12 in CI (except for smoke-test) #1398

Removed

  • Remove build_tool.py #1300
  • Remove sphinx doctest tags #1450
  • Remove dead_code markers that do not cause warnings in IDE #1380
  • Remove putting-it-together.rst, and move its diagram #1353
  • Remove @versioned from generated code #1263
  • Fix Rust build warning by removing reference to poly #1226

Fixed

  • CI
    • Cancel old smoke-test CI runs #1566
    • In CI, reverse ternary, so it is not confused by false-y empty string #1249
    • Fix nightly docs build #1464
    • Remove check from nightly, so it only calls release #1417
    • Fix nightly by checking inputs.fake on each #1350
    • Fix weekly-doc-check #1429
  • Python
    • Regenerate python code to include example #1615
    • Add get_np_csprng wrapper function, so we can remove the last skipif #1562
    • Replace datetime.now() with constant: Previously, tests would only pass within a certain date range #1561
    • Fix split_by_weights #1456
    • Address numpy test failures #1348
    • Add setuptools to requirements #1415
    • Add setuptools, fix nightly? #1427
    • Fix subcontext metric space #1443
  • Rust
    • For generate_header def and use, change feature from ffi to bindings #1434
    • Avoid panic in ALP histogram #1240
  • R
    • Resolve R warning when calling Rf_error in C #1536
    • R-docs artifact: We were uploading r-docs with v2, downloading with v4 #1480
    • delete duplicate def of parse_or_infer #1475
    • Fix erratic R linting errors #1402
    • Generated changes to R conf #1252
    • Error on warning from devtools::check #1253

0.9.2-dev - TBD

Fixed

  • Ignore nitpicky Sphinx warnings on old library versions #1218

0.9.1 - 2024-02-07

Fixed

  • Fix CI for GitHub release #1215

0.9.0 - 2024-02-07

Added

  • R language bindings #679
    • All library functionality is available, except for defining your own library primitives in R code
  • New transformations/measurements
    • DP PCA #1045
    • Exponential mechanism via make_report_noisy_max_gumbel #704
    • Quantile scoring transformation make_quantile_score_candidates #702
    • make_alp_queryable may now be used from Python #747
    • All compositors now allow concurrent composition of interactive measurements #958
  • Expanded functionality of user-defined library primitives
    • Define your own domains, metrics and measures from Python #871 #873
    • Domains may carry arbitrary descriptors #1044
    • Construct your own queryables from Python #870
  • Proofs from Vicki Xu, Hanwen Zhang, Zachary Ratliff and Michael Shoemate
    • make_randomized_response_bool #490
    • SampleBernoulli #496
    • make_is_equal #514
    • SampleUniformIntBelow #1183
  • The OpenDP Python package now supports PEP 561 type information #738
  • The OpenDP Rust crate is now thread-safe #874
  • Documentation, Typing and CI improvements from Chuck McCallum
    • CI: MyPy type-checking, link-checking in docs, code coverage, Rust formatting
    • Rust stack traces are now hidden by default #1138
  • FFI module in Rust is now public, allowing you to write your own lightweight FFI #1150
  • C dependencies on GMP/MPFR have been replaced with dashu #1141
    • The OpenDP Rust library can now be built easily on Windows and is a much more lightweight Rust dependency

Changed

  • TO argument on user-defined measurements is now optional #1147
  • raw functions can now be chained as postprocessors onto measurements

Fixed

  • Imports in the Python context module no longer pollute the prelude #1187

0.8.0 - 2023-08-11

Added

  • Partial constructors: each make_* constructor now has a then_* variant #689 #761
    • all make_* have gained two leading arguments: input_domain and input_metric
    • all then_* have same arguments as make_*, sans input_domain and input_metric
      • when chaining, then_* tunes to the previous transformation/metric space
    • to migrate, replace make_* with then_*, and then remove redundant arguments
    • #687 #690 #692 #712 #713 #798 #799 #802 #803 #804 #808 #810 #813 #815 #816
  • (preview) Context API for Python, giving a more succinct alternative to >> #750
    • context.query().clamp(bounds).sum().laplace().release()
    • automatically tunes a free parameter (like the scale) to satisfy privacy-loss bound
    • mediates queries to the interactive compositor/dataset inside context
    • #749
  • Support for aarch64 architecture on Linux #843
  • Nightly builds can now be downloaded from PyPI: pip install opendp --pre #879 #880
  • Proofs for make_row_by_row #688, make_clamp #512
  • Transformations throughout library support any valid combination of domain descriptors
    • for example, all data preprocessors now also work under bounded DP
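
The relationship between make_* and then_* constructors described above can be illustrated with a toy model. This is only a sketch of the partial-constructor pattern: every name below (ToyTransformation, ToyPartial, make_toy_clamp, then_toy_clamp, make_toy_sum, then_toy_sum) is hypothetical and is not part of the OpenDP API, and a "space" here is just a pair of string labels.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Toy stand-in, NOT an OpenDP type: a "space" is a (domain, metric) pair of labels.
Space = Tuple[str, str]


@dataclass
class ToyTransformation:
    input_space: Space
    output_space: Space
    function: Callable

    def __rshift__(self, other):
        # When chaining onto a partial, pass along our output space so the
        # partial can finish constructing itself.
        if isinstance(other, ToyPartial):
            other = other.fix(*self.output_space)
        if other.input_space != self.output_space:
            raise ValueError("mismatched spaces")
        return ToyTransformation(
            self.input_space,
            other.output_space,
            lambda data: other.function(self.function(data)),
        )


@dataclass
class ToyPartial:
    fix: Callable  # (domain, metric) -> ToyTransformation


def make_toy_clamp(input_domain, input_metric, bounds):
    lower, upper = bounds
    return ToyTransformation(
        (input_domain, input_metric),
        (input_domain, input_metric),
        lambda data: [min(max(x, lower), upper) for x in data],
    )


def then_toy_clamp(bounds):
    # Same arguments as make_toy_clamp, minus the leading domain and metric.
    return ToyPartial(lambda domain, metric: make_toy_clamp(domain, metric, bounds))


def make_toy_sum(input_domain, input_metric):
    return ToyTransformation((input_domain, input_metric), ("scalar", "absolute"), sum)


def then_toy_sum():
    return ToyPartial(make_toy_sum)


space = ("vector<float>", "symmetric")
# Fully explicit: every constructor is given the space it operates on.
explicit = make_toy_clamp(*space, bounds=(0.0, 10.0)) >> make_toy_sum(*space)
# Partial: the second constructor picks up the space from the previous step.
chained = make_toy_clamp(*space, bounds=(0.0, 10.0)) >> then_toy_sum()
```

In this sketch both pipelines behave identically; the then_* form simply avoids repeating the domain and metric that the previous step already determines.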

Changed

  • Changed constructor names:
    • make_base_laplace, make_base_discrete_laplace -> make_laplace #736
    • make_base_gaussian, make_base_discrete_gaussian -> make_gaussian #800
    • make_sized_bounded_sum, make_bounded_sum -> make_sum #801
    • make_sized_bounded_mean -> make_mean #806
    • make_sized_bounded_variance -> make_variance #807
    • dp.c.make_user_measurement -> dp.m.make_user_measurement #884
    • dp.c.make_user_transformation -> dp.m.make_user_transformation #884
    • dp.c.make_user_postprocessor -> dp.new_function #884
    • make_base_ptr -> make_base_laplace_threshold #849
      • changed the privacy map to emit fixed (ε, δ) pairs
  • Reordered arguments to make_user_transformation and make_user_measurement
    • input_domain and input_metric now leading to enable then_* variants
  • make_identity is now honest-but-curious in Python, but is general over all choices of domains/metrics #814
  • (Rust-only) sparse histogram APIs have been updated to prepare for Python #756
    • make_base_alp_with_hashers -> make_alp_state_with_hashers
    • make_base_alp -> make_alp_state
    • make_alp_histogram_post_process -> make_alp_queryable
    • thank you Christian Lebeda! (https://github.com/ChristianLebeda)
  • (Rust-only) Transformations and Measurements made read-only #706

Fixed

  • Infinite loop converting from ρ to ε when δ=0 #845

Deprecated

  • All dataframe transformations, in anticipation of a new Polars backend in an upcoming release

0.7.0 - 2023-05-18

Added

  • FFI and Python interfaces for creating and accessing Domains, Metrics, and Measures (#637)
  • Queryables and supporting infrastructure for interactive Measurements (#618), (#675)
  • Constructor for sequential composition of Measurements (#674)
  • Checks for compatibility between pairings of Domains and Metrics/Measures (#604)
  • Python opendp.extrinsics module for code contributions and proofs outside of Rust (#693)
  • Docs: First Look at DP notebook (#666)
  • Docs: Compositors notebook, with usage of interactive Measurements (#735)

Changed

  • Incorporated Domain instances into some constructor signatures (#650)
  • Simplified postprocessors to Function (from previous full Transformation) (#648)
  • Moved some Domain logic from type-inherent constraints to runtime checks of more general types (#645), (#696)
    • Remove SizedDomain in favor of a runtime size descriptor on VectorDomain
    • Remove BoundedDomain in favor of a runtime bounds descriptor on AtomDomain
    • Remove InherentNullDomain in favor of a runtime nullity descriptor on AtomDomain
  • Removed the default Domain limitation on user-defined callbacks, and renamed constructors from make_default_user_XXX to make_user_XXX (#650)
  • Docs: Improved the clarity of the User Guide based on feedback (#639)
  • Docs: Renamed the Developer Guide to Contributor Guide (#639)

Deprecated

  • AllDomain in the Python bindings, with a warning to switch to AtomDomain (#645)

Removed

  • The output_domain field of Measurement struct (#647)

Fixed

  • Switched from the backtrace crate to std::backtrace, and fixed some corner cases, for much faster backtrace resolution (#691)
  • Whole-codebase reformat using rustfmt to minimize spurious churn in the future (#669)

0.6.2 - 2023-02-06

Added

  • support for user-defined callbacks under explicit opt-in
    • researchers may construct their own transformations, measurements and postprocessors in Python
    • these "custom" components may be interleaved with other components in the library
  • expanded docs.opendp.org User Guide with more explanatory notebooks
  • "contrib" proofs for CKS20 sampler algorithms
  • "contrib" proof for ρ-zCDP to ε(δ)-DP conversion
  • CITATION.cff #552

Fixed

  • cleanup of accuracy utilities #626
    • discrete_gaussian_scale_to_accuracy returns an accuracy one too large when the scale is on the lower edge
    • improve float precision of laplacian_scale_to_accuracy and accuracy_to_laplacian_scale
    • Reported by Alex Whitworth (@alexWhitworth). Thank you!
  • clamp negative epsilon in make_zCDP_to_approxDP when delta is large #621
    • Reported by Marika Swanberg and Shlomi Hod. Thank you!
  • resolve build warnings from metadata in version tags

0.6.1 - 2022-10-27

Fixed

  • docs.rs failed to render due to Katex dependency

0.6.0 - 2022-10-26

Added

  • Restructured and expanded documentation on docs.opendp.org
    • Moved notebooks into the documentation site
    • Updated developer documentation and added introductions to Rust and proof-writing
  • Much more thorough API documentation and links to corresponding Rust documentation
  • Documentation throughout the Rust library, as well as proof definition stubs
  • Additional combinators for converting the privacy measure
    • make_pureDP_to_fixed_approxDP to convert ε to (ε, 0)-approx DP
    • make_pureDP_to_zCDP to convert ε to ρ
  • Additional accuracy functions for discrete noise mechanisms
    • discrete_laplacian_scale_to_accuracy
    • discrete_gaussian_scale_to_accuracy
    • accuracy_to_discrete_laplacian_scale
    • accuracy_to_discrete_gaussian_scale
  • make_b_ary_tree Lipschitz transformation. Use in conjunction with:
    • make_consistent_b_ary_tree to retrieve consistent leaf node counts
    • make_quantiles_from_counts to retrieve quantile estimates
    • make_cdf to estimate a discretized cumulative distribution function
  • make_subset_by, make_df_is_equal and make_df_cast_default transformations
    • used for simple dataframe subsetting
  • make_chain_tm combinator for postprocessing
  • Updates for proof-writing:
    • rust/src/lib.sty contains a collection of LaTeX macros to aid in cross-linking and maintenance
    • See the proof-writing section of the developer documentation
    • PRs with .tex proof documents are rendered by a bot
    • Documentation will now embed links to proof documents that are adjacent to source files
    • Proof documents are automatically hosted and versioned on docs.opendp.org
  • An initial proof for make_count (by @silviacasac, @cwagaman, @gracetian6).

Changed

  • Renamed meas to measurements, trans to transformations and comb to combinators
  • Added an honest-but-curious feature flag to make_population_amplification

Fixed

  • Python bindings check that C integers do not overflow
  • Fixed clamping behaviour on make_lipschitz_float_mul
  • Let the type of the sensitivity supplied to make_base_discrete_gaussian vary according to type QI
  • Fix FFI dispatch in fixed approximate DP composition

0.5.0 - 2022-08-23

Added

  • Account for finite data types in aggregators based on our paper CSVW22
  • Stability/privacy relations replaced with maps #463
    • You can now call .map on transformations and measurements to directly get the tightest d_out
  • Composition of measurements #482
    • Permits arbitrary nestings of compositions of an arbitrary number of measurements
  • Discrete noise mechanisms from CKS20
    • make_base_discrete_laplace is equivalent to make_base_geometric, but executes in a constant-time number of operations
    • make_base_discrete_gaussian for the discrete gaussian mechanism
  • Add zero-concentrated differential privacy to the gaussian and discrete gaussian mechanisms
    • Output measure is now always ZeroConcentratedDivergence<Q>, and output distance is in terms of rho
  • Add combinator to cast a measurement's output measure from ZeroConcentratedDivergence<Q> to SmoothedMaxDivergence<Q>
    • meas_smd = opendp.comb.make_zCDP_to_approxDP(meas_zcd)
  • The SmoothedMaxDivergence<Q> measure represents distances as an ε(δ) privacy curve:
    • Can construct a curve by invoking the map: curve = meas_smd.map(d_in)
    • Can evaluate a curve at a given delta: epsilon = curve.epsilon(delta)
  • Add make_fix_delta combinator to fix the delta parameter in a SmoothedMaxDivergence<Q> measure
    • The resulting measure is FixedSmoothedMaxDivergence<Q>, where the output distance is an (ε, δ) pair
    • eps, delta = make_fix_delta(meas_smd, delta=1e-8).map(d_in)
    • The fixed measure supports composition (unlike the curve measure)
  • Utility functions set_default_float_type and set_default_int_type to set the default bit depth of ints and floats
  • Exponential search when bounds are not specified in binary search utilities #453
  • Support for Apple silicon (aarch64-apple-darwin target)
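
The idea behind exponential search for missing bounds can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names (toy_exponential_bounds, toy_binary_search) are hypothetical, and the sketch assumes a monotone predicate that flips from False to True at some non-negative value.

```python
def toy_exponential_bounds(predicate, start=1.0, max_doublings=1024):
    """Double a candidate upper bound until the predicate passes.

    Returns (lower, upper) where predicate(upper) is True and the last
    failing candidate (or 0.0) is the lower bound.
    """
    lower, upper = 0.0, start
    for _ in range(max_doublings):
        if predicate(upper):
            return lower, upper
        lower, upper = upper, upper * 2.0
    raise ValueError("no upper bound found")


def toy_binary_search(predicate, bounds=None, tolerance=1e-9):
    """Approximate the smallest value at which the predicate becomes True."""
    # If the caller gives no bounds, discover them by exponential search.
    lower, upper = bounds if bounds is not None else toy_exponential_bounds(predicate)
    while upper - lower > tolerance:
        mid = lower + (upper - lower) / 2.0
        if predicate(mid):
            upper = mid
        else:
            lower = mid
    return upper


# Example: find the threshold of a simple monotone predicate without bounds.
threshold = toy_binary_search(lambda scale: scale >= 37.5)
```

Doubling keeps the number of predicate evaluations logarithmic in the size of the answer, so omitting the bounds costs only a few extra evaluations compared with a well-chosen interval.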

Changed

  • Switched to a single Rust crate (merged opendp-ffi into opendp)
  • Updated documentation to reflect feedback from users and added more example notebooks
  • Packaging for Contributor License Agreements
  • Improved formatting of Rust stack traces in Python
  • Expanded error-indexes

Deprecated

  • make_base_geometric in favor of the more efficient make_base_discrete_laplace
    • Constant-time execution can still be accessed via make_base_discrete_laplace_linear

Removed

  • make_base_analytic_gaussian in favor of the (now generally tighter) make_base_gaussian
    • This would have been a deprecation, but updating to be consistent with forward maps is nontrivial

Fixed

  • Rust documentation on docs.rs is built with "untrusted" flag enabled
  • Python documentation for historical versions is rebuilt on correct tag
  • Avoid potential infinite loop in binary search utility

Security

  • Replace the underlying implementation of make_base_laplace and make_base_gaussian to address precision-based attacks
    • Both measurements map input floats exactly to an integer discretization, apply discrete laplace or discrete gaussian noise, and then postprocess back to floats
    • The discretization is on ℤ*2^k, where k can be configured, similar to the Google Differential Privacy Library
    • In contrast to the Google library, the approximation to real sampling continues to improve as k is chosen to be smaller than -45. We choose a k of -1074, which matches the subnormal ULP, giving a tight privacy map
  • Fixed function in make_randomized_response_bool
    • from proofwriting by Vicki Xu and Hanwen Zhang #481
  • Multiplicative difference in probabilities in linear-time discrete laplace sampler are now exact around zero
    • eliminates an un-accounted δ < ulp(e^-(1/scale)) from differing conservative roundings
  • Biased bernoulli sampler on float probabilities is now exact
    • eliminates an un-accounted δ < 2^-500 in RR and linear-time discrete laplace sampler
    • from proofwriting by Vicki Xu and Hanwen Zhang #496
  • Added conservative rounding when converting between MPFR floats and native floats
    • MPFR has a different exponent range, which could lead to unintended rounding of floats that are out of exponent range
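
The discretization step described for make_base_laplace and make_base_gaussian (map a float onto the grid ℤ*2^k, add integer noise, map back) can be sketched with exact rational arithmetic. This is an illustrative sketch only: the function names are hypothetical, the noise is passed in as a plain integer rather than sampled, and the library's actual rounding and sampling code is not reproduced here.

```python
from fractions import Fraction


def to_grid_index(x: float, k: int) -> int:
    """Index n of the multiple of 2**k nearest to x, computed exactly.

    Fraction(x) is the exact rational value of the float, so no
    floating-point rounding occurs before the final integer rounding.
    """
    return round(Fraction(x) / Fraction(2) ** k)


def from_grid_index(n: int, k: int) -> float:
    """Map a grid index back to a float (rounds if n * 2**k is not representable)."""
    return float(n * Fraction(2) ** k)


def toy_discretized_release(x: float, k: int, integer_noise: int) -> float:
    """Discretize x, perturb the integer index, and postprocess back to a float."""
    n = to_grid_index(x, k)
    return from_grid_index(n + integer_noise, k)


# With k = -2 the grid spacing is 0.25, so 0.3 lands on index 1 (value 0.25).
coarse = toy_discretized_release(0.3, -2, 3)
# With k = -1074 the grid spacing equals the smallest subnormal double.
finest_index = to_grid_index(5e-324, -1074)
```

Because the noise is added to an integer index, the perturbation itself involves no floating-point arithmetic, which is the property the mitigation relies on.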

Migration

  • make_base_gaussian's output measure is now ZeroConcentratedDivergence.
    • This means the output distance is now a single scalar, rho (it used to be an (ε, δ) tuple)
    • Use adp_meas = opendp.comb.make_zCDP_to_approxDP(zcdp_meas) to convert to an ε(δ) curve.
    • Use fadp_meas = opendp.comb.make_fix_delta(adp_meas) to change output distance from an ε(δ) curve to an (ε, δ) tuple
      • fadp_meas.check(d_in, (ε, δ)) is equivalent to the check on make_base_gaussian in 0.4
  • replace make_base_analytic_gaussian with make_base_gaussian
  • replace make_base_geometric with make_base_discrete_laplace
  • make_basic_composition accepts a list of measurements as its first argument (it used to have two arguments)
  • slight increase in sensitivities/privacy utilization across the library as a byproduct of floating-point attack mitigations
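
To give a feel for the change from an (ε, δ) tuple to a scalar ρ, the sketch below uses the classic textbook conversion: a Gaussian mechanism with sensitivity Δ and scale σ satisfies ρ-zCDP with ρ = Δ²/(2σ²), and ρ-zCDP implies (ρ + 2√(ρ ln(1/δ)), δ)-DP. The function names are hypothetical, and this bound is not necessarily the conversion that make_zCDP_to_approxDP implements; the library's curve may be tighter.

```python
import math


def toy_gaussian_rho(sensitivity: float, scale: float) -> float:
    """zCDP parameter of a Gaussian mechanism: rho = sensitivity^2 / (2 * scale^2)."""
    return (sensitivity / scale) ** 2 / 2.0


def toy_zcdp_to_epsilon(rho: float, delta: float) -> float:
    """Classic bound: rho-zCDP implies (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


rho = toy_gaussian_rho(sensitivity=1.0, scale=1.0)
# Evaluating the bound at several deltas traces out an epsilon(delta) curve;
# fixing one delta yields a single (epsilon, delta) pair.
curve = {delta: toy_zcdp_to_epsilon(rho, delta) for delta in (1e-4, 1e-6, 1e-8)}
```

The dictionary plays the role of the ε(δ) curve in this sketch, and picking one entry corresponds to fixing delta.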

0.4.0 - 2021-12-10

Added

  • make_randomized_response_bool and make_randomized_response for local differential privacy.
  • make_base_analytic_gaussian for a tighter, analytic calibration of the gaussian mechanism.
  • make_population_amplification combinator for privacy amplification by subsampling.
  • make_drop_null transformation for dropping null values in nullish data.
  • make_find, make_find_bin and make_index transformations for categorical relabeling and binning.
  • make_base_alp for histograms via approximate laplace projections from Christian Lebeda (https://github.com/ChristianLebeda)
  • make_base_ptr for stability histograms via propose-test-release.
  • Added floating-point numbers to the admissible output types on integer queries like make_count, make_count_by, make_count_by_categories and make_count_distinct.
  • Simple attack notebook from Oren Renard (https://github.com/orespo)
  • Support for Numpy data types.
  • Release helper script
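
As a rough illustration of randomized response for local differential privacy, the sketch below reports the true boolean with probability prob and the flipped value otherwise, which gives ε = ln(prob / (1 - prob)). The function names are hypothetical and the sampler uses Python's random module purely for illustration; it is not cryptographically secure and is not how the library samples.

```python
import math
import random


def toy_randomized_response_bool(truth: bool, prob: float, rng=random) -> bool:
    """Return the true answer with probability prob, else the opposite answer."""
    if not 0.5 <= prob < 1.0:
        raise ValueError("prob must be in [0.5, 1)")
    # NOTE: random.random() is NOT a secure source of randomness.
    return truth if rng.random() < prob else not truth


def toy_rr_epsilon(prob: float) -> float:
    """Privacy loss of the mechanism above: ln(prob / (1 - prob))."""
    return math.log(prob / (1.0 - prob))


# Telling the truth 75% of the time costs epsilon = ln(3).
epsilon = toy_rr_epsilon(0.75)
```

At prob = 0.5 the response is independent of the truth and ε is zero; as prob approaches 1 the response becomes exact and ε grows without bound.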

Fixed

  • Resolved memory leaks in FFI

Changed

  • moved windows patch directory into /rust
  • added minimum rust version of 1.56 and updated to the 2021 edition.
  • dropped sized-ness domain requirements from make_count_by

Security

  • make_base_stability underestimated the sensitivity of queries. Removed in favor of make_base_ptr.
  • Floating-point arithmetic throughout the library now has explicit rounding modes such that the budget is always slightly overestimated. There is still some potential for small floating-point leaks via rounding in floating-point aggregations.
  • Fixed integer truncation issue in the sized bounded sum privacy relation.
  • The resize relation is now looser to account for a worst-case situation where d_in records are removed, and d_in new records are imputed.

0.3.0 - 2021-09-21

Changed

  • All unvetted modules (which is currently all modules) are tagged with the "contrib" feature
  • Programs must explicitly opt-in to access the "contrib" feature

0.2.4 - 2021-09-20

Fixed

  • Version tag

0.2.3 - 2021-09-20

Fixed

  • Version tag

0.2.2 - 2021-09-20

Added

  • User guide, developer guide, and general focus on documentation
  • Examples folder has complete notebooks for getting started with the library

Fixed

  • Usability issues in the FFI layer for make_count_by_categories and make_count_by
  • The FFI for make_identity ensures proper domain metric pairing

0.2.1 - 2021-09-09

Added

  • Functions to convert between accuracy and noise scale for laplace, gaussian and geometric noise
  • Error messages when chaining include a plaintext description of the mismatched domains or metrics
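
For continuous Laplace noise, the conversion between accuracy and noise scale follows from the tail bound P(|noise| ≥ a) = exp(-a / scale). The sketch below solves that identity in both directions; the function names are hypothetical stand-ins rather than the library's own utilities, and alpha denotes the probability that the error exceeds the accuracy.

```python
import math


def toy_laplace_scale_to_accuracy(scale: float, alpha: float) -> float:
    """Accuracy a such that |noise| < a with probability 1 - alpha."""
    return scale * math.log(1.0 / alpha)


def toy_accuracy_to_laplace_scale(accuracy: float, alpha: float) -> float:
    """Noise scale whose error stays below accuracy with probability 1 - alpha."""
    return accuracy / math.log(1.0 / alpha)


# With scale 1 and alpha 0.05, the error is below ln(20) with 95% probability.
accuracy = toy_laplace_scale_to_accuracy(1.0, 0.05)
scale = toy_accuracy_to_laplace_scale(accuracy, 0.05)
```

The two functions are inverses of each other for a fixed alpha, so converting a scale to an accuracy and back recovers the original scale up to floating-point rounding.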

0.2.0 - 2021-08-31

Added

  • User guide outline
  • Initial exemplar python notebooks
  • Binary search utilities in Python
  • Vec<String> and HashMap<K, V> data loaders
  • Resize transformation for making VectorDomain<D> sized
  • TotalOrd trait for consistency with proofs

Changed

Removed

  • Scalar clamping

Fixed

  • Adjust output domain on make_count_by_categories to make it chainable with measurements

0.1.0 - 2021-08-05

Added

  • Initial release.

Instructions

The format of this file is based on Keep a Changelog. It is processed by scripts when generating a release, so please maintain the existing format.

Whenever you're preparing a significant commit, add a bullet list entry summarizing the change under the X.Y.Z-dev heading at the top. Entries should be grouped in sections based on the kind of change. Please use the following sections, maintaining the same ordering. If the appropriate section isn't present yet, just add it by copying from those below.

Added

Changed

Deprecated

Removed

Fixed

Security

Migration

When a new version is released, a script will turn the Unreleased heading into a new heading with appropriate values for the version, date, and link. Then the script will generate a new Unreleased section for future work. Please keep the existing dummy heading and link as they are, so that things operate correctly. Thanks!