Releases: pola-rs/polars

Python Polars 0.20.3-rc.2

28 Dec 16:47
2f1037a
Pre-release

🚀 Performance improvements

  • don't needlessly allocate validity in concat/rechunk (#13288)
  • add fast path to count_bits_set_by_offsets (#13253)
  • make .dt.truncate('*mo') more than 3x faster (#13192)

✨ Enhancements

  • change doc links to new url docs.pola.rs (#13290)
  • support horizontal concatenation of LazyFrames (#13139)
  • Rename Utf8 data type to String, keep Utf8 as alias (#13257)
  • dispatch strict_cast via cast (#13255)
  • Impl any/all for array type (#13250)
  • add cancellable queries (#13178)
  • add offset parameter to gather_every (#13156)
  • Support Array dtype AnyValue Series construction (#12817)
  • Allow step parameter in int_ranges to take an expression (#13148)
  • make python map_batches safer (#13181)
  • Implement count for DataFrame/LazyFrame (#13153)
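
As a rough illustration of the new offset parameter on gather_every listed above, the following is a plain-Python model of the selection rule and not Polars code. The helper name gather_every_model and the sample data are hypothetical, and the model assumes that offset names the index of the first value taken, with every n-th value after it.

```python
def gather_every_model(values: list, n: int, offset: int = 0) -> list:
    """Plain-Python model: take every n-th value, starting at index `offset`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # Extended slicing expresses "start at offset, then step by n".
    return values[offset::n]


data = list(range(10))
every_third = gather_every_model(data, 3)        # starts at index 0
shifted = gather_every_model(data, 3, offset=1)  # starts at index 1
print(every_third)  # [0, 3, 6, 9]
print(shifted)      # [1, 4, 7]
```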

🐞 Bug fixes

  • Fix lexical sorting of categoricals with null values (#13271)
  • improve replace on categoricals (#13223)
  • round trip to JSON and back should preserve Enum type (#13267)
  • fix return type hint of list series any/all (#13265)
  • sink_csv deadlock (#13239)
  • Correctly use read_parquet for all binary inputs (#13218)
  • is_in operator for categoricals (#13205)
  • Better handle mismatched dtypes in replace (#13213)
  • Fix replace fast path by casting old input to the right data type (#13176)
  • ndjson nested null schema inference (#13206)
  • don't cast to unknown dtypes (#13197)
  • maintain old join behavior in window expression (#13179)

🛠️ Other improvements

  • Copy Makefile build commands to top level (#13293)
  • Fix release flags (#13298)
  • Re-enable consortium standard tests (#13296)
  • Update CODEOWNERS (#13292)
  • Add CPU compatibility check (#13134)
  • Change base url of docs/guide to docs.pola.rs (#13281)
  • Fix source link for dev docs (#13279)
  • fix return type hint of list series any/all (#13265)
  • Fix display of overloaded signatures (#13258)
  • clean up bytecode parsing a bit (#13221)
  • Add a couple of docstring examples to Series methods (#13244)
  • remove unnecessary arg unpacking (#13241)
  • update rustc (#13219)
  • fix horizontal concatenation documentation (#13141)
  • Replace blackdoc by ruff's new docstring formatter (#13182)
  • Update ruff & ruff settings (#13126)
  • Link to latest object_store docs in api doc (#13180)
  • Fix failing test (#13171)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego

Python Polars 0.20.3-rc.1

28 Dec 13:13
9b1c550
Pre-release

🚀 Performance improvements

  • add fast path to count_bits_set_by_offsets (#13253)
  • make .dt.truncate('*mo') more than 3x faster (#13192)

✨ Enhancements

  • Rename Utf8 data type to String, keep Utf8 as alias (#13257)
  • dispatch strict_cast via cast (#13255)
  • Impl any/all for array type (#13250)
  • add cancellable queries (#13178)
  • add offset parameter to gather_every (#13156)
  • Support Array dtype AnyValue Series construction (#12817)
  • Allow step parameter in int_ranges to take an expression (#13148)
  • make python map_batches safer (#13181)
  • Implement count for DataFrame/LazyFrame (#13153)

🐞 Bug fixes

  • Fix lexical sorting of categoricals with null values (#13271)
  • improve replace on categoricals (#13223)
  • round trip to JSON and back should preserve Enum type (#13267)
  • fix return type hint of list series any/all (#13265)
  • sink_csv deadlock (#13239)
  • Correctly use read_parquet for all binary inputs (#13218)
  • is_in operator for categoricals (#13205)
  • Better handle mismatched dtypes in replace (#13213)
  • Fix replace fast path by casting old input to the right data type (#13176)
  • ndjson nested null schema inference (#13206)
  • don't cast to unknown dtypes (#13197)
  • maintain old join behavior in window expression (#13179)

🛠️ Other improvements

  • Add CPU compatibility check (#13134)
  • Change base url of docs/guide to docs.pola.rs (#13281)
  • Fix source link for dev docs (#13279)
  • fix return type hint of list series any/all (#13265)
  • Fix display of overloaded signatures (#13258)
  • clean up bytecode parsing a bit (#13221)
  • Add a couple of docstring examples to Series methods (#13244)
  • remove unnecessary arg unpacking (#13241)
  • update rustc (#13219)
  • fix horizontal concatenation documentation (#13141)
  • Replace blackdoc by ruff's new docstring formatter (#13182)
  • Update ruff & ruff settings (#13126)
  • Link to latest object_store docs in api doc (#13180)
  • Fix failing test (#13171)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego

Python Polars 0.20.2

20 Dec 19:18
40d3e08

🚀 Performance improvements

  • ensure single expression evaluation for replace (#13147)
  • drop the pyarrow conversion path in iter_rows; we can now do fully native conversion ~2-3x faster (#13122)

✨ Enhancements

  • Move from GA to more privacy friendly framework (#13155)
  • prune all/any_horizontals with single inputs (#13146)
  • ensure we get cleaner logical plans with any/all_horizontal (#13144)

🐞 Bug fixes

  • Fix comparison of categoricals (#13137)
  • Use the name of the leftmost expression in horizontal operations (#13143)
  • any_value should support cast to boolean (#13125)
  • Update offsets of null value correctly for all from_iter_xxx_trusted_len (#13132)
  • fix neq for series cmp str (#13128)
  • Fix off-by-one error in lit dtype determination for integers (#13129)
  • fix category list builder append series with multiple chunks (#13116)

🛠️ Other improvements

  • Fix release LTS CPU step (#13160)
  • Use the name of the leftmost expression in horizontal operations (#13143)
  • ensure we get cleaner logical plans with any/all_horizontal (#13144)
  • Minor cleanup of PyO3 bindings (#13067)
  • Update auto_explode param name to returns_scalar (#13119)
  • Mark whether the current package is the LTS-CPU version (#13068)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @orlp, @reswqa, @ritchie46 and @stinodego

Python Polars 0.20.1

18 Dec 16:58
2f676fb

🐞 Bug fixes

  • repeat_by should not raise if by contains nulls (#13105)
  • [csv] raise on single quote char (#13104)
  • Raise if scan zstd compressed csv file (#13102)
  • allow timeunit-less dtype in pl.lit creation (#12997)
  • Don't check map length if input is literal (#13098)
  • rolling_quantile can get incorrect state (#13088)

🛠️ Other improvements

  • Fix column name in contains_any example (#13090)
  • update user-defined-functions for 0.19.x (#13071)
  • Fix some links, and make map_batches warning more evident (#13081)
  • Linting updates (#13069)
  • take pl.concat out of StringCache context manager in "mismatched string cache" error message (#13076)
  • add Enum to dtype list (#13080)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @mcrumiller, @reswqa, @ritchie46 and @stinodego

Python Polars 0.20.0

16 Dec 15:31
f96d2cd

This version includes quite a few breaking changes. We are preparing for the 1.0 release and aim to make the upgrade from 0.20 to 1.0 as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with 1.0.

Check out the upgrade guide for help navigating the upgrade to this version.

Please bear with us while we continue to make Polars the best tool it can be!

🏆 Highlights

  • Add new Enum categorical data type which allows a fixed set of categories (#11822)

💥 Breaking changes

  • Use Object Store instead of fsspec for read_parquet (#13044)
  • Reimplement replace expression on the Rust side (#13002)
  • Preserve left and right join keys in outer joins (#12963)
  • Update update signature (#12986)
  • Update Expr.count to ignore null values by default (#12934)
  • Scheduled removal of previously deprecated functionality (#12885)
  • Allow all DataType objects to be instantiated (#12470)
  • Change value_counts resulting column name from counts to count (#12506)
  • Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
  • Default to exact checking for integers in assertion utils (#12331)
  • Set default dtype for Series to Null when no data is present (#12807)
  • Update lit behavior for list/tuple inputs (#12559)
  • Change DataType.is_nested from property to classmethod (#12453)
  • Update constructors for Array and Decimal (#12837)
  • Smaller integer data types for datetime components (#12070)
  • Fix NaN ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
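
The NaN ordering rule in the last item above (NaN greater than any other float, and equal to itself) can be sketched in plain Python. This is a model of the stated ordering only, not Polars code; total_order_key and total_eq are hypothetical helper names introduced for illustration.

```python
import math


def total_order_key(x: float) -> tuple[int, float]:
    """Sort key under which NaN ranks after every other float, including +inf."""
    # Non-NaN values compare by value in group 0; all NaNs share group 1.
    return (1, 0.0) if math.isnan(x) else (0, x)


def total_eq(a: float, b: float) -> bool:
    """Equality under which NaN equals NaN (unlike IEEE 754 `==`)."""
    return (math.isnan(a) and math.isnan(b)) or a == b


nan = float("nan")
values = [2.0, nan, float("-inf"), 1.0, float("inf")]
ordered = sorted(values, key=total_order_key)
print(ordered)             # NaN comes last, after inf
print(nan == nan)          # False under plain IEEE 754 comparison
print(total_eq(nan, nan))  # True under the total ordering
```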

⚠️ Deprecations

  • Rename write_database parameter if_exists to if_table_exists (#12783)

🚀 Performance improvements

  • Avoid dispatching to expression engine for various Series methods (#13010)
  • Elide allocation in outer join materialization (#12992)
  • Avoid dispatching Series.head/tail to the expression engine (#12946)
  • Ensure we reduce for any/all_horizontal (#12976)
  • Add fast paths for UTC in truncate (#12965)
  • Use select_seq for expression dispatch (#12962)
  • Improve rolling_median algorithm (#12704)
  • Use fast path for non-null data in new SQL-like null matching (#12874)
  • Optimize DataFrame.iter_rows for smaller buffer sizes (#12804)
  • Speed up initializing Series from a list of NumPy arrays (#12785)

✨ Enhancements

  • Add str.contains_any and str.replace_many (Aho-Corasick algorithms) (#13073)
  • Auto-infer credentials from .aws folder (#13062)
  • Support private cloud S3 storage in scan_parquet (#13060)
  • Use Object Store instead of fsspec for read_parquet (#13044)
  • Avoid dispatching to expression engine for various Series methods (#13010)
  • Allow order operators (<,>,>=,<=) on Enum types (#12982)
  • Reimplement replace expression on the Rust side (#13002)
  • Expand set of NumPy functions which emit inefficient map_* warning (#13039)
  • Use tokio semaphore for concurrency handling (#13026)
  • Improve and expressify hist (#13014)
  • Update describe to use new count implementation (#12990)
  • Add default to_struct Series name consistent with the usual default Series name (empty string) (#12998)
  • Preserve left and right join keys in outer joins (#12963)
  • Clarify "inefficient map_elements" warning message (#12978)
  • Allow end before start in date/time_range (#12964)
  • Update update signature (#12986)
  • Minor update to Array data type repr (#12973)
  • Implement group-tuples for Null dtype (#12975)
  • Cast to an enum from int (#12954)
  • Move categorical ordering into dtype (#12911)
  • Avoid importing interchange module by default (#12927)
  • Update Expr.count to ignore null values by default (#12934)
  • Raise if expression passed as scalar to DataFrame constructor (#12916)
  • Update repr of Struct data type class (#12922)
  • Enable partial predicate pushdown past window expressions (#12710)
  • Add merge mode to write_delta and remove pyarrow to delta conversions (#12392)
  • Add str.reverse (#12878)
  • Allow all DataType objects to be instantiated (#12470)
  • Specific performance warnings from Rust to Python (#12802)
  • Change value_counts resulting column name from counts to count (#12506)
  • Implement std and var for Duration columns (#12865)
  • Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
  • Enhance write_database return (indicate the number of rows affected by the operation) (#12830)
  • Add dedicated Decimal selector (#12852)
  • Preserve base dtype when raising to UInt power (#10446)
  • Default to exact checking for integers in assertion utils (#12331)
  • Improve __repr__ implementation for Expr (#12770)
  • Support SQL subqueries for JOIN and FROM (#12819)
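
The join_nulls change listed above can be modeled with a small plain-Python inner join. This is a sketch of the semantics only, under the assumption that the new default treats null keys as never matching (SQL-like) and that join_nulls=True restores the earlier behavior where null matches null. The inner_join_model function and the sample rows are hypothetical, and None stands in for null.

```python
from typing import Optional


def inner_join_model(
    left: list[tuple[Optional[int], str]],
    right: list[tuple[Optional[int], str]],
    join_nulls: bool = False,
) -> list[tuple[Optional[int], str, str]]:
    """Plain-Python model of an inner join on the first tuple element."""
    out = []
    for lkey, lval in left:
        for rkey, rval in right:
            if lkey is None or rkey is None:
                # Null keys only match each other when join_nulls is set.
                matched = join_nulls and lkey is None and rkey is None
            else:
                matched = lkey == rkey
            if matched:
                out.append((lkey, lval, rval))
    return out


left = [(1, "a"), (None, "b")]
right = [(1, "x"), (None, "y")]
sql_like = inner_join_model(left, right)                # null keys dropped
legacy = inner_join_model(left, right, join_nulls=True)  # null matches null
print(sql_like)  # [(1, 'a', 'x')]
print(legacy)    # [(1, 'a', 'x'), (None, 'b', 'y')]
```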

🐞 Bug fixes

  • Fix off-by-one error in quantile(method="nearest") (#13058)
  • Fix incorrect schema inference on nested columns (#13057)
  • Don't raise for datetime_range if starting on ambiguous datetime and earliest was specified (#13050)
  • Parse json_decode per max buffer length (#13029)
  • Parse 00:00 time zone as UTC (#13034)
  • Fix timeout errors in concurrent downloads (#13023)
  • Streamline align_frames and fix edge-case where the identical frame object appears more than once (#13007)
  • Fix SQL substring indexing (#13016)
  • Allow broadcasting in ranges (#11900)
  • Prevent deadlock in sink_csv (#12991)
  • Don't get mutable if buffer is sliced (#12979)
  • Support parameterized read_database calls against cursors that only take positional args (#12967)
  • Fix truncate when truncating by multiple weeks (#12948)
  • Fix segfault / memory corruption after plugins return Err result (#12953)
  • Raise a proper python typed exception when IO writers try to write to a non-existent folder (#12936)
  • Don't panic when ambiguous parameter is not Utf8 (#12913)
  • Raise a proper python typed exception when the CSV writer tries to write to a non-existent folder (#12919)
  • Patch rolling_var/rolling_std numerical stability (#12909)
  • Fix incorrect Int16 min/max due to incorrect SIMD mask construction (#12908)
  • Improve handling of decimal conversion with to_numpy in the absence of pyarrow (#12888)
  • Fix OOB error in list set operations on empty frame (#12845)
  • Fix error message for uninstantiated Enum types (#12886)
  • Fix repr of Expr.gather (which was still showing deprecated take) (#12864)
  • Fix Array dtype equality (#12853)
  • Fix nan_min/max incorrectly aggregating chunks with addition (#12848)
  • Revert type hint change on expression inputs (#12792)
  • More accurate type hinting for collect_all functions (#12796)
  • Use total float ordering in is_in (#12800)
  • Handle aggregation for all-NaN groups in group_by (#12304)

🛠️ Other improvements

  • Update version switcher for 0.20 (#12844)
  • Add upgrade guide for Python Polars 0.20 (#12872)
  • Run doctests before other tests (#13047)
  • Update describe calculation of min/max (#13027)
  • Minor typo fix (#13003)
  • Resolve two interchange tests failing locally (#12999)
  • Update outdated links to API in Expressions/Functions page (#12981)
  • Expand docstrings for count (#12960)
  • Fix issue with docs for group_by_dynamic (#12906)
  • Prefer explicit --no-cov flag for py3.12/ubuntu test workflow (vs implicit/omitted) (#12889)
  • Scheduled removal of previously deprecated functionality (#12885)
  • Fix references in deprecation notes (#12877)
  • Fix typo in hash docstring (#12879)
  • Fix docstring for deprecated list.take (#12873)
  • Note that list.take is deprecated (#12867)
  • Fix failing tests (#12859)
  • Add quotes to pip install with dependencies (#12799)
  • Fix parameter name reference in update docstring (#12797)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Object905, @Yerachmiel-Feltzman, @alexander-beedie, @c-peters, @ion-elgreco, @jankislinger, @mcrumiller, @nameexhaustion, @oli-clive-griffin, @orlp, @rancomp, @ritchie46, @romanovacca, @stinodego and @xuestrange

Python Polars 0.19.19

01 Dec 19:21

✨ Enhancements

  • Parquet support required deltabyte encoding (#12836)

🐞 Bug fixes

  • Fix incorrect values from parquet RLE decoding (#12818)
  • Write only one dict page per row group (#12831)

Thank you to all our contributors for making this release possible!
@nameexhaustion, @ritchie46 and @stinodego

Python Polars 0.19.18

29 Nov 17:39
d3ecfe1

✨ Enhancements

  • support nested null in vstack/append/extend/concat (#12771)
  • Improve error messages on attempted Arrow conversions involving incompatible/unknown dtypes (#12421)
  • determine mode parallelism depending on current tasks (#12764)
  • enable slice push down past with_columns (#12742)
  • Improve write_database, accounting for latest adbc fixes/updates (#12713)

🐞 Bug fixes

  • don't use streaming engine if aggregate is unknown (#12769)
  • Enable special casing of sequence in list_to_struct (#12759)
  • hold align_chunks_invariant (#12738)
  • allow leading zero and plus in integer parsing (#12744)
  • csv lines iter, always return remainder (#12739)
  • fix oob in set operations (#12736)
  • undo regression in ability to read certain parquet files (#12731)

🛠️ Other improvements

  • Use latest atoi_simd release (#12748)
  • Fix invalid references to xlsx2csv dependency (#12741)
  • Remove pinned aiohttp dependency (#12733)

Thank you to all our contributors for making this release possible!
@0siride, @PierreAttard, @RoDmitry, @alexander-beedie, @dependabot, @dependabot[bot], @eitsupi, @kszlim, @nameexhaustion, @orlp, @ritchie46 and @stinodego

Python Polars 0.19.17

27 Nov 13:24
38d016b

✨ Enhancements

  • Automatically wrap NumPy array as lit (#12709)
  • Add DataFrame.iter_columns (#12653)
  • favour showing "adbc_driver_manager" over "adbc_driver_sqlite" in show_versions (#12690)

🐞 Bug fixes

  • corr return nan if denominator is invalid (#12708)
  • parquet decimal statistics and schema (#12705)
  • support append/extend with null series (#11824) (#12686)
  • address a numpy ndarray init regression (#12701)
  • fix carrying over infinity into other windows (#12685)

🛠️ Other improvements

  • Update URI prefix in examples (prefer "postgresql" to "postgres") (#12707)
  • now that scan_parquet supports hive partitioning, remove note pointing to scan_pyarrow_dataset (#12706)
  • Minor docstring fixes (#12688)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @c-peters, @ritchie46, @stinodego and @tkarabela

Python Polars 0.19.16

25 Nov 12:39
de2a5ef

⚠️ Deprecations

  • Rename series_equal/frame_equal to equals (#12618)
  • Rename map_dict to replace and change default behavior (#12599)
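
The note above does not spell out the changed default. As a hedged sketch, assuming the difference is that map_dict turned unmapped values into null by default while replace leaves unmapped values unchanged, the two behaviors can be modeled in plain Python. Both function names below are hypothetical stand-ins, not Polars calls, and None stands in for null.

```python
def map_dict_model(values: list, mapping: dict, default=None) -> list:
    """Model of the old behavior: unmapped values become `default` (null)."""
    return [mapping.get(v, default) for v in values]


def replace_model(values: list, mapping: dict) -> list:
    """Model of the new behavior: unmapped values pass through unchanged."""
    return [mapping.get(v, v) for v in values]


values = ["a", "b", "c"]
mapping = {"a": "x"}
old_result = map_dict_model(values, mapping)
new_result = replace_model(values, mapping)
print(old_result)  # ['x', None, None]
print(new_result)  # ['x', 'b', 'c']
```

Code that relied on the old null fill would need to request it explicitly after migrating; check the upgrade notes for the exact parameter.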

🚀 Performance improvements

  • order(s) of magnitude speedup when initialising List dtype Series from 2D numpy array (#12672)
  • improve merge_local_rhs_categorical traversal (#12660)
  • make values_size estimate correct for sliced arrays (#12658)
  • improve parquet utf8 validation (#12655)
  • parquet pre-allocate buffer in binary plain encode (#12652)
  • optimize dict binary decoding in parquet (#12648)
  • ensure we only check the values within bounds (#12633)
  • parquet; elide recursion in hot path (#12625)
  • improve cov/corr algorithm (#12590)

✨ Enhancements

  • Join operations on local categoricals (#12657)
  • Implement PySeries.from_buffer for boolean buffers (#12654)
  • Implement PySeries.from_buffer for numeric types (#12646)
  • use RLE_DICTIONARY for integers in parquet (#12647)
  • extend recent filter syntax upgrades to when/then construct (#12603)
  • implement RLE_DICT encoding for utf8/binary columns (reduced parquet file size) (#12623)
  • implement 'DeltaByteArray' decoding for parquet (#12602)

🐞 Bug fixes

  • json null inference (#12677)
  • cov/corr respect f32 type (#12676)
  • fix ternary zip_with null broadcast (#12668)
  • support negative slice on eager frame (#12644)
  • fix concurrency budget assertion (#12641)
  • fix oob in set operations (#12640)
  • panic reading parquet nested struct column (#12614)
  • Fix deprecation message for DataFrame.sum (#12619)
  • features: performant,lazy,random (#12600)

🛠️ Other improvements

  • Use range instead of np.arange in constructors (#12621)
  • update custom allocator instructions to include macOS (#12593)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @cardoso, @dmitrybugakov, @nameexhaustion, @orlp, @ritchie46 and @stinodego

Python Polars 0.19.15

20 Nov 14:33
2adc669

⚠️ Deprecations

  • Rename str.json_extract to str.json_decode (#12586)

🚀 Performance improvements

  • apply left side predicate pushdown also to right side on semi join (#12565)
  • ensure streaming parquet download remains concurrent ~7x (#12552)

✨ Enhancements

  • warn if by column is not sorted in rolling aggregations (as opposed to raising), add warn_if_unsorted argument (#12398)
  • struct -> json encoding expression (#12583)
  • Implement support for multi-character comments in read_csv (#12519)
  • Implement LazyFrame.sink_ndjson (#10786)
  • use JEMALLOC on all unix architectures (#12568)
  • improve concurrency parameters (#12567)
  • In explain(), rename PIPELINE to STREAMING so it's clearer what it means (#12547)

🐞 Bug fixes

  • error when invalid list to array is given (#12584)
  • parquet: do not extend existing nested that is already complete (#12569)
  • accidental panic if predicate selects no files (#12575)
  • fix lazy parquet slice with nested columns (#12558)
  • ensure stats-evaluator exists (#12566)
  • list schema of list eval (#12563)
  • ensure concurrency budget never locks (#12555)
  • Fix lazy schema for group_by_dynamic and rolling (#12551)
  • address overflow on vec capacity calculation for int_ranges with negative step (#12548)

🛠️ Other improvements

  • convert all recursive parquet deserialize to iterative (#12560)
  • Minor cleanup in Expr class (#12549)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Qqwy, @alexander-beedie, @dmitrybugakov, @fernandocast, @gab23r, @itamarst, @nameexhaustion, @ritchie46, @stinodego and @uchiiii