Releases: pola-rs/polars
Python Polars 0.20.3-rc.2
🚀 Performance improvements
- don't needlessly allocate validity in concat/rechunk (#13288)
- add fast path to `count_bits_set_by_offsets` (#13253)
- make `.dt.truncate('*mo')` more than 3x faster (#13192)
✨ Enhancements
- change doc links to new url docs.pola.rs (#13290)
- support horizontal concatenation of LazyFrames (#13139)
- Rename `Utf8` data type to `String`, keep `Utf8` as alias (#13257)
- dispatch strict_cast via cast (#13255)
- Impl any/all for array type (#13250)
- add cancellable queries (#13178)
- add `offset` parameter to `gather_every` (#13156)
- Support `Array` dtype AnyValue Series construction (#12817)
- Allow `step` parameter in `int_ranges` to take an expression (#13148)
- make python `map_batches` safer (#13181)
- Implement `count` for DataFrame/LazyFrame (#13153)
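
As a rough illustration of the `offset` parameter added to `gather_every` (#13156), the selection can be modelled in plain Python as a strided slice. This is only a sketch of the semantics, assuming that `offset` names the index of the first value taken and that every n-th value is taken from there; the helper name `gather_every_model` is hypothetical and is not part of Polars.

```python
def gather_every_model(values, n, offset=0):
    """Plain-Python model: take every n-th value, starting at index `offset`."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A strided slice expresses "start at offset, then step by n".
    return list(values)[offset::n]


data = [10, 11, 12, 13, 14, 15, 16]
every_third = gather_every_model(data, 3)
every_third_from_one = gather_every_model(data, 3, offset=1)
```

With `offset` left at its default the model behaves like the pre-existing "every n-th row" selection, so the new parameter only shifts where the stride begins.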
🐞 Bug fixes
- sorting categorical lexically bugs on null values (#13271)
- improve replace on categoricals (#13223)
- round trip to JSON and back should preserve Enum type (#13267)
- fix return type hint of list series any/all (#13265)
- sink_csv deadlock (#13239)
- Correctly use `read_parquet` for all binary inputs (#13218)
- `is_in` operator for categoricals (#13205)
- Better handle mismatched dtypes in `replace` (#13213)
- Fix `replace` fast path by casting `old` input to the right data type (#13176)
- ndjson nested null schema inference (#13206)
- don't cast to unknown dtypes (#13197)
- maintain old join behavior in window expression (#13179)
🛠️ Other improvements
- Copy Makefile build commands to top level (#13293)
- Fix release flags (#13298)
- Re-enable consortium standard tests (#13296)
- Update CODEOWNERS (#13292)
- Add CPU compatibility check (#13134)
- Change base url of docs/guide to `docs.pola.rs` (#13281)
- Fix source link for dev docs (#13279)
- fix return type hint of list series any/all (#13265)
- Fix display of overloaded signatures (#13258)
- clean up bytecode parsing a bit (#13221)
- Add a couple of docstring examples to Series methods (#13244)
- remove unnecessary arg unpacking (#13241)
- update rustc (#13219)
- fix horizontal concatenation documentation (#13141)
- Replace blackdoc by ruff's new docstring formatter (#13182)
- Update ruff & ruff settings (#13126)
- Link to latest object_store docs in api doc (#13180)
- Fix failing test (#13171)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego
Python Polars 0.20.3-rc.1
🚀 Performance improvements
- add fast path to `count_bits_set_by_offsets` (#13253)
- make `.dt.truncate('*mo')` more than 3x faster (#13192)
✨ Enhancements
- Rename `Utf8` data type to `String`, keep `Utf8` as alias (#13257)
- dispatch strict_cast via cast (#13255)
- Impl any/all for array type (#13250)
- add cancellable queries (#13178)
- add `offset` parameter to `gather_every` (#13156)
- Support `Array` dtype AnyValue Series construction (#12817)
- Allow `step` parameter in `int_ranges` to take an expression (#13148)
- make python `map_batches` safer (#13181)
- Implement `count` for DataFrame/LazyFrame (#13153)
🐞 Bug fixes
- sorting categorical lexically bugs on null values (#13271)
- improve replace on categoricals (#13223)
- round trip to JSON and back should preserve Enum type (#13267)
- fix return type hint of list series any/all (#13265)
- sink_csv deadlock (#13239)
- Correctly use `read_parquet` for all binary inputs (#13218)
- `is_in` operator for categoricals (#13205)
- Better handle mismatched dtypes in `replace` (#13213)
- Fix `replace` fast path by casting `old` input to the right data type (#13176)
- ndjson nested null schema inference (#13206)
- don't cast to unknown dtypes (#13197)
- maintain old join behavior in window expression (#13179)
🛠️ Other improvements
- Add CPU compatibility check (#13134)
- Change base url of docs/guide to `docs.pola.rs` (#13281)
- Fix source link for dev docs (#13279)
- fix return type hint of list series any/all (#13265)
- Fix display of overloaded signatures (#13258)
- clean up bytecode parsing a bit (#13221)
- Add a couple of docstring examples to Series methods (#13244)
- remove unnecessary arg unpacking (#13241)
- update rustc (#13219)
- fix horizontal concatenation documentation (#13141)
- Replace blackdoc by ruff's new docstring formatter (#13182)
- Update ruff & ruff settings (#13126)
- Link to latest object_store docs in api doc (#13180)
- Fix failing test (#13171)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego
Python Polars 0.20.2
🚀 Performance improvements
- ensure single expression evaluation for replace (#13147)
- drop the pyarrow conversion path in `iter_rows`; we can now do fully native conversion ~2-3x faster (#13122)
✨ Enhancements
- Move from GA to more privacy friendly framework (#13155)
- prune all/any_horizontals with single inputs (#13146)
- ensure we get cleaner logical plans with `any/all_horizontal` (#13144)
🐞 Bug fixes
- Fix comparison of categoricals (#13137)
- Use the name of the leftmost expression in horizontal operations (#13143)
- any_value should supports cast to boolean (#13125)
- Update offsets of null value correctly for all `from_iter_xxx_trusted_len` (#13132)
- fix neq for series cmp str (#13128)
- Fix off-by-one error in `lit` dtype determination for integers (#13129)
- fix category list builder append series with multiple chunks (#13116)
🛠️ Other improvements
- Fix release LTS CPU step (#13160)
- Use the name of the leftmost expression in horizontal operations (#13143)
- ensure we get cleaner logical plans with `any/all_horizontal` (#13144)
- Minor cleanup of PyO3 bindings (#13067)
- Update `auto_explode` param name to `returns_scalar` (#13119)
- Mark whether the current package is the LTS-CPU version (#13068)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @orlp, @reswqa, @ritchie46 and @stinodego
Python Polars 0.20.1
🐞 Bug fixes
- repeat_by should not raise if by contains nulls (#13105)
- [csv] raise on single quote char (#13104)
- Raise if scan zstd compressed csv file (#13102)
- allow timeunit-less dtype in `pl.lit` creation (#12997)
- Don't check map length if input is literal (#13098)
- rolling_quantile can get incorrect state (#13088)
🛠️ Other improvements
- Fix column name in `contains_any` example (#13090)
- update user-defined-functions for 0.19.x (#13071)
- Fix some links, and make `map_batches` warning more evident (#13081)
- Linting updates (#13069)
- take pl.concat out of StringCache context manager in "mismatched string cache" error message (#13076)
- add Enum to dtype list (#13080)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @mcrumiller, @reswqa, @ritchie46 and @stinodego
Python Polars 0.20.0
This version includes quite a few breaking changes. We are preparing for the 1.0 release and aim to make the upgrade from 0.20 to 1.0 as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with 1.0.
Check out the upgrade guide for help navigating the upgrade to this version.
Please bear with us while we continue to make Polars the best tool it can be!
🏆 Highlights
- Add new `Enum` categorical data type which allows a fixed set of categories (#11822)
💥 Breaking changes
- Use Object Store instead of fsspec for `read_parquet` (#13044)
- Reimplement `replace` expression on the Rust side (#13002)
- Preserve left and right join keys in outer joins (#12963)
- Update `update` signature (#12986)
- Update `Expr.count` to ignore null values by default (#12934)
- Scheduled removal of previously deprecated functionality (#12885)
- Allow all `DataType` objects to be instantiated (#12470)
- Change `value_counts` resulting column name from `counts` to `count` (#12506)
- Change default `join` behavior with regard to nulls, add `join_nulls` parameter to keep existing behavior (#12840)
- Default to exact checking for integers in assertion utils (#12331)
- Set default dtype for Series to `Null` when no data is present (#12807)
- Update `lit` behavior for list/tuple inputs (#12559)
- Change `DataType.is_nested` from property to classmethod (#12453)
- Update constructors for Array and Decimal (#12837)
- Smaller integer data types for datetime components (#12070)
- Fix `NaN` ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
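
The `NaN` ordering change (#12721) can be pictured with a small plain-Python model of a total order over floats. This is a sketch of the stated rule only (NaN greater than any other float, and equal to itself), not the Rust implementation in Polars; `total_order_key` and `total_eq` are hypothetical helper names introduced for illustration.

```python
import math


def total_order_key(x: float):
    """Sort key under which NaN ranks after every other float, including +inf."""
    is_nan = math.isnan(x)
    # False sorts before True, so all non-NaN values come first.
    return (is_nan, 0.0 if is_nan else x)


def total_eq(a: float, b: float) -> bool:
    """Equality under which NaN is equal to NaN (unlike IEEE 754 `==`)."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


values = [2.0, float("nan"), float("inf"), -1.0]
ordered = sorted(values, key=total_order_key)
```

Under ordinary IEEE 754 comparison, `nan == nan` is false and every ordering comparison involving NaN is false, which makes sort results depend on input order; a total order like the one modelled here removes that ambiguity.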
⚠️ Deprecations
- Rename `write_database` parameter `if_exists` to `if_table_exists` (#12783)
🚀 Performance improvements
- Avoid dispatching to expression engine for various `Series` methods (#13010)
- Elide allocation in outer join materialization (#12992)
- Avoid dispatching `Series.head/tail` to the expression engine (#12946)
- Ensure we reduce for `any/all_horizontal` (#12976)
- Add fast paths for UTC in `truncate` (#12965)
- Use `select_seq` for expression dispatch (#12962)
- Improve `rolling_median` algorithm (#12704)
- Use fast path for non-null data in new SQL-like null matching (#12874)
- Optimize `DataFrame.iter_rows` for smaller buffer sizes (#12804)
- Speed up initializing `Series` from a list of NumPy arrays (#12785)
✨ Enhancements
- Add `str.contains_any` and `str.replace_many` (Aho-Corasick algorithms) (#13073)
- Auto-infer credentials from `.aws` folder (#13062)
- Support private cloud S3 storage in `scan_parquet` (#13060)
- Use Object Store instead of fsspec for `read_parquet` (#13044)
- Avoid dispatching to expression engine for various `Series` methods (#13010)
- Allow order operators (<,>,>=,<=) on Enum types (#12982)
- Reimplement `replace` expression on the Rust side (#13002)
- Expand set of NumPy functions which emit `inefficient map_*` warning (#13039)
- Use tokio semaphore for concurrency handling (#13026)
- Improve and expressify `hist` (#13014)
- Update `describe` to use new `count` implementation (#12990)
- Add default `to_struct` Series name consistent with the usual default Series name (empty string) (#12998)
- Preserve left and right join keys in outer joins (#12963)
- Clarify "inefficient `map_elements`" warning message (#12978)
- Allow `end` before `start` in `date/time_range` (#12964)
- Update `update` signature (#12986)
- Minor update to `Array` data type repr (#12973)
- Implement group-tuples for `Null` dtype (#12975)
- Cast to an enum from int (#12954)
- Move categorical ordering into dtype (#12911)
- Avoid importing interchange module by default (#12927)
- Update `Expr.count` to ignore null values by default (#12934)
- Raise if expression passed as scalar to DataFrame constructor (#12916)
- Update `repr` of `Struct` data type class (#12922)
- Enable partial predicate pushdown past window expressions (#12710)
- Add `merge` mode to `write_delta` and remove pyarrow to delta conversions (#12392)
- Add `str.reverse` (#12878)
- Allow all `DataType` objects to be instantiated (#12470)
- Specific performance warnings from Rust to Python (#12802)
- Change `value_counts` resulting column name from `counts` to `count` (#12506)
- Implement `std` and `var` for `Duration` columns (#12865)
- Change default `join` behavior with regard to nulls, add `join_nulls` parameter to keep existing behavior (#12840)
- Enhance `write_database` return (indicate the number of rows affected by the operation) (#12830)
- Add dedicated `Decimal` selector (#12852)
- Preserve base dtype when raising to `UInt` power (#10446)
- Default to exact checking for integers in assertion utils (#12331)
- Improve `__repr__` implementation for `Expr` (#12770)
- Support SQL subqueries for `JOIN` and `FROM` (#12819)
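
The change to the default `join` behavior with regard to nulls (#12840) can be sketched with a plain-Python model of an inner join. This assumes the new default is the SQL-like rule that a null key matches nothing, and that `join_nulls=True` restores null-to-null matching; `inner_join_model` is a hypothetical helper written for illustration and `None` stands in for a null value. It is not Polars code.

```python
def inner_join_model(left, right, join_nulls=False):
    """Inner-join two lists of (key, value) pairs; None stands in for null."""
    out = []
    for lk, lv in left:
        for rk, rv in right:
            if lk is None or rk is None:
                # SQL-like default: a null key matches nothing, not even null.
                # With join_nulls=True, null keys match each other again.
                matched = join_nulls and lk is None and rk is None
            else:
                matched = lk == rk
            if matched:
                out.append((lk, lv, rv))
    return out


left = [(1, "a"), (None, "b")]
right = [(1, "x"), (None, "y")]
default_rows = inner_join_model(left, right)
legacy_rows = inner_join_model(left, right, join_nulls=True)
```

In this model the default call drops the row keyed on null, while the `join_nulls=True` call keeps it, which is the difference to watch for when upgrading queries that join on nullable keys.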
🐞 Bug fixes
- Fix off-by-one error in `quantile(method="nearest")` (#13058)
- Fix incorrect schema inference on nested columns (#13057)
- Don't raise for `datetime_range` if starting on ambiguous datetime and earliest was specified (#13050)
- Parse `json_decode` per max buffer length (#13029)
- Parse `00:00` time zone as UTC (#13034)
- Fix timeout errors in concurrent downloads (#13023)
- Streamline `align_frames` and fix edge-case where the identical frame object appears more than once (#13007)
- Fix SQL substring indexing (#13016)
- Allow broadcasting in `ranges` (#11900)
- Prevent deadlock in `sink_csv` (#12991)
- Don't get mutable if buffer is sliced (#12979)
- Support parameterized `read_database` calls against cursors that only take positional args (#12967)
- Fix `truncate` when truncating by multiple weeks (#12948)
- Fix segfault / memory corruption after plugins return `Err` result (#12953)
- Raise a proper python typed exception when IO writers try to write to an non existent folder (#12936)
- Don't panic when `ambiguous` parameter is not Utf8 (#12913)
- Raise a proper python typed exception when the CSV writer tries to write to an non existent folder (#12919)
- Patch `rolling_var`/`rolling_std` numerical stability (#12909)
- Fix incorrect Int16 `min`/`max` due to incorrect SIMD mask construction (#12908)
- Improve handling of decimal conversion with `to_numpy` in the absence of pyarrow (#12888)
- Fix OOB error in list set operations on empty frame (#12845)
- Fix error message for uninstantiated `Enum` types (#12886)
- Fix repr of `Expr.gather` (which was still showing deprecated take) (#12864)
- Fix `Array` dtype equality (#12853)
- Fix `nan_min/max` incorrectly aggregating chunks with addition (#12848)
- Revert type hint change on expression inputs (#12792)
- More accurate type hinting for `collect_all` functions (#12796)
- Use total float ordering in is_in (#12800)
- Handle aggregation for all-NaN groups in `group_by` (#12304)
🛠️ Other improvements
- Update version switcher for 0.20 (#12844)
- Add upgrade guide for Python Polars 0.20 (#12872)
- Run doctests before other tests (#13047)
- Update `describe` calculation of min/max (#13027)
- Minor typo fix (#13003)
- Resolve two interchange tests failing locally (#12999)
- Update outdated links to API in Expressions/Functions page (#12981)
- Expand docstrings for `count` (#12960)
- Fix issue with docs for `group_by_dynamic` (#12906)
- Prefer explicit `--no-cov` flag for py3.12/ubuntu test workflow (vs implicit/omitted) (#12889)
- Scheduled removal of previously deprecated functionality (#12885)
- Fix references in deprecation notes (#12877)
- Fix typo in `hash` docstring (#12879)
- Fix docstring for deprecated `list.take` (#12873)
- Note that `list.take` is deprecated (#12867)
- Fix failing tests (#12859)
- Add quotes to `pip install` with dependencies (#12799)
- Fix parameter name reference in `update` docstring #12797
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Object905, @Yerachmiel-Feltzman, @alexander-beedie, @c-peters, @ion-elgreco, @jankislinger, @mcrumiller, @nameexhaustion, @oli-clive-griffin, @orlp, @rancomp, @ritchie46, @romanovacca, @stinodego and @xuestrange
Python Polars 0.19.19
✨ Enhancements
- Parquet support required deltabyte encoding (#12836)
🐞 Bug fixes
- Fix incorrect values from parquet RLE decoding (#12818)
- Write only one dict page per row rowgroup (#12831)
Thank you to all our contributors for making this release possible!
@nameexhaustion, @ritchie46 and @stinodego
Python Polars 0.19.18
✨ Enhancements
- support nested null in vstack/append/extend/concat (#12771)
- Improve error messages on attempted Arrow conversions involving incompatible/unknown dtypes (#12421)
- determine mode parallelism depending on current tasks (#12764)
- enable slice push down past `with_columns` (#12742)
- Improve `write_database`, accounting for latest `adbc` fixes/updates (#12713)
🐞 Bug fixes
- don't use streaming engine if aggregate is unknown (#12769)
- Enable special casing of sequence in list_to_struct (#12759)
- hold align_chunks_invariant (#12738)
- allow leading zero and plus in integer parsing (#12744)
- csv lines iter, always return remainder (#12739)
- fix oob in set operations (#12736)
- undo regression in ability to read certain parquet files (#12731)
🛠️ Other improvements
- Use latest `atoi_simd` release (#12748)
- Fix invalid references to `xlsx2csv` dependency (#12741)
- Remove pinned `aiohttp` dependency (#12733)
Thank you to all our contributors for making this release possible!
@0siride, @PierreAttard, @RoDmitry, @alexander-beedie, @dependabot, @dependabot[bot], @eitsupi, @kszlim, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Python Polars 0.19.17
✨ Enhancements
- Automatically wrap NumPy array as lit (#12709)
- Add `DataFrame.iter_columns` (#12653)
- favour showing "adbc_driver_manager" over "adbc_driver_sqlite" in `show_versions` (#12690)
🐞 Bug fixes
- corr return nan if denominator is invalid (#12708)
- parquet decimal statistics and schema (#12705)
- support `append`/`extend` with null series (#11824) (#12686)
- address a numpy ndarray init regression (#12701)
- fix carrying over infinity into other windows (#12685)
🛠️ Other improvements
- Update URI prefix in examples (prefer "postgresql" to "postgres") (#12707)
- now that `scan_parquet` supports hive partitioning, remove note pointing to `scan_pyarrow_dataset` (#12706)
- Minor docstring fixes (#12688)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @c-peters, @ritchie46, @stinodego and @tkarabela
Python Polars 0.19.16
⚠️ Deprecations
- Rename `series_equal`/`frame_equal` to `equals` (#12618)
- Rename `map_dict` to `replace` and change default behavior (#12599)
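
The release note for the `map_dict` to `replace` rename (#12599) does not spell out what the changed default is. The plain-Python sketch below models one reading of it, stated here as an assumption: that `map_dict` turned values missing from the mapping into null by default, whereas `replace` leaves them unchanged. Both helper names are hypothetical, `None` stands in for null, and the actual behavior should be checked against the Polars API reference.

```python
def map_dict_model(values, mapping):
    """Assumed old default: values absent from the mapping become None (null)."""
    return [mapping.get(v) for v in values]


def replace_model(values, mapping):
    """Assumed new default: values absent from the mapping are kept as-is."""
    return [mapping.get(v, v) for v in values]


codes = ["a", "b", "z"]
mapping = {"a": "alpha", "b": "beta"}
old_style = map_dict_model(codes, mapping)
new_style = replace_model(codes, mapping)
```

If that reading holds, code that relied on unmapped values becoming null would need an explicit default after migrating.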
🚀 Performance improvements
- order(s) of magnitude speedup when initialising `List` dtype `Series` from 2D numpy array (#12672)
- improve `merge_local_rhs_categorical` traversal (#12660)
- make values_size estimate correct for sliced arrays (#12658)
- improve parquet utf8 validation (#12655)
- parquet pre-allocate buffer in binary plain encode (#12652)
- optimize dict binary decoding in parquet (#12648)
- ensure we only check the values within bounds (#12633)
- parquet; elide recursion in hot path (#12625)
- improve cov/corr algorithm (#12590)
✨ Enhancements
- Join operations on local categoricals (#12657)
- Implement `PySeries.from_buffer` for boolean buffers (#12654)
- Implement `PySeries.from_buffer` for numeric types (#12646)
- use RLE_DICTIONARY for integers in parquet (#12647)
- extend recent `filter` syntax upgrades to `when/then` construct (#12603)
- implement RLE_DICT encoding for utf8/binary columns (reduced parquet file size) (#12623)
- implement 'DeltaByteArray' decoding for parquet (#12602)
🐞 Bug fixes
- json null inference (#12677)
- cov/corr respect f32 type (#12676)
- fix ternary zip_with null broadcast (#12668)
- support negative slice on eager frame (#12644)
- fix concurrency budget assertion (#12641)
- fix oob in set operations (#12640)
- panic reading parquet nested struct column (#12614)
- Fix deprecation message for `DataFrame.sum` (#12619)
- features: `performant,lazy,random` (#12600)
🛠️ Other improvements
- Use `range` instead of `np.arange` in constructors (#12621)
- update custom allocator instructions to include macOS (#12593)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @cardoso, @dmitrybugakov, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Python Polars 0.19.15
⚠️ Deprecations
- Rename `str.json_extract` to `str.json_decode` (#12586)
🚀 Performance improvements
- apply left side predicate pushdown also to right side on semi join (#12565)
- ensure streaming parquet download remains concurrent ~7x (#12552)
✨ Enhancements
- warn if `by` column is not sorted in rolling aggregations (as opposed to raising), add warn_if_unsorted argument (#12398)
- struct -> json encoding expression (#12583)
- Implement support for multi-character comments in `read_csv` (#12519)
- Implement `LazyFrame.sink_ndjson` (#10786)
- use JEMALLOC on all unix architectures (#12568)
- improve concurrency parameters (#12567)
- In explain(), rename PIPELINE to STREAMING so it's clearer what it means (#12547)
🐞 Bug fixes
- error when invalid list to array is given (#12584)
- parquet: do not extend existing nested that is already complete (#12569)
- accidental panic if predicate selects no files (#12575)
- fix lazy parquet slice with nested columns (#12558)
- ensure stats-evalutor exists (#12566)
- list schema of list `eval` (#12563)
- ensure concurrency budget never locks (#12555)
- Fix lazy schema for `group_by_dynamic` and `rolling` (#12551)
- address overflow on vec capacity calculation for `int_ranges` with negative step (#12548)
🛠️ Other improvements
- convert all recursive parquet deserialize to iterative (#12560)
- Minor cleanup in Expr class (#12549)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Qqwy, @alexander-beedie, @dmitrybugakov, @fernandocast, @gab23r, @itamarst, @nameexhaustion, @ritchie46, @stinodego and @uchiiii