Release Rust Polars 0.52.0 · pola-rs/polars

🏆 Highlights

Add LazyFrame.{sink,collect}_batches (#23980)
Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
Skip filtering scan IR if no paths were filtered (#25037)
Optimize ipc stream read performance (#24671)
Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
Lower unique to native group-by and speed up n_unique in group-by context (#24976)
Better parallelize take{_slice,}_unchecked (#24980)
Implement native skew and kurtosis in group-by context (#24961)
Use native group-by aggregations for bitwise_* operations (#24935)
Address group_by_dynamic slowness in sparse data (#24916)
Native filter/drop_nulls/drop_nans in group-by context (#24897)
Implement cumulative_eval using the group-by engine (#24889)
Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
Implement native null_count, any and all group-by aggregations (#24859)
Speed up reverse in group-by context (#24855)
Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
Don't check duplicates on streaming simple projection in release mode (#24830)
Lower approx_n_unique to the streaming engine (#24821)
Duration/interval string parsing optimisation (2-5x faster) (#24771)
Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
Implement indexed method for BitMapIter::nth (#24766)
Pushdown slices on plans within unions (#24735)
Optimize gather_every(n=1) to slice (#24704)
Lower null count to streaming engine (#24703)
Native streaming gather_every (#24700)
Pushdown filter with strptime if input is literal (#24694)
Avoid copying expanded paths (#24669)
Relax filter expr ordering (#24662)
Remove unnecessary groups call in aggregated (#24651)
Skip files in scan_iceberg with filter based on metadata statistics (#24547)
Push row_index predicate for all scan types (#24537)
Perform integer in-filtering for Parquet inequality predicates (#24525)
Stop caching Parquet metadata after 8 files (#24513)
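
The rolling-moment change (#25078) depends on updating the window incrementally instead of recomputing it. Below is a minimal pure-Python sketch of that idea; it is not the Polars kernel. Nulls are skipped, and NaNs are counted rather than summed, so a null or NaN leaving the window only changes a counter. The helper name `rolling_var` is hypothetical, and so is its output rule (null until two non-null values are in the window).

```python
import math
from typing import List, Optional, Sequence


def rolling_var(values: Sequence[Optional[float]], window: int) -> List[Optional[float]]:
    """Sample variance (ddof=1) over a trailing window, updated incrementally.

    Hypothetical helper, not a Polars API. Nulls (None) never enter the sums;
    NaNs are counted instead of added, so when either leaves the window no
    full recompute is needed.
    """
    out: List[Optional[float]] = []
    n = 0          # finite, non-null values currently in the window
    s = 0.0        # running sum of those values
    s2 = 0.0       # running sum of their squares
    nan_count = 0  # NaNs currently in the window

    def update(x: Optional[float], sign: int) -> None:
        nonlocal n, s, s2, nan_count
        if x is None:
            return  # nulls are skipped entirely
        if math.isnan(x):
            nan_count += sign
        else:
            n += sign
            s += sign * x
            s2 += sign * x * x

    for i, x in enumerate(values):
        update(x, +1)                       # value entering the window
        if i >= window:
            update(values[i - window], -1)  # value leaving the window
        if nan_count > 0:
            out.append(math.nan)            # any NaN in the window gives NaN
        elif n < 2:
            out.append(None)                # too few values for ddof=1
        else:
            out.append((s2 - s * s / n) / (n - 1))
    return out
```

A production kernel would also guard against the cancellation error that running sums of squares can accumulate. This sketch does not.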
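
Skipping files in scan_iceberg (#24547) relies on per-file min/max statistics (#24496). The pruning rule can be sketched as follows. The sketch assumes a single integer column, a range predicate `lower <= value <= upper`, and statistics without nulls. `FileStats` and `files_to_scan` are hypothetical names, not Polars or Iceberg APIs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for one column (e.g. from table metadata)."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files: Sequence[FileStats], lower: int, upper: int) -> List[str]:
    """Return the files that may contain rows with lower <= value <= upper.

    A file is skipped only when its [min_value, max_value] range cannot
    overlap the predicate range, so no matching row is ever dropped.
    """
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]
```

Pruning is conservative by construction: a file that might match is always read, and the predicate is still applied to its rows afterwards.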

✨ Enhancements

Improve error message on unsupported SQL subquery comparisons (#25135)
Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
Support ewm_var/std in streaming engine (#25109)
Make DSL-hash skippable (#25140)
Streaming {Expr,LazyFrame}.rolling (#25058)
Set polars/<version> user-agent (#25112)
Add BIT_NOT support to the SQL interface (#25094)
Support BYTE_ARRAY backed Decimals in Parquet (#25076)
Add allow_empty flag to item (#25048)
Support ewm_mean() in streaming engine (#25003)
Improve row-count estimates (#24996)
Remove filtered scan paths in IR when possible (#24974)
Introduce remote Polars MCP server (#24977)
Allow local scans on polars cloud (configurable) (#24962)
Add Expr.item to strictly extract a single value from an expression (#24888)
Add environment variable to roundtrip empty struct in Parquet (#24914)
Add glob parameter to scan_ipc (#24898)
Add list.agg and arr.agg (#24790)
Implement {Expr,Series}.rolling_rank() (#24776)
Support MergeSorted in CSPE (#24805)
Recursively apply CSPE (#24798)
Add streaming engine per-node metrics (#24788)
Add arr.eval (#24472)
Improve rolling_(sum|mean) accuracy (#24743)
Add nth_set_bit_u64() with unit test (#24035)
Add separator to {Data,Lazy}Frame.unnest (#24716)
Add union() function for unordered concatenation (#24298)
Add name.replace to the set of column rename options (#17942)
Allow duration strings with leading "+" (#24737)
Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
Add support for UInt128 to pyo3-polars (#24731)
Implement maintain_order for cross join (#24665)
Add support to output dt.total_{}() duration values as fractionals (#24598)
Support scanning from file:/path URIs (#24603)
Log which file the schema was sourced from, and which file caused an extra column error (#24621)
Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
Use fixed-scale Decimals (#24542)
Add support for unsigned 128-bit integers (#24346)
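
Duration strings such as "1h30m" are sequences of integer-unit pairs (#24771), and #24737 also accepts a leading "+". Below is a hypothetical pure-Python parser for a subset of that grammar, returning nanoseconds. It assumes only fixed-length units. Calendar units such as "mo", "q" and "y" are deliberately left out. Treating a leading "-" as negation is also an assumption of this sketch.

```python
import re

# Nanoseconds per unit for a subset of Polars-style duration units.
# Assumption: calendar units ("mo", "q", "y") are omitted because their
# length depends on the date they are applied to.
_UNIT_NS = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60_000_000_000,
    "h": 3_600_000_000_000,
    "d": 86_400_000_000_000,
    "w": 604_800_000_000_000,
}
# Two-letter units come first so that "30ms" is not read as "30m" + "s".
_TOKEN = re.compile(r"(\d+)(ns|us|ms|s|m|h|d|w)")


def parse_duration_ns(text: str) -> int:
    """Parse strings like "1h30m", "+250ms" or "-2d" into nanoseconds (hypothetical helper)."""
    sign = 1
    if text.startswith(("+", "-")):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if not text:
        raise ValueError("empty duration string")
    total, pos = 0, 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration string: {text!r}")
        total += int(match.group(1)) * _UNIT_NS[match.group(2)]
        pos = match.end()
    return sign * total
```

With this grammar, "+1h30m" and "1h30m" parse to the same value. An unsupported unit such as "1mo" fails at the leftover "o".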
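
`rolling_rank()` (#24776) ranks values within a sliding window. Below is a naive O(n·w) sketch. It assumes the output is the rank of the newest value within its trailing window, that ties receive their average rank, and that incomplete windows yield null. The real method's tie-handling options and null rules may differ, so treat `rolling_rank` here as a hypothetical helper.

```python
from typing import List, Optional, Sequence


def rolling_rank(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Average rank (1-based) of each value within its trailing window.

    Hypothetical helper, not the Polars implementation: O(n * window),
    null until the first full window.
    """
    out: List[Optional[float]] = []
    for i, x in enumerate(values):
        if i + 1 < window:
            out.append(None)  # window not yet full
            continue
        win = values[i + 1 - window : i + 1]
        less = sum(1 for v in win if v < x)
        equal = sum(1 for v in win if v == x)  # includes x itself
        # Tied values share the mean of ranks less+1 .. less+equal.
        out.append(less + (equal + 1) / 2)
    return out
```

For `[3, 1, 2, 2]` with a window of 3, the last value ties with its neighbour and gets rank 2.5.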
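
`Expr.item` (#24888) strictly extracts a single value, and `allow_empty` (#25048) relaxes the empty case. The contract can be sketched in plain Python as below. The function is hypothetical, and returning null for empty input when `allow_empty=True` is an assumption based on the flag's name.

```python
from typing import Any, Optional, Sequence


def item(values: Sequence[Any], allow_empty: bool = False) -> Optional[Any]:
    """Return the single element of `values` (hypothetical stand-in for Expr.item)."""
    if len(values) == 1:
        return values[0]
    if len(values) == 0 and allow_empty:
        return None  # assumption: an empty input becomes null
    raise ValueError(f"expected exactly one value, got {len(values)}")
```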
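
`nth_set_bit_u64()` (#24035) locates the n-th set bit of a 64-bit word. The Python sketch below shows what such a function computes. It assumes a 0-based `n` and a return of `None` when the word has fewer than `n + 1` set bits. The actual Rust signature may differ.

```python
from typing import Optional


def nth_set_bit_u64(word: int, n: int) -> Optional[int]:
    """Index of the n-th (0-based) set bit of a 64-bit word, or None (hypothetical helper)."""
    if not 0 <= word < 1 << 64:
        raise ValueError("word must fit in an unsigned 64-bit integer")
    if n < 0:
        raise ValueError("n must be non-negative")
    for _ in range(n):
        if word == 0:
            return None      # fewer than n + 1 set bits
        word &= word - 1     # clear the lowest set bit
    if word == 0:
        return None
    return (word & -word).bit_length() - 1  # position of the lowest remaining set bit
```

This version clears the lowest set bit `n` times, so it is O(n) rather than constant time.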

🐞 Bug fixes

Fix CSV select(len()) off by 1 with comment prefix (#25069)
Fix incorrect reshape on sliced lists (#25139)
Support "index" as column name in group_by iterator (#25138)
DSL_SCHEMA_HASH should not be changed by line endings (#25123)
Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
Fix panic in dt.truncate for invalid duration strings (#25124)
Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
Return the correct string-case Expr reprs (#25101)
Fix groups update on slices with different offsets (#25097)
Fix handling Null dtype in ApplyExpr on group_by (#25077)
Raise error for all/any on list instead of panic (#25018)
Unique key names in streaming sort/top_k (#25082)
The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
Fix panic if scan predicate produces 0 length mask (#25089)
Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
Panic in group_by_dynamic with group_by and multiple chunks (#25075)
Fix panic when using struct field as join key (#25059)
Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
Fix field metadata for nested categorical PyCapsule export (#25052)
Block predicate pushdown when group_by key values are changed (#25032)
Group-By aggregation problems caused by AmortSeries (#25043)
Don't push down predicates past inserted cache nodes (#25042)
Allow for negative time in group_by_dynamic iterator (#25041)
Re-enable CPU feature check before import (#25010)
Fix correctness of any(ignore_nulls) and OOB in all (#25005)
Streaming any/all with ignore_nulls=False (#25008)
Fix incorrect join_asof on a casted expression (#25006)
Optimize memory on rolling groups in ApplyExpr (#24709)
Fallback Pyarrow scan to in-memory engine (#24991)
Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
Capitalize letters after numbers in to_titlecase (#24993)
Preserve null values in pct_change (#24952)
Raise length mismatch on over with sliced groups (#24887)
Check duplicate name in transpose (#24956)
Follow Kleene logic in any / all for group-by (#24940)
Do not optimize cross join to iejoin if order maintaining (#24950)
Broadcast partition_by columns in over expression (#24874)
Clear index cache on stacked df.filter expressions (#24870)
Fix 'explode' mapping strategy on scalar value (#24861)
Fix repeated with_row_index() after scan() silently ignored (#24866)
Correctly return min and max for enums in groupby aggregation (#24808)
Refactor BinaryExpr in group_by dispatch logic (#24548)
Fix aggstate for gather (#24857)
Keep scalars for length preserving functions in group_by (#24819)
Have range feature depend on dtype-array feature (#24853)
Fix duplicate select panic (#24836)
Inconsistency of list.sum() result type with None values (#24476)
Division by zero in Expr.dt.truncate (#24832)
Potential deadlock in __arrow_c_stream__ (#24831)
Allow double aggregations in group-by contexts (#24823)
Series.shrink_dtype for i128/u128 (#24833)
Fix dtype in EvalExpr (#24650)
Allow aggregations on AggState::LiteralScalar (#24820)
Dispatch to group_aware for fallible expressions with masked out elements (#24815)
Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
Fix XOR not following Kleene logic when one side is unit-length (#24810)
Incorrect precision in Series.str.to_decimal (#24804)
Use overlapping instead of rolling (#24787)
Fix iterable on dynamic_group_by and rolling object (#24740)
Use Kahan summation for in-memory groupby sum/mean (#24774)
Release GIL in PythonScan predicate evaluation (#24779)
Type error in bitmask::nth_set_bit_u64 (#24775)
Add Expr.sign for Decimal datatype (#24717)
Correct str.replace with missing pattern (#24768)
Support decimal_comma on Decimal type in write_csv (#24718)
Parse Decimal with comma as decimal separator in CSV (#24685)
Make Categories pickleable (#24691)
Shift on array within list (#24678)
Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
Support reading of mixed compressed/uncompressed IPC buffers (#24674)
Overflow in slice-slice optimization (#24658)
Package discovery for setuptools (#24656)
Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
Remove inclusion of polars dir in runtime sdist/wheel (#24654)
Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
Raise Exception instead of panic when unnest on non-struct column (#24471)
Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
Newline escaping in streaming show_graph (#24612)
Only allow the first Expr.reshape dimension to be inferred (-1) (#24591)
Sink batches early stop on in-memory engine (#24585)
More precisely model expression ordering requirements (#24437)
Panic in zero-weight rolling mean/var (#24596)
Decimal <-> literal arithmetic supertype rules (#24594)
Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
Validate list type for list expressions in planner (#24589)
Have log() prioritize the leftmost dtype for its output dtype (#24581)
CSV pl.len() was incorrect (#24587)
Add support for float inputs for duration types (#24529)
Roundtrip empty string through hive partitioning (#24546)
Fix potential OOB writes in unaligned IPC read (#24550)
Fix regression error when scanning AWS presigned URL (#24530)
Make PlPath::join for cloud paths replace on absolute paths (#24514)
Correct dtype for cum_agg in streaming engine (#24510)
Escape backslashes in EscapeLabel to produce valid DOT labels (#24532)
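
Several fixes above (#24940, #25005, #25008) concern `any`/`all` with nulls. Under Kleene (three-valued) logic with `ignore_nulls=False`, `any` is null when there is no `True` but at least one null. Likewise, `all` is null when there is no `False` but at least one null. With `ignore_nulls=True`, nulls are simply skipped. Below is a plain-Python sketch of these rules using hypothetical helpers (not the Polars kernels), with `None` standing for null.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]], ignore_nulls: bool = True) -> Optional[bool]:
    """any() with optional Kleene null handling (hypothetical helper)."""
    saw_null = False
    for v in values:
        if v is None:
            saw_null = True
        elif v:
            return True  # a single True decides the result
    return None if saw_null and not ignore_nulls else False


def kleene_all(values: Iterable[Optional[bool]], ignore_nulls: bool = True) -> Optional[bool]:
    """all() with optional Kleene null handling (hypothetical helper)."""
    saw_null = False
    for v in values:
        if v is None:
            saw_null = True
        elif not v:
            return False  # a single False decides the result
    return None if saw_null and not ignore_nulls else True
```

In a group-by context, the same function is applied to each group's values independently.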
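
The XOR fix (#24810) concerns a unit-length operand, which is broadcast to the other side's length before the element-wise operation. Under Kleene logic, XOR cannot short-circuit: the result is null whenever either input is null. The sketch below uses `None` as null; `kleene_xor` is a hypothetical helper.

```python
from typing import List, Optional, Sequence

Bool = Optional[bool]


def kleene_xor(left: Sequence[Bool], right: Sequence[Bool]) -> List[Bool]:
    """Element-wise Kleene XOR with unit-length broadcasting (hypothetical helper)."""
    # Broadcast a unit-length side to the length of the other side.
    if len(left) == 1 and len(right) != 1:
        left = list(left) * len(right)
    elif len(right) == 1 and len(left) != 1:
        right = list(right) * len(left)
    if len(left) != len(right):
        raise ValueError(f"length mismatch: {len(left)} vs {len(right)}")
    # A null on either side makes the result null; otherwise XOR is inequality.
    return [None if a is None or b is None else a != b for a, b in zip(left, right)]
```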
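
#25091 makes the SQL unary `NOT` logical rather than bitwise, while #25094 adds an explicit `BIT_NOT`. The distinction matters because the bitwise complement of a truthy integer is still truthy. Below is a Python illustration using hypothetical helpers, with `None` standing for SQL NULL.

```python
from typing import Optional


def sql_not(value: Optional[bool]) -> Optional[bool]:
    """Logical NOT under SQL three-valued logic: NULL stays NULL (hypothetical helper)."""
    return None if value is None else not value


def sql_bit_not(value: Optional[int]) -> Optional[int]:
    """Bitwise complement; Python ints act as two's complement, so ~x == -x - 1."""
    return None if value is None else ~value
```

For unsigned or fixed-width integer columns, the complement would also need masking to the column's bit width. The sketch ignores that.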
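
#24774 switches the in-memory group-by sum/mean to Kahan summation. Kahan summation carries a running compensation term for the low-order bits that each floating-point addition loses. Below is the classic algorithm in Python, as a sketch rather than the Polars kernel, next to a naive loop for comparison.

```python
from typing import Iterable


def kahan_sum(values: Iterable[float]) -> float:
    """Compensated (Kahan) summation."""
    total = 0.0
    compensation = 0.0  # negative of the low-order bits lost so far
    for x in values:
        y = x - compensation            # re-inject what was lost last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # zero in exact arithmetic; here minus the lost part
        total = t
    return total


def naive_sum(values: Iterable[float]) -> float:
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total
```

Adding ten values of 1e-16 to 1.0 shows the difference. Each small term is below half an ulp of 1.0, so the naive loop stays at exactly 1.0, while the compensated sum lands close to 1.000000000000001.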
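
#24591 restricts `Expr.reshape` so that only the first dimension may be given as -1 (inferred). The resolution step can be sketched as follows. `resolve_shape` is a hypothetical helper that takes the total number of elements and the requested dimensions.

```python
from math import prod
from typing import Sequence, Tuple


def resolve_shape(total: int, dims: Sequence[int]) -> Tuple[int, ...]:
    """Resolve a reshape target, inferring only the first dimension (hypothetical helper)."""
    if not dims:
        raise ValueError("at least one dimension is required")
    if any(d == -1 for d in dims[1:]):
        raise ValueError("only the first dimension may be inferred (-1)")
    if any(d < 0 for d in dims[1:]) or dims[0] < -1:
        raise ValueError("dimensions must be non-negative, or -1 for the first")
    rest = prod(dims[1:])  # prod(()) == 1 for a one-dimensional shape
    if dims[0] == -1:
        if rest == 0 or total % rest != 0:
            raise ValueError(f"cannot infer the first dimension for {total} elements")
        return (total // rest, *dims[1:])
    if dims[0] * rest != total:
        raise ValueError(f"shape {tuple(dims)} does not match {total} elements")
    return tuple(dims)
```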

📖 Documentation

Mention Narwhals in ecosystem page (#25100)
Fix typo in public dataset URL (#25044)
Update Cloud docs with correct fn argument order (#24939)
Add i128 and u128 features to user guide (#24938)
Relax fsspec wording (#24881)
Fix duplicated article in SECURITY.md (#24762)
Specify that precision=None becomes 38 for Decimal (#24742)
Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
Fix source mapping (#24736)
Fix syntax error in data-types-and-structures.md (#24606)
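
#24742 documents that `precision=None` becomes 38 for Decimal, and #24542 moves to fixed-scale Decimals. In a fixed-scale column, every value shares one scale and is stored as an integer count of 10^-scale units. Below is a hypothetical Python sketch of that encoding using the standard `decimal` module. Rejecting values that would need rounding is a choice of this sketch, not documented Polars behaviour.

```python
from decimal import Decimal
from typing import Iterable, List, Optional

DEFAULT_PRECISION = 38  # what precision=None resolves to, per the docs entry


def to_fixed_scale(values: Iterable[str], scale: int, precision: Optional[int] = None) -> List[int]:
    """Encode decimal strings as integers in units of 10**-scale (hypothetical helper)."""
    if precision is None:
        precision = DEFAULT_PRECISION
    limit = 10 ** precision  # at most `precision` significant digits in total
    out: List[int] = []
    for text in values:
        d = Decimal(text)  # construction from a string is exact
        if not d.is_finite():
            raise ValueError(f"not a finite decimal: {text!r}")
        sign, digits, exponent = d.as_tuple()
        coefficient = int("".join(map(str, digits)))
        shift = exponent + scale  # power of ten still to apply
        if shift >= 0:
            unscaled = coefficient * 10 ** shift
        else:
            divisor = 10 ** -shift
            if coefficient % divisor:
                raise ValueError(f"{text!r} has more than {scale} fractional digits")
            unscaled = coefficient // divisor
        if unscaled >= limit:
            raise OverflowError(f"{text!r} exceeds precision {precision}")
        out.append(-unscaled if sign else unscaled)
    return out
```

Integer arithmetic on the coefficient avoids the 28-digit default context of `decimal`, which would otherwise round 38-digit values.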

📦 Build system

Make building the docs on macOS more reliable (#25095)
Ensure build_feature_flags.py is included in artifact (#25024)
Python pre-release 1.34.0b5 (#24699)
Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

Support for named/anonymous aggregations (#25118)
Silence unused mut warning (#25093)
Remove old join projection pushdown logic (#25088)
Disable recursive CSPE for now (#25085)
Remove unused row-count (#25080)
Add proptest strategies for Series logical types (#24849)
Add stateful EwmCov kernel (#25065)
Add IR for scan_lines (#25066)
Change group length mismatch error to ShapeError (#25004)
Move asof tolerance type coercion to IR conversion (#25033)
Move EwmMeanState to polars-compute (#25034)
Update toolchain (#25007)
Fix benchmark CI (#25019)
Fix non-deterministic test (#25009)
Fix makefile arch detection (#25011)
Make LazyFrame.set_sorted into a FunctionIR::Hint (#24981)
Update row estimation and reader schema in filter_scan_ir (#24995)
Insert casts for ewm_mean inputs in type coercion (#24992)
Remove unused expr_eval (#24988)
Remove symbolic links (#24982)
Add stateful EwmMean kernel (#24972)
Dispatch to no-op rayon thread-pool from streaming (#24957)
Add function to filter IR::Scan based on indices (#24979)
Organize code for opaque functions in a module (#24978)
Move scan filter code to polars-mem-engine (#24959)
Unpin pydantic (#24955)
Ensure safety of scan fast-count IR lowering in streaming (#24953)
Expose polars_compute from polars (#24556)
Re-use iterators in set_ operations (#24850)
Move order code to instance function (#24895)
Visualization data generator for streaming physical plan (#24896)
Remove GroupByPartitioned and dispatch to streaming engine (#24903)
Improve IR visualization for IEJoin (#24902)
Turn element() into {A,}Expr::Element (#24885)
Pass ScanOptions to new_from_ipc (#24893)
Update tests to be index type agnostic (#24891)
Remove legacy order_sensitive code (#24894)
Rename text_plan_graph to visualization_data (#24878)
Use UnifiedScanArgs in new_from_ipc and remove LazyIpcReader (#24883)
Document safety of CategoricalToArrowConverter (#24876)
Unset Context in Window expression (#24875)
Unify expression order resolution (#24723)
Move FunctionExpr dispatch from plan to expr (#24839)
Fix SQL test giving wrong error message (#24835)
Consolidate dtype paths in ApplyExpr (#24825)
Add days_in_month to documentation (#24822)
Enable ruff D417 lint (#24814)
Turn pl.format into proper elementwise expression (#24811)
Fix remote benchmark by no-longer saving builds (#24812)
Expose function on IPC writer to write dictionary batches (#24802)
Refactor ApplyExpr in group_by context on multiple inputs (#24520)
IR text plan graph generator (#24733)
Move Series to_arrow() logic to struct function (#24794)
Temporarily pin pydantic to fix CI (#24797)
Extend and rename rolling groups to overlapping (#24577)
Refactor DataType proptest strategies (#24763)
Add union to documentation (#24769)
Cleaner whitespace skipping in CSV field parser (#24705)
Remove duplicate maintain_order from CrossJoinOptions (#24725)
Change function order flags to be less error prone (#24604)
Remove {Upper,Lower}Bound expressions in IR (#24701)
Fix Makefile uv pip option syntax (#24711)
Add egg-info to gitignore (#24712)
Restructure python project directories again (#24676)
Use IR for polars-expr output field resolution (#24661)
Add proptest strategies for Series physical types (#24549)
Expose CloudScheme via polars::prelude (#24643)
Remove dist/ from release python workflow (#24639)
Escape sed ampersand in release script (#24631)
Remove PyOdide from release for now (#24630)
Fix sed in-place in release script (#24628)
Release script pyodide wheel (#24627)
Release script pyodide wheel (#24626)
Update release script for runtimes (#24610)
Remove tokio-util dependency (#24617)
Remove unused UnknownKind::Ufunc (#24614)
Genericize UnitVec for any T (#24597)
Cleanup and prepare to_field for element and struct field context (#24592)
Resolve nightly clippy hints (#24593)
Rename pl.dependencies to pl._dependencies (#24595)
More release scripting (#24582)
Another minor fix for the setup script (#24580)
Minor fix in release script (#24579)
Correct release python beta version check (#24578)
Python dependency failure (#24576)
Always install yq (#24570)
Check Arrow FFI pointers with an assert (#24564)
Add CloudScheme::FileNoHostname variant (#24535)
Add a couple of missing type definitions in python (#24561)
Fix quickstart example in Polars Cloud user guide (#24554)
Add implementations for loading min/max statistics for Iceberg (#24496)
Move collapse_joins optimizer logic into predicate pushdown optimizer (#24495)
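
A stateful EwmMean kernel (#24972) is what allows `ewm_mean()` to run in the streaming engine (#25003). Carrying state between batches means each batch continues where the previous one stopped. Below is a hypothetical Python sketch of the `adjust=True` formulation. It keeps a decayed weighted sum and a decayed weight total, and it ignores nulls and minimum-sample rules. `StreamingEwmMean` is an illustrative name, not a Polars API.

```python
from typing import Iterable, List


class StreamingEwmMean:
    """Exponentially weighted mean carried across batches (adjust=True form).

    Hypothetical sketch: y_t = sum((1 - a)**i * x[t - i]) / sum((1 - a)**i).
    """

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.decay = 1.0 - alpha
        self.weighted_sum = 0.0  # decayed sum of the observed values
        self.weight_total = 0.0  # decayed sum of the weights

    def update(self, batch: Iterable[float]) -> List[float]:
        out: List[float] = []
        for x in batch:
            self.weighted_sum = self.decay * self.weighted_sum + x
            self.weight_total = self.decay * self.weight_total + 1.0
            out.append(self.weighted_sum / self.weight_total)
        return out
```

In this sketch, feeding `[1.0, 2.0]` and then `[3.0]` yields exactly the same values as feeding `[1.0, 2.0, 3.0]` in one call, because the same floating-point operations run in the same order.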

Thank you to all our contributors for making this release possible!
@DeflateAwning, @EndPositive, @EnricoMi, @FBruzzesi, @JakubValtar, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Object905, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dangotbanned, @deanm0000, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @itamarst, @jan-krueger, @jordanosborn, @kdn36, @lzcmian, @math-hiyoko, @mcrumiller, @mjanssen, @moizescbf, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @thomasjpfan and @williambdean
