Rust Polars 0.52.0
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Lazy gather for `{forward,backward}_fill` in group-by contexts (#25115)
- Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
- Skip filtering scan IR if no paths were filtered (#25037)
- Optimize ipc stream read performance (#24671)
- Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
- Lower `unique` to native group-by and speed up `n_unique` in group-by context (#24976)
- Better parallelize `take{_slice,}_unchecked` (#24980)
- Implement native `skew` and `kurtosis` in group-by context (#24961)
- Use native group-by aggregations for `bitwise_*` operations (#24935)
- Address `group_by_dynamic` slowness in sparse data (#24916)
- Native `filter/drop_nulls/drop_nans` in group-by context (#24897)
- Implement `cumulative_eval` using the group-by engine (#24889)
- Prevent generation of copies of `Dataframe`s in `DslPlan` serialization (#24852)
- Implement native `null_count`, `any` and `all` group-by aggregations (#24859)
- Speed up `reverse` in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for `first/last` on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for `BitMapIter::nth` (#24766)
- Pushdown slices on plans within unions (#24735)
- Optimize gather_every(n=1) to slice (#24704)
- Lower null count to streaming engine (#24703)
- Native streaming `gather_every` (#24700)
- Pushdown filter with `strptime` if input is literal (#24694)
- Avoid copying expanded paths (#24669)
- Relax filter expr ordering (#24662)
- Remove unnecessary `groups` call in `aggregated` (#24651)
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
✨ Enhancements
- Improve error message on unsupported SQL subquery comparisons (#25135)
- Rewrite `IR::Scan` to `IR::DataFrameScan` in `expand_datasets` when applicable (#25106)
- Support `ewm_var/std` in streaming engine (#25109)
- Make DSL-hash skippable (#25140)
- Streaming `{Expr,LazyFrame}.rolling` (#25058)
- Set polars/<version> user-agent (#25112)
- Add `BIT_NOT` support to the SQL interface (#25094)
- Support BYTE_ARRAY backed Decimals in Parquet (#25076)
- Add `allow_empty` flag to `item` (#25048)
- Support `ewm_mean()` in streaming engine (#25003)
- Improve row-count estimates (#24996)
- Remove filtered scan paths in IR when possible (#24974)
- Introduce remote Polars MCP server (#24977)
- Allow local scans on polars cloud (configurable) (#24962)
- Add `Expr.item` to strictly extract a single value from an expression (#24888)
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Add `glob` parameter to `scan_ipc` (#24898)
- Prevent generation of copies of `Dataframe`s in `DslPlan` serialization (#24852)
- Add `list.agg` and `arr.agg` (#24790)
- Implement `{Expr,Series}.rolling_rank()` (#24776)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add `arr.eval` (#24472)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add `nth_set_bit_u64()` with unit test (#24035)
- Add `separator` to `{Data,Lazy}Frame.unnest` (#24716)
- Add `union()` function for unordered concatenation (#24298)
- Add `name.replace` to the set of column rename options (#17942)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on `DataFrame` load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
- Implement maintain_order for cross join (#24665)
- Add support to output `dt.total_{}()` duration values as fractionals (#24598)
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
🐞 Bug fixes
- Fix CSV `select(len())` off by 1 with comment prefix (#25069)
- Fix incorrect reshape on sliced lists (#25139)
- Support "index" as column name in `group_by` iterator (#25138)
- DSL_SCHEMA_HASH should not be changed by line endings (#25123)
- Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
- Fix panic in `dt.truncate` for invalid duration strings (#25124)
- Don't trigger `DeprecationWarning` from SQL "IN" constraints that use subqueries (#25111)
- Return the correct string-case `Expr` reprs (#25101)
- Fix `groups` update on slices with different offsets (#25097)
- Fix handling `Null` dtype in `ApplyExpr` on `group_by` (#25077)
- Raise error for all/any on list instead of panic (#25018)
- Unique key names in streaming sort/top_k (#25082)
- The `SQL` interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
- Fix panic if scan predicate produces 0 length mask (#25089)
- Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
- Panic in `group_by_dynamic` with `group_by` and multiple chunks (#25075)
- Fix panic when using struct field as join key (#25059)
- Allow broadcast in `group_by` for `ApplyExpr` and `BinaryExpr` (#25053)
- Fix field metadata for nested categorical PyCapsule export (#25052)
- Block predicate pushdown when `group_by` key values are changed (#25032)
- Group-By aggregation problems caused by `AmortSeries` (#25043)
- Don't push down predicates past inserted cache nodes (#25042)
- Allow for negative time in `group_by_dynamic` iterator (#25041)
- Re-enable CPU feature check before import (#25010)
- Correctness `any(ignore_nulls)` and OOB in `all` (#25005)
- Streaming any/all with ignore_nulls=False (#25008)
- Fix incorrect `join_asof` on a casted expression (#25006)
- Optimize memory on rolling groups in `ApplyExpr` (#24709)
- Fallback `Pyarrow` scan to in-memory engine (#24991)
- Make `Operator::swap_operands` return correct operators for `Plus`, `Minus`, `Multiply` and `Divide` (#24997)
- Capitalize letters after numbers in to_titlecase (#24993)
- Preserve null values in `pct_change` (#24952)
- Raise length mismatch on `over` with sliced groups (#24887)
- Check duplicate name in transpose (#24956)
- Follow Kleene logic in `any/all` for group-by (#24940)
- Do not optimize cross join to iejoin if order maintaining (#24950)
- Broadcast `partition_by` columns in `over` expression (#24874)
- Clear index cache on stacked `df.filter` expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated `with_row_index()` after `scan()` silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor `BinaryExpr` in `group_by` dispatch logic (#24548)
- Fix aggstate for `gather` (#24857)
- Keep scalars for length preserving functions in `group_by` (#24819)
- Have `range` feature depend on `dtype-array` feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in `EvalExpr` (#24650)
- Allow aggregations on `AggState::LiteralScalar` (#24820)
- Dispatch to `group_aware` for fallible expressions with masked out elements (#24815)
- Fix error for `arr.sum()` on small integer Array dtypes containing nulls (#24478)
- Fix XOR did not follow kleene when one side is unit-length (#24810)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use `overlapping` instead of `rolling` (#24787)
- Fix iterable on `dynamic_group_by` and `rolling` object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in `bitmask::nth_set_bit_u64` (#24775)
- Add `Expr.sign` for `Decimal` datatype (#24717)
- Correct `str.replace` with missing pattern (#24768)
- Support `decimal_comma` on `Decimal` type in `write_csv` (#24718)
- Parse `Decimal` with comma as decimal separator in CSV (#24685)
- Make `Categories` pickleable (#24691)
- Shift on array within list (#24678)
- Fix handling of `AggregatedScalar` in `ApplyExpr` single input (#24634)
- Support reading of mixed compressed/uncompressed IPC buffers (#24674)
- Overflow in slice-slice optimization (#24658)
- Package discovery for `setuptools` (#24656)
- Add type assertion to prevent out-of-bounds in `GenericFirstLastGroupedReduction` (#24590)
- Remove inclusion of polars dir in runtime sdist/wheel (#24654)
- Method `dt.month_end` was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Escape backslashes in EscapeLabel to produce valid DOT labels (#24532)
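
One of the fixes above switches in-memory group-by sum/mean to Kahan summation (#24774). As a rough illustration of why compensated summation matters, here is a minimal standard-library Python sketch of the general technique. It is not Polars' Rust implementation; the function names `naive_sum` and `kahan_sum` and the sample data are assumptions made up for this example.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-apply the bits lost in earlier steps
        t = total + y                   # low-order bits of y may be rounded away
        compensation = (t - total) - y  # measure what was just rounded away
        total = t
    return total


# Hypothetical data: one large value followed by many values that are each
# smaller than half an ulp of 1.0, so a naive loop drops every one of them.
values = [1.0] + [1e-16] * 10_000

print(naive_sum(values))   # stays at 1.0: each tiny addend is rounded away
print(kahan_sum(values))   # tracks the accumulated tiny addends
print(math.fsum(values))   # correctly rounded reference from the stdlib
```

The naive loop returns exactly `1.0` here, while the compensated loop agrees with `math.fsum` to within a few units in the last place.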
📖 Documentation
- Mention Narwhals in ecosystem page (#25100)
- Fix typo in public dataset URL (#25044)
- Introduce remote Polars MCP server (#24977)
- Update Cloud docs with correct fn argument order (#24939)
- Add i128 and u128 features to user guide (#24938)
- Relax fsspec wording (#24881)
- Fix duplicated article in SECURITY.md (#24762)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
- Fix syntax error in data-types-and-structures.md (#24606)
📦 Build system
- Make building the docs on macOS more reliable (#25095)
- Ensure `build_feature_flags.py` is included in artifact (#25024)
- Python pre-release 1.34.0b5 (#24699)
- Use cargo-run to call dsl-schema script (#24607)
🛠️ Other improvements
- Support for named/anonymous aggregations (#25118)
- Silence unused mut warning (#25093)
- Remove old join projection pushdown logic (#25088)
- Disable recursive CSPE for now (#25085)
- Remove unused row-count (#25080)
- Add `proptest` strategies for Series logical types (#24849)
- Add stateful `EwmCov` kernel (#25065)
- Add IR for `scan_lines` (#25066)
- Change group length mismatch error to `ShapeError` (#25004)
- Move asof `tolerance` type coercion to IR conversion (#25033)
- Move `EwmMeanState` to `polars-compute` (#25034)
- Update toolchain (#25007)
- Fix benchmark ci (#25019)
- Fix non-deterministic test (#25009)
- Fix makefile arch detection (#25011)
- Make `LazyFrame.set_sorted` into a `FunctionIR::Hint` (#24981)
- Update row estimation and reader schema in `filter_scan_ir` (#24995)
- Insert casts for `ewm_mean` inputs in type coercion (#24992)
- Remove unused `expr_eval` (#24988)
- Remove symbolic links (#24982)
- Add stateful `EwmMean` kernel (#24972)
- Dispatch to no-op rayon thread-pool from streaming (#24957)
- Add function to filter `IR::Scan` based on indices (#24979)
- Organize code for opaque functions in a module (#24978)
- Move scan filter code to `polars-mem-engine` (#24959)
- Unpin pydantic (#24955)
- Ensure safety of scan fast-count IR lowering in streaming (#24953)
- Expose `polars_compute` from polars (#24556)
- Re-use iterators in `set_` operations (#24850)
- Move order code to instance function (#24895)
- Visualization data generator for streaming physical plan (#24896)
- Remove `GroupByPartitioned` and dispatch to streaming engine (#24903)
- Improve IR visualization for IEJoin (#24902)
- Turn `element()` into `{A,}Expr::Element` (#24885)
- Pass `ScanOptions` to `new_from_ipc` (#24893)
- Update tests to be index type agnostic (#24891)
- Remove legacy `order_sensitive` code (#24894)
- Rename `text_plan_graph` to `visualization_data` (#24878)
- Use `UnifiedScanArgs` in `new_from_ipc` and remove `LazyIpcReader` (#24883)
- Document safety of `CategoricalToArrowConverter` (#24876)
- Unset `Context` in `Window` expression (#24875)
- Unify expression order resolution (#24723)
- Move `FunctionExpr` dispatch from `plan` to `expr` (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in `ApplyExpr` (#24825)
- Add `days_in_month` to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn `pl.format` into proper elementwise expression (#24811)
- Fix remote benchmark by no-longer saving builds (#24812)
- Expose function on IPC writer to write dictionary batches (#24802)
- Refactor `ApplyExpr` in `group_by` context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Move Series `to_arrow()` logic to struct function (#24794)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename `rolling` groups to `overlapping` (#24577)
- Refactor `DataType` `proptest` strategies (#24763)
- Add `union` to documentation (#24769)
- Cleaner whitespace skipping in CSV field parser (#24705)
- Remove duplicate maintain_order from CrossJoinOptions (#24725)
- Change function order flags to be less error prone (#24604)
- Remove `{Upper,Lower}Bound` expressions in IR (#24701)
- Fix Makefile `uv pip` option syntax (#24711)
- Add egg-info to gitignore (#24712)
- Restructure python project directories again (#24676)
- Use IR for `polars-expr` output field resolution (#24661)
- Add `proptest` strategies for Series physical types (#24549)
- Expose `CloudScheme` via `polars::prelude` (#24643)
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove tokio-util dependency (#24617)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Genericize UnitVec for any T (#24597)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add `CloudScheme::FileNoHostname` variant (#24535)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Move collapse_joins optimizer logic into predicate pushdown optimizer (#24495)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @EndPositive, @EnricoMi, @FBruzzesi, @JakubValtar, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Object905, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dangotbanned, @deanm0000, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @itamarst, @jan-krueger, @jordanosborn, @kdn36, @lzcmian, @math-hiyoko, @mcrumiller, @mjanssen, @moizescbf, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @thomasjpfan and @williambdean