Rust Polars 0.52.0

@github-actions github-actions released this 03 Nov 15:18
ed23bd6

🏆 Highlights

  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)

🚀 Performance improvements

  • Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
  • Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
  • Skip filtering scan IR if no paths were filtered (#25037)
  • Optimize ipc stream read performance (#24671)
  • Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
  • Lower unique to native group-by and speed up n_unique in group-by context (#24976)
  • Better parallelize take{_slice,}_unchecked (#24980)
  • Implement native skew and kurtosis in group-by context (#24961)
  • Use native group-by aggregations for bitwise_* operations (#24935)
  • Address group_by_dynamic slowness in sparse data (#24916)
  • Native filter/drop_nulls/drop_nans in group-by context (#24897)
  • Implement cumulative_eval using the group-by engine (#24889)
  • Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
  • Implement native null_count, any and all group-by aggregations (#24859)
  • Speed up reverse in group-by context (#24855)
  • Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
  • Don't check duplicates on streaming simple projection in release mode (#24830)
  • Lower approx_n_unique to the streaming engine (#24821)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
  • Implement indexed method for BitMapIter::nth (#24766)
  • Pushdown slices on plans within unions (#24735)
  • Optimize gather_every(n=1) to slice (#24704)
  • Lower null count to streaming engine (#24703)
  • Native streaming gather_every (#24700)
  • Pushdown filter with strptime if input is literal (#24694)
  • Avoid copying expanded paths (#24669)
  • Relax filter expr ordering (#24662)
  • Remove unnecessary groups call in aggregated (#24651)
  • Skip files in scan_iceberg with filter based on metadata statistics (#24547)
  • Push row_index predicate for all scan types (#24537)
  • Perform integer in-filtering for Parquet inequality predicates (#24525)
  • Stop caching Parquet metadata after 8 files (#24513)
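
To illustrate the rolling-window item above (#25078): one general way to avoid recomputing a whole window when a NaN leaves it is to keep the NaN count separate from the running aggregate. The sketch below is an assumption-laden illustration for a rolling sum only, not the code from that PR; `RollingSum` and its fields are hypothetical names, and the real change concerns rolling moments (mean, variance, and so on) and nulls as well.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling sum over the last `window` values (window >= 1).
/// NaNs are counted separately instead of being added to the running sum,
/// so the sum never becomes "poisoned" and never needs a full recompute.
struct RollingSum {
    window: usize,
    buf: VecDeque<f64>,
    finite_sum: f64,  // sum of the non-NaN values currently in the window
    nan_count: usize, // number of NaNs currently in the window
}

impl RollingSum {
    fn new(window: usize) -> Self {
        Self {
            window,
            buf: VecDeque::with_capacity(window + 1),
            finite_sum: 0.0,
            nan_count: 0,
        }
    }

    /// Push a value and return the sum of the current window.
    /// The result is NaN while at least one NaN is inside the window.
    fn push(&mut self, v: f64) -> f64 {
        if v.is_nan() {
            self.nan_count += 1;
        } else {
            self.finite_sum += v;
        }
        self.buf.push_back(v);

        // Evict the oldest value once the window is over-full.
        if self.buf.len() > self.window {
            let old = self.buf.pop_front().unwrap();
            if old.is_nan() {
                self.nan_count -= 1; // O(1): no recompute when a NaN leaves
            } else {
                self.finite_sum -= old;
            }
        }

        if self.nan_count > 0 {
            f64::NAN
        } else {
            self.finite_sum
        }
    }
}

fn main() {
    let mut rolling = RollingSum::new(2);
    for v in [1.0, f64::NAN, 2.0, 3.0] {
        println!("{}", rolling.push(v));
    }
}
```

With a window of 2, the last call sees only `2.0` and `3.0`, so the result is a regular number again without rescanning the window.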

✨ Enhancements

  • Improve error message on unsupported SQL subquery comparisons (#25135)
  • Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
  • Support ewm_var/std in streaming engine (#25109)
  • Make DSL-hash skippable (#25140)
  • Streaming {Expr,LazyFrame}.rolling (#25058)
  • Set polars/<version> user-agent (#25112)
  • Add BIT_NOT support to the SQL interface (#25094)
  • Support BYTE_ARRAY backed Decimals in Parquet (#25076)
  • Add allow_empty flag to item (#25048)
  • Support ewm_mean() in streaming engine (#25003)
  • Improve row-count estimates (#24996)
  • Remove filtered scan paths in IR when possible (#24974)
  • Introduce remote Polars MCP server (#24977)
  • Allow local scans on polars cloud (configurable) (#24962)
  • Add Expr.item to strictly extract a single value from an expression (#24888)
  • Add environment variable to roundtrip empty struct in Parquet (#24914)
  • Add glob parameter to scan_ipc (#24898)
  • Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
  • Add list.agg and arr.agg (#24790)
  • Implement {Expr,Series}.rolling_rank() (#24776)
  • Support MergeSorted in CSPE (#24805)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Recursively apply CSPE (#24798)
  • Add streaming engine per-node metrics (#24788)
  • Add arr.eval (#24472)
  • Improve rolling_(sum|mean) accuracy (#24743)
  • Add nth_set_bit_u64() with unit test (#24035)
  • Add separator to {Data,Lazy}Frame.unnest (#24716)
  • Add union() function for unordered concatenation (#24298)
  • Add name.replace to the set of column rename options (#17942)
  • Allow duration strings with leading "+" (#24737)
  • Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
  • Add support for UInt128 to pyo3-polars (#24731)
  • Implement maintain_order for cross join (#24665)
  • Add support to output dt.total_{}() duration values as fractionals (#24598)
  • Support scanning from file:/path URIs (#24603)
  • Log which file the schema was sourced from, and which file caused an extra column error (#24621)
  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
  • Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
  • Use fixed-scale Decimals (#24542)
  • Add support for unsigned 128-bit integers (#24346)

🐞 Bug fixes

  • Fix CSV select(len()) off by 1 with comment prefix (#25069)
  • Fix incorrect reshape on sliced lists (#25139)
  • Support "index" as column name in group_by iterator (#25138)
  • DSL_SCHEMA_HASH should not be changed by line endings (#25123)
  • Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
  • Fix panic in dt.truncate for invalid duration strings (#25124)
  • Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
  • Return the correct string-case Expr reprs (#25101)
  • Fix groups update on slices with different offsets (#25097)
  • Fix handling Null dtype in ApplyExpr on group_by (#25077)
  • Raise error for all/any on list instead of panic (#25018)
  • Unique key names in streaming sort/top_k (#25082)
  • The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
  • Fix panic if scan predicate produces 0 length mask (#25089)
  • Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
  • Panic in group_by_dynamic with group_by and multiple chunks (#25075)
  • Fix panic when using struct field as join key (#25059)
  • Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
  • Fix field metadata for nested categorical PyCapsule export (#25052)
  • Block predicate pushdown when group_by key values are changed (#25032)
  • Group-By aggregation problems caused by AmortSeries (#25043)
  • Don't push down predicates past inserted cache nodes (#25042)
  • Allow for negative time in group_by_dynamic iterator (#25041)
  • Re-enable CPU feature check before import (#25010)
  • Correctness any(ignore_nulls) and OOB in all (#25005)
  • Streaming any/all with ignore_nulls=False (#25008)
  • Fix incorrect join_asof on a casted expression (#25006)
  • Optimize memory on rolling groups in ApplyExpr (#24709)
  • Fallback Pyarrow scan to in-memory engine (#24991)
  • Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
  • Capitalize letters after numbers in to_titlecase (#24993)
  • Preserve null values in pct_change (#24952)
  • Raise length mismatch on over with sliced groups (#24887)
  • Check duplicate name in transpose (#24956)
  • Follow Kleene logic in any / all for group-by (#24940)
  • Do not optimize cross join to iejoin if order maintaining (#24950)
  • Broadcast partition_by columns in over expression (#24874)
  • Clear index cache on stacked df.filter expressions (#24870)
  • Fix 'explode' mapping strategy on scalar value (#24861)
  • Fix repeated with_row_index() after scan() silently ignored (#24866)
  • Correctly return min and max for enums in groupby aggregation (#24808)
  • Refactor BinaryExpr in group_by dispatch logic (#24548)
  • Fix aggstate for gather (#24857)
  • Keep scalars for length preserving functions in group_by (#24819)
  • Have range feature depend on dtype-array feature (#24853)
  • Fix duplicate select panic (#24836)
  • Inconsistency of list.sum() result type with None values (#24476)
  • Division by zero in Expr.dt.truncate (#24832)
  • Potential deadlock in __arrow_c_stream__ (#24831)
  • Allow double aggregations in group-by contexts (#24823)
  • Series.shrink_dtype for i128/u128 (#24833)
  • Fix dtype in EvalExpr (#24650)
  • Allow aggregations on AggState::LiteralScalar (#24820)
  • Dispatch to group_aware for fallible expressions with masked out elements (#24815)
  • Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
  • Fix XOR not following Kleene logic when one side is unit-length (#24810)
  • Incorrect precision in Series.str.to_decimal (#24804)
  • Use overlapping instead of rolling (#24787)
  • Fix iterable on dynamic_group_by and rolling object (#24740)
  • Use Kahan summation for in-memory groupby sum/mean (#24774)
  • Release GIL in PythonScan predicate evaluation (#24779)
  • Type error in bitmask::nth_set_bit_u64 (#24775)
  • Add Expr.sign for Decimal datatype (#24717)
  • Correct str.replace with missing pattern (#24768)
  • Support decimal_comma on Decimal type in write_csv (#24718)
  • Parse Decimal with comma as decimal separator in CSV (#24685)
  • Make Categories pickleable (#24691)
  • Shift on array within list (#24678)
  • Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
  • Support reading of mixed compressed/uncompressed IPC buffers (#24674)
  • Overflow in slice-slice optimization (#24658)
  • Package discovery for setuptools (#24656)
  • Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
  • Remove inclusion of polars dir in runtime sdist/wheel (#24654)
  • Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
  • Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
  • Raise Exception instead of panic when unnest on non-struct column (#24471)
  • Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
  • Newline escaping in streaming show_graph (#24612)
  • Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
  • Sink batches early stop on in-memory engine (#24585)
  • More precisely model expression ordering requirements (#24437)
  • Panic in zero-weight rolling mean/var (#24596)
  • Decimal <-> literal arithmetic supertype rules (#24594)
  • Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
  • Validate list type for list expressions in planner (#24589)
  • Have log() prioritize the leftmost dtype for its output dtype (#24581)
  • CSV pl.len() was incorrect (#24587)
  • Add support for float inputs for duration types (#24529)
  • Roundtrip empty string through hive partitioning (#24546)
  • Fix potential OOB writes in unaligned IPC read (#24550)
  • Fix regression error when scanning AWS presigned URL (#24530)
  • Make PlPath::join for cloud paths replace on absolute paths (#24514)
  • Correct dtype for cum_agg in streaming engine (#24510)
  • Escape backslashes in EscapeLabel to produce valid DOT labels (#24532)
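
For readers unfamiliar with the Kahan summation mentioned above (#24774), the following is a minimal, self-contained sketch of the textbook algorithm. It is not the Polars implementation; `kahan_sum` and `naive_sum` are illustrative names chosen here.

```rust
/// Sum a slice with Kahan (compensated) summation.
/// A second accumulator tracks the low-order bits lost in each addition.
fn kahan_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut compensation = 0.0_f64; // running estimate of the rounding error
    for &v in values {
        let y = v - compensation; // re-apply the error lost so far
        let t = sum + y; // low-order bits of `y` may be lost here
        compensation = (t - sum) - y; // recover what was lost
        sum = t;
    }
    sum
}

/// Plain left-to-right summation, for comparison.
fn naive_sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

fn main() {
    // One large value followed by many values too small to change it
    // when added one at a time.
    let mut values = vec![1.0_f64];
    values.extend(std::iter::repeat(1e-16).take(1_000_000));

    println!("naive: {:.17}", naive_sum(&values));
    println!("kahan: {:.17}", kahan_sum(&values));
}
```

In this input each `1e-16` is below half the spacing between `1.0` and the next representable `f64`, so the naive sum never moves off `1.0`, while the compensated sum ends close to `1.0 + 1e-10`.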

📖 Documentation

  • Mention Narwhals in ecosystem page (#25100)
  • Fix typo in public dataset URL (#25044)
  • Introduce remote Polars MCP server (#24977)
  • Update Cloud docs with correct fn argument order (#24939)
  • Add i128 and u128 features to user guide (#24938)
  • Relax fsspec wording (#24881)
  • Fix duplicated article in SECURITY.md (#24762)
  • Specify that precision=None becomes 38 for Decimal (#24742)
  • Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
  • Fix source mapping (#24736)
  • Fix syntax error in data-types-and-structures.md (#24606)

📦 Build system

  • Make building the docs on macOS more reliable (#25095)
  • Ensure build_feature_flags.py is included in artifact (#25024)
  • Python pre-release 1.34.0b5 (#24699)
  • Use cargo-run to call dsl-schema script (#24607)

🛠️ Other improvements

  • Support for named/anonymous aggregations (#25118)
  • Silence unused mut warning (#25093)
  • Remove old join projection pushdown logic (#25088)
  • Disable recursive CSPE for now (#25085)
  • Remove unused row-count (#25080)
  • Add proptest strategies for Series logical types (#24849)
  • Add stateful EwmCov kernel (#25065)
  • Add IR for scan_lines (#25066)
  • Change group length mismatch error to ShapeError (#25004)
  • Move asof tolerance type coercion to IR conversion (#25033)
  • Move EwmMeanState to polars-compute (#25034)
  • Update toolchain (#25007)
  • Fix benchmark ci (#25019)
  • Fix non-deterministic test (#25009)
  • Fix makefile arch detection (#25011)
  • Make LazyFrame.set_sorted into a FunctionIR::Hint (#24981)
  • Update row estimation and reader schema in filter_scan_ir (#24995)
  • Insert casts for ewm_mean inputs in type coercion (#24992)
  • Remove unused expr_eval (#24988)
  • Remove symbolic links (#24982)
  • Add stateful EwmMean kernel (#24972)
  • Dispatch to no-op rayon thread-pool from streaming (#24957)
  • Add function to filter IR::Scan based on indices (#24979)
  • Organize code for opaque functions in a module (#24978)
  • Move scan filter code to polars-mem-engine (#24959)
  • Unpin pydantic (#24955)
  • Ensure safety of scan fast-count IR lowering in streaming (#24953)
  • Expose polars_compute from polars (#24556)
  • Re-use iterators in set_ operations (#24850)
  • Move order code to instance function (#24895)
  • Visualization data generator for streaming physical plan (#24896)
  • Remove GroupByPartitioned and dispatch to streaming engine (#24903)
  • Improve IR visualization for IEJoin (#24902)
  • Turn element() into {A,}Expr::Element (#24885)
  • Pass ScanOptions to new_from_ipc (#24893)
  • Update tests to be index type agnostic (#24891)
  • Remove legacy order_sensitive code (#24894)
  • Rename text_plan_graph to visualization_data (#24878)
  • Use UnifiedScanArgs in new_from_ipc and remove LazyIpcReader (#24883)
  • Document safety of CategoricalToArrowConverter (#24876)
  • Unset Context in Window expression (#24875)
  • Unify expression order resolution (#24723)
  • Move FunctionExpr dispatch from plan to expr (#24839)
  • Fix SQL test giving wrong error message (#24835)
  • Consolidate dtype paths in ApplyExpr (#24825)
  • Add days_in_month to documentation (#24822)
  • Enable ruff D417 lint (#24814)
  • Turn pl.format into proper elementwise expression (#24811)
  • Fix remote benchmark by no-longer saving builds (#24812)
  • Expose function on IPC writer to write dictionary batches (#24802)
  • Refactor ApplyExpr in group_by context on multiple inputs (#24520)
  • IR text plan graph generator (#24733)
  • Move Series to_arrow() logic to struct function (#24794)
  • Temporarily pin pydantic to fix CI (#24797)
  • Extend and rename rolling groups to overlapping (#24577)
  • Refactor DataType proptest strategies (#24763)
  • Add union to documentation (#24769)
  • Cleaner whitespace skipping in CSV field parser (#24705)
  • Remove duplicate maintain_order from CrossJoinOptions (#24725)
  • Change function order flags to be less error prone (#24604)
  • Remove {Upper,Lower}Bound expressions in IR (#24701)
  • Fix Makefile uv pip option syntax (#24711)
  • Add egg-info to gitignore (#24712)
  • Restructure python project directories again (#24676)
  • Use IR for polars-expr output field resolution (#24661)
  • Add proptest strategies for Series physical types (#24549)
  • Expose CloudScheme via polars::prelude (#24643)
  • Remove dist/ from release python workflow (#24639)
  • Escape sed ampersand in release script (#24631)
  • Remove Pyodide from release for now (#24630)
  • Fix sed in-place in release script (#24628)
  • Release script pyodide wheel (#24627)
  • Release script pyodide wheel (#24626)
  • Update release script for runtimes (#24610)
  • Remove tokio-util dependency (#24617)
  • Remove unused UnknownKind::Ufunc (#24614)
  • Use cargo-run to call dsl-schema script (#24607)
  • Genericize UnitVec for any T (#24597)
  • Cleanup and prepare to_field for element and struct field context (#24592)
  • Resolve nightly clippy hints (#24593)
  • Rename pl.dependencies to pl._dependencies (#24595)
  • More release scripting (#24582)
  • Again a minor fix for the setup script (#24580)
  • Minor fix in release script (#24579)
  • Correct release python beta version check (#24578)
  • Python dependency failure (#24576)
  • Always install yq (#24570)
  • Deterministic import order for Python Polars package variants (#24531)
  • Check Arrow FFI pointers with an assert (#24564)
  • Add CloudScheme::FileNoHostname variant (#24535)
  • Add a couple of missing type definitions in python (#24561)
  • Fix quickstart example in Polars Cloud user guide (#24554)
  • Add implementations for loading min/max statistics for Iceberg (#24496)
  • Move collapse_joins optimizer logic into predicate pushdown optimizer (#24495)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @EndPositive, @EnricoMi, @FBruzzesi, @JakubValtar, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Object905, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dangotbanned, @deanm0000, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @itamarst, @jan-krueger, @jordanosborn, @kdn36, @lzcmian, @math-hiyoko, @mcrumiller, @mjanssen, @moizescbf, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @thomasjpfan and @williambdean