from_dicts: strange interaction between schema_overrides and schema inference causing SchemaError #25232

@oefe

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

dicts = [
    {
        "a": 0,
        "b": 0.64134,
    },
    {
        "b": [0.522672, 0.706087],
    },
]
override_schema = {
    "b": pl.List(pl.Float64),
}

df = pl.from_dicts(dicts, schema_overrides=override_schema)
print(df.schema)

Log output

Traceback (most recent call last):
  File "/Users/martinaoefelein/Repos/photos/polars-bug.py", line 23, in <module>
    df = pl.from_dicts(dicts, schema_overrides=override_schema)
  File "/Users/martinaoefelein/.cache/uv/environments-v2/polars-bug-a5544cefa72fa01d/lib/python3.13/site-packages/polars/convert/general.py", line 217, in from_dicts
    return pl.DataFrame(
           ~~~~~~~~~~~~^
        data,
        ^^^^^
    ...<3 lines>...
        infer_schema_length=infer_schema_length,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/martinaoefelein/.cache/uv/environments-v2/polars-bug-a5544cefa72fa01d/lib/python3.13/site-packages/polars/dataframe/frame.py", line 386, in __init__
    self._df = sequence_to_pydf(
               ~~~~~~~~~~~~~~~~^
        data,
        ^^^^^
    ...<5 lines>...
        nan_to_null=nan_to_null,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/martinaoefelein/.cache/uv/environments-v2/polars-bug-a5544cefa72fa01d/lib/python3.13/site-packages/polars/_utils/construction/dataframe.py", line 466, in sequence_to_pydf
    return _sequence_to_pydf_dispatcher(
        get_first_non_none(data),
    ...<6 lines>...
        nan_to_null=nan_to_null,
    )
  File "/Users/martinaoefelein/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/functools.py", line 929, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/Users/martinaoefelein/.cache/uv/environments-v2/polars-bug-a5544cefa72fa01d/lib/python3.13/site-packages/polars/_utils/construction/dataframe.py", line 722, in _sequence_of_dict_to_pydf
    pydf = PyDataFrame.from_dicts(
        data,
    ...<3 lines>...
        infer_schema_length=infer_schema_length,
    )
polars.exceptions.SchemaError: failed to determine supertype of f64 and list[f64]

Issue description

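The error occurs only when "a" is left to schema inference while "b" is covered by schema_overrides.
The issue goes away if I remove the entry for "a" from dicts.
It also goes away if I add a schema definition for "a" to override_schema.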

Background

I'm trying to load a sequence of dictionaries into a polars dataframe.

The schemas of the dictionaries are somewhat inconsistent (they represent EXIF data from various camera types, and each brand has its own extensions and quirks). There are over 1000 different keys, and new ones are added frequently, so I want to use Polars' schema inference to determine the schema.

However, some keys (like "b" in the example) can have either scalar or list values, so I need to tell Polars with a schema_overrides dict to treat these as lists.

Each of these techniques (schema inference and schema override) works on its own, but combined, they fail with the SchemaError shown in the log output.
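
A possible stopgap, sketched here as an assumption and not as a fix for the underlying interaction, is to normalise the records in plain Python before they reach Polars, so that every key known to hold "scalar or list" values always holds a list. The helper name `wrap_scalars` and the `list_keys` parameter are hypothetical and exist only for this illustration.

```python
from collections.abc import Iterable
from typing import Any


def wrap_scalars(
    dicts: Iterable[dict[str, Any]], list_keys: set[str]
) -> list[dict[str, Any]]:
    """Return copies of `dicts` where values under `list_keys` are always lists.

    Hypothetical helper for illustration; not part of Polars.
    """
    normalised = []
    for record in dicts:
        fixed = dict(record)  # shallow copy, the input records are left untouched
        for key in list_keys:
            if key not in fixed:
                continue
            value = fixed[key]
            # None stays None so it can still be treated as a missing value.
            if value is not None and not isinstance(value, list):
                fixed[key] = [value]
        normalised.append(fixed)
    return normalised


dicts = [
    {"a": 0, "b": 0.64134},
    {"b": [0.522672, 0.706087]},
]
normalised = wrap_scalars(dicts, list_keys={"b"})
print(normalised)
```

The normalised records could then be passed to pl.from_dicts in place of the original dicts. Whether this avoids the error for the full EXIF data set has not been verified here.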

Expected behavior

Dict entries with keys listed in schema_overrides are mapped to series according to the entry in schema_overrides.

All other entries are mapped to series based on type inference.

These two mechanisms are independent and shouldn't interfere with each other.
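
The expectation above can be written down as a toy model in plain Python. This is only a sketch of the behaviour I would expect, not Polars' actual implementation: the function `expected_schema`, the string type names, and the "first non-null value decides" rule are all assumptions made for illustration.

```python
from typing import Any


def expected_schema(
    dicts: list[dict[str, Any]], overrides: dict[str, str]
) -> dict[str, str]:
    """Toy model: keys listed in `overrides` never take part in inference."""
    schema: dict[str, str] = {}
    for record in dicts:
        for key, value in record.items():
            if key in overrides:
                # The override wins; the value's type is not inspected at all.
                schema[key] = overrides[key]
            elif key not in schema and value is not None:
                # Stand-in for real inference: the first non-null value decides.
                schema[key] = type(value).__name__
    return schema


dicts = [
    {"a": 0, "b": 0.64134},
    {"b": [0.522672, 0.706087]},
]
print(expected_schema(dicts, {"b": "list[f64]"}))
# {'a': 'int', 'b': 'list[f64]'}
```

In this model the mixed scalar and list values under "b" never reach the inference branch, so no supertype of f64 and list[f64] is ever needed.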

Installed versions

--------Version info---------
Polars:              1.35.1
Index type:          UInt32
Platform:            macOS-15.6-arm64-arm-64bit-Mach-O
Python:              3.13.0 (main, Oct 16 2024, 08:05:40) [Clang 18.1.8 ]
Runtime:             rt32

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
numpy                2.3.4
openpyxl             <not installed>
pandas               <not installed>
polars_cloud         <not installed>
pyarrow              <not installed>
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

Metadata

    Labels

    bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)
