Skip to content

dbt/Snowflake identifier casing mismatch causes test failures #1244

@EckisWelt

Description

@EckisWelt

Problem

The standard dbt + Snowflake integration has a built-in identifier casing mismatch that the datacontract CLI does not currently handle.

dbt default: normalizes all identifiers to lowercase in schema.yml (e.g. dim_customer, customer_key).

Snowflake default: stores and resolves all unquoted identifiers as UPPERCASE (e.g. DIM_CUSTOMER, CUSTOMER_KEY).

This is well-documented, expected behavior of both tools — but the datacontract CLI does not bridge the gap, causing test failures in what is otherwise a completely standard setup.

Failing Scenario 1 — Import with default casing

Standard dbt model:

# dbt schema.yml
models:
  - name: dim_customer
    description: "Customer dimension with denormalized nation and region"
    columns:
      - name: customer_key
        description: "Surrogate-compatible primary key (source: c_custkey)"
        data_tests:
          - unique
          - not_null
      - name: customer_name
        description: "Customer name"
        data_tests:
          - not_null

After running datacontract import dbt the importer produces lowercase identifiers as-is:

# datacontract.yaml (after import)
schema:
  - name: dim_customer
    physicalType: view
    description: Customer dimension with denormalized nation and region
    logicalType: object
    physicalName: dim_customer
    properties:
      - name: customer_key
        description: 'Surrogate-compatible primary key (source: c_custkey)'
        primaryKey: true
        primaryKeyPosition: 1
        logicalType: string
        required: true
        unique: true
      - name: customer_name
        description: Customer name
        logicalType: string
        required: true

Running datacontract test against Snowflake fails because Snowflake resolves dim_customer as the quoted identifier "dim_customer", which does not exist — the actual object is DIM_CUSTOMER. The physicalName of the object is in lowercase which actually should be in uppercase.

Failing Scenario 2 — Manually setting physicalName to uppercase

A natural workaround is to manually override physicalName with the uppercase form:

schema:
  - name: dim_customer
    physicalType: view
    description: Customer dimension with denormalized nation and region
    logicalType: object
    physicalName: DIM_CUSTOMER
    properties:
      - name: customer_key
        physicalName: CUSTOMER_KEY
        description: 'Surrogate-compatible primary key (source: c_custkey)'
        primaryKey: true
        primaryKeyPosition: 1
        logicalType: string
        required: true
        unique: true
      - name: customer_name
        physicalName: CUSTOMER_NAME
        description: Customer name
        logicalType: string
        required: true

This also fails. The test engine uses in the most cases the logical name field to resolve identifiers rather than physicalName, so the uppercase overrides are silently ignored and the same casing mismatch occurs.

Proposed Solutions

Option A — casing parameter in server configuration

Add a casing parameter to the server configuration that controls how identifiers are resolved when running tests. Possible values:

  • preserve — keep identifiers as-is (current behavior)
  • upper — convert all identifiers to UPPERCASE (Snowflake default)
  • lower — convert all identifiers to lowercase (dbt default)

Example configuration:

servers:
  - type: snowflake
    server: production
    account: my_account
    database: my_database
    schema: my_schema
    casing: upper  # proposed new field

With casing: upper configured, the test engine applies uppercase conversion to all resolved identifiers before querying Snowflake. Contracts authored with lowercase names (as imported from dbt) validate correctly without any manual overrides.

Trade-off: The casing rule is defined at the server level and applies uniformly to all identifiers. This matches the Snowflake default but cannot accommodate mixed-case objects (e.g. quoted identifiers with intentional mixed case).

Note: A related fix was attempted in #797 but made the situation worse.

Option B — Honor physicalName in all tests

Fix the test engine to use physicalName when it is set, falling back to name when it is not. This makes the existing physicalName field functional for test resolution, which is the natural expectation given its presence in the schema.

With this fix, the manually-overridden contract from Scenario 2 would work as intended:

schema:
  - name: dim_customer
    physicalName: DIM_CUSTOMER
    properties:
      - name: customer_key
        physicalName: CUSTOMER_KEY
      - name: customer_name
        physicalName: CUSTOMER_NAME

Trade-off: Requires manually setting physicalName on every model and column after import, which is tedious at scale. It also means the import step alone does not produce a working contract — additional manual editing is always required for Snowflake targets.

Metadata

Metadata

Assignees

No one assigned

    Labels

    drop-soda-firstRelevant after removing the Soda v3 dependency

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions