bugfix/cast_strings_as_floats#33
Conversation
fivetran-avinash
left a comment
There was a problem hiding this comment.
@fivetran-savage Sent some notes back, let me know if you have questions!
| {% set string_dtypes = ['char', 'text', 'string'] %} | ||
| {% for col in adapter.get_columns_in_relation(ref('stg_workday__worker_base')) %} | ||
| {% if col.name.lower() in ['annual_currency_summary_primary_compensation_basis', 'annual_currency_summary_total_base_pay', 'annual_currency_summary_total_salary_and_allowances'] %} | ||
| {% if target.type == 'databricks' %} |
There was a problem hiding this comment.
I think there is a silent failure when running against databricks here. Variables set inside a {% for %} loop don't propagate out of it in Jinja2. So {% set is_str = true %} inside the {% for stype in string_dtypes %} loop has no effect, and is_str always stays false after the loop ends. This means string-typed compensation columns on Databricks are never cast to float and silently land as strings in the output table.
I tested with this configuration in integration_tests/dbt_project.yml
annual_currency_summary_primary_compensation_basis: "{{ 'varchar(100)' if target.type in ('redshift', 'postgres') else 'string' }}"
annual_currency_summary_total_base_pay: "{{ 'varchar(100)' if target.type in ('redshift', 'postgres') else 'string' }}"
annual_currency_summary_total_salary_and_allowances: "{{ 'varchar(100)' if target.type in ('redshift', 'postgres') else 'string' }}"
And when running
dbt compile -s stg_workday__worker_history -t databricks --vars '{"employee_history_enabled": true}'
You end up seeing that these fields are not cast to floats and remain as strings.
So the Databricks case isn't quite solved as of yet. I wonder if we can use the Jinja namespace capability, to do the data type check, and then change the value to true that way. Might be worth testing.
| [PR #33](https://github.com/fivetran/dbt_workday/pull/33) includes the following update. | ||
|
|
||
| ## Bug Fix | ||
| - Fixes an issue in `stg_workday__worker_history` where `annual_currency_summary_primary_compensation_basis`, `annual_currency_summary_total_base_pay`, and `annual_currency_summary_total_salary_and_allowances` could surface as string datatypes. These columns are now cast to float when the source type is a string. |
There was a problem hiding this comment.
These fields are also present in the base stg_workday__worker model. Should we apply the macro there as well?
There was a problem hiding this comment.
per this comment in the Jira ticket, the decision was to only apply to the worker_history table.
| annual_currency_summary_primary_compensation_basis, | ||
| annual_currency_summary_total_base_pay, | ||
| annual_currency_summary_total_salary_and_allowances, | ||
| {% set string_dtypes = ['char', 'text', 'string'] %} |
There was a problem hiding this comment.
I don't think 'text' is a string data type in Databricks. It's harmless if it's there, but might be worth double-checking if it's safe to remove.
PR Overview
Package version introduced in this PR:
This PR addresses the following Issue/Feature(s):
Summary of changes:
stg_workday__worker_historywhereannual_currency_summary_primary_compensation_basis,annual_currency_summary_total_base_pay, andannual_currency_summary_total_salary_and_allowancescould surface as string datatypes by casting as float.Submission Checklist
Changelog