
DBT python model passing date variables with different data types #583

Open
tade0726 opened this issue Feb 9, 2024 · 3 comments
Labels
enhancement New feature or request

Comments


tade0726 commented Feb 9, 2024

Describe the feature

I noticed that date variables arrive in the Python notebook with two different data types.

The Python notebook mentioned here is the notebook dbt-databricks submits to Databricks for running Python models. It contains a config object defined according to the vars configured in dbt_project.yml.

As in:

# This part is user provided model code
# you will need to copy the next section to run the code
# COMMAND ----------
# this part is dbt logic for get ref work, do not modify

def ref(*args, **kwargs):
    refs = {}
    key = '.'.join(args)
    version = kwargs.get("v") or kwargs.get("version")
    if version:
        key += f".v{version}"
    dbt_load_df_function = kwargs.get("dbt_load_df_function")
    return dbt_load_df_function(refs[key])


def source(*args, dbt_load_df_function):
    sources = {"prod_staging.ciqfininstance": "`savana_prod`.`dbt_staging`.`ciqfininstance`"}
    key = '.'.join(args)
    return dbt_load_df_function(sources[key])


config_dict = {'start_date': datetime.date(2018, 7, 1), 'end_date': ''}


class config:
    def __init__(self, *args, **kwargs):
        pass

    @staticmethod
    def get(key, default=None):
        return config_dict.get(key, default)

class this:
    """dbt.this() or dbt.this.identifier"""
    database = "savana_prod"
    schema = "dbt_gold_pipelines"
    identifier = "int_date_table"

    def __repr__(self):
        return '`savana_prod`.`dbt_gold_pipelines`.`int_date_table`'


class dbtObj:
    def __init__(self, load_df_function) -> None:
        self.source = lambda *args: source(*args, dbt_load_df_function=load_df_function)
        self.ref = lambda *args, **kwargs: ref(*args, **kwargs, dbt_load_df_function=load_df_function)
        self.config = config
        self.this = this()
        self.is_incremental = False
# COMMAND ----------

# how to execute python model in notebook
# dbt = dbtObj(spark.table)
# df = model(dbt, spark)

Case 1:

When a date variable is set in dbt_project.yml, the Python notebook in Databricks receives it as a datetime.date object.

As in the config Python dictionary:

{"start_date": datetime.date(2021, 1, 1)}

Case 2:

When a date variable is passed with --vars "{'date': '2021-01-01'}" on the dbt command line,

it arrives as

{"start_date": '2021-01-01'}
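Until the types are made consistent, each model has to reconcile the two shapes itself. A minimal sketch of that reconciliation (the helper name `coerce_date` is hypothetical, not part of dbt):

```python
import datetime


def coerce_date(value):
    """Normalize a dbt date var that may arrive either as a
    datetime.date (set in dbt_project.yml) or as an ISO 'YYYY-MM-DD'
    string (passed via --vars on the CLI)."""
    if isinstance(value, datetime.date):
        return value
    return datetime.date.fromisoformat(value)


# Both arrival shapes normalize to the same value:
coerce_date(datetime.date(2021, 1, 1))  # Case 1: from dbt_project.yml
coerce_date("2021-01-01")               # Case 2: from --vars
```

This is exactly the kind of boilerplate that currently has to be copied into every Python model.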

Describe alternatives you've considered

I suggest representing the date consistently as a string.

Additional context

I am implementing an integration with Dagster + Databricks cloud + dbt, and I need both kinds of configuration to yield the same variable type; otherwise, redundant reconciliation logic has to be copied and pasted into each Python model to handle the difference.

This is because dbt Python models cannot currently share abstracted helper functions.

Who will this benefit?

It will no longer be necessary to write extra logic to distinguish the date types.

Integration with Dagster might depend on the --vars CLI flag to pass configuration, but it would be easier to debug in the dbt project if those variables arrived in the same shape as the ones set in the yml file.

Are you interested in contributing to this feature?

Yes, I would love to.

@tade0726 tade0726 added the enhancement New feature or request label Feb 9, 2024
@leo-schick

@tade0726 you can ensure that by making a small code adjustment to case 1:

instead of

{"start_date": datetime.date(2021, 1, 1)}

write

{"start_date": datetime.date(2021, 1, 1).strftime("%d.%m.%Y")}

see also: Python Dates
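(Editorial note: if the goal is to match the string that --vars already delivers, `date.isoformat()` produces the same 'YYYY-MM-DD' shape, whereas the strftime pattern above yields a dotted European format.)

```python
import datetime

# isoformat() yields the same 'YYYY-MM-DD' string that
# --vars "{'date': '2021-01-01'}" delivers on the CLI.
iso = datetime.date(2021, 1, 1).isoformat()   # '2021-01-01'

# The strftime pattern suggested above gives a different shape:
dotted = datetime.date(2021, 1, 1).strftime("%d.%m.%Y")  # '01.01.2021'
```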

@tade0726
Author

tade0726 commented Feb 12, 2024

Hi @leo-schick ,

It is not related to a user-defined dictionary; it is a config template generated by dbt-databricks.

I also updated the issue description for clarity.

See example below:

# This part is user provided model code
# you will need to copy the next section to run the code
# COMMAND ----------
# this part is dbt logic for get ref work, do not modify

def ref(*args, **kwargs):
    refs = {}
    key = '.'.join(args)
    version = kwargs.get("v") or kwargs.get("version")
    if version:
        key += f".v{version}"
    dbt_load_df_function = kwargs.get("dbt_load_df_function")
    return dbt_load_df_function(refs[key])


def source(*args, dbt_load_df_function):
    sources = {"prod_staging.ciqfininstance": "`savana_prod`.`dbt_staging`.`ciqfininstance`"}
    key = '.'.join(args)
    return dbt_load_df_function(sources[key])


config_dict = {'start_date': datetime.date(2018, 7, 1), 'end_date': ''}


class config:
    def __init__(self, *args, **kwargs):
        pass

    @staticmethod
    def get(key, default=None):
        return config_dict.get(key, default)

class this:
    """dbt.this() or dbt.this.identifier"""
    database = "savana_prod"
    schema = "dbt_gold_pipelines"
    identifier = "int_date_table"
    
    def __repr__(self):
        return '`savana_prod`.`dbt_gold_pipelines`.`int_date_table`'


class dbtObj:
    def __init__(self, load_df_function) -> None:
        self.source = lambda *args: source(*args, dbt_load_df_function=load_df_function)
        self.ref = lambda *args, **kwargs: ref(*args, **kwargs, dbt_load_df_function=load_df_function)
        self.config = config
        self.this = this()
        self.is_incremental = False

# COMMAND ----------

# how to execute python model in notebook
# dbt = dbtObj(spark.table)
# df = model(dbt, spark)

@benc-db
Collaborator

benc-db commented Feb 12, 2024

@tade0726 Please go ahead and submit a PR :). I agree that it makes most sense for the data type to be consistent regardless of source.
