This repository has been archived by the owner on May 17, 2024. It is now read-only.

--dbt is stuck with --json flag #874

Closed
harikaduyu opened this issue Mar 7, 2024 · 3 comments
Labels
bug (Something isn't working), triage

Comments

@harikaduyu

Describe the bug
I'm trying to get JSON output from a --dbt run that uses a state file. The run works fine without the --json flag, but when I add --json it gets stuck and the process never finishes.
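For anyone trying to reproduce this without a terminal that hangs forever, here is a rough Python sketch that runs a command with a timeout and separates any JSON records from the ordinary log lines. The helper names (`split_json_records`, `run_with_timeout`) and the 300-second default are hypothetical, not part of data-diff, and the sketch assumes each JSON record is printed as a single line starting with `{`, as in the output further down.

```python
import json
import subprocess


def _to_text(data):
    """Normalize captured output, which may be None, bytes, or str."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def split_json_records(lines):
    """Separate single-line JSON objects from ordinary log lines."""
    records, logs = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("{"):
            try:
                records.append(json.loads(stripped))
                continue
            except json.JSONDecodeError:
                pass  # not a complete JSON object; keep it as a log line
        logs.append(stripped)
    return records, logs


def run_with_timeout(cmd, timeout_s=300):
    """Run cmd and return (finished, output_lines) instead of waiting forever.

    The timeout value is an arbitrary assumption; adjust it to your tables.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        out = _to_text(proc.stdout) + _to_text(proc.stderr)
        return True, out.splitlines()
    except subprocess.TimeoutExpired as exc:
        # Whatever was captured before the timeout is attached to the exception.
        out = _to_text(exc.stdout) + _to_text(exc.stderr)
        return False, out.splitlines()
```

Calling `run_with_timeout` with the command from this report as an argument list, then passing the returned lines to `split_json_records`, would show whether the process finished and which JSON records were emitted before it stalled.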

Make sure to include the following (minus sensitive information):

  • The command or code you used

data-diff --dbt --state prod-run-artifacts/manifest.json --json -d

  • The run output + error you're getting. (including tracestack)
Running with data-diff=0.11.1
15:44:30 INFO     Parsing file dbt_project.yml                                                                                                                                                                                                                                                                                                                                                                                               dbt_parser.py:287
         INFO     Parsing file /dbt_project/target/manifest.json                                                                                                                                                                                                                                                                                                                                                                             dbt_parser.py:280
         INFO     Parsing file prod-run-artifacts/manifest.json                                                                                                                                                                                                                                                                                                                                                                              dbt_parser.py:280
         INFO     Parsing file target/run_results.json                                                                                                                                                                                                                                                                                                                                                                                       dbt_parser.py:253
         INFO     config: prod_database=None prod_schema=None prod_custom_schema=None datasource_id=None                                                                                                                                                                                                                                                                                                                                     dbt_parser.py:159
         INFO     Parsing file /dbt_project/profiles.yml                                                                                                                                                                                                                                                                                                                                                                                     dbt_parser.py:294
         DEBUG    Found no PKs                                                                                                                                                                                                                                                                                                                                                                                                               dbt_parser.py:465
{"status": "failed", "model": "model.dbt.bi_dagster_asset", "dataset1": ["data-prod", "prod_observability", "bi_dagster_asset"], "dataset2": ["data-prod", "dbt_pr_test_ci_observability", "bi_dagster_asset"], "error": "No primary key found. Add uniqueness tests, meta, or tags.", "version": "1.0.0"}
         DEBUG    Found PKs via Uniqueness tests [fct_tbl_info]: {'col_id'}                                                                                                                                                                                                                                                                                                                                                                  dbt_parser.py:459
         DEBUG    Found PKs via Uniqueness tests [int_table]: {'col_id'}                                                                                                                                                                                                                                                                                                                                                                     dbt_parser.py:459
         DEBUG    Found no PKs                                                                                                                                                                                                                                                                                                                                                                                                               dbt_parser.py:465
{"status": "failed", "model": "model.dbt.dim_latest_email_table", "dataset1": ["data-prod", "prod_schema", "dim_latest_email_table"], "dataset2": ["data-prod", "dbt_pr_test_ci_schema", "dim_latest_email_table"], "error": "No primary key found. Add uniqueness tests, meta, or tags.", "version": "1.0.0"}
         DEBUG    Database 'BigQuery(default_schema='dev', _interactive=False, is_closed=False, _dialect=Dialect(_prevent_overflow_when_concat=False), project='data-dev', dataset='dev', _client=<google.cloud.bigquery.client.Client object at 0x10xxxx>)' does not allow setting timezone. We recommend making sure it's set to 'UTC'.                                                                                                     _connect.py:300
         DEBUG    Database 'BigQuery(default_schema='dev', _interactive=False, is_closed=False, _dialect=Dialect(_prevent_overflow_when_concat=False), project='data-dev', dataset='dev', _client=<google.cloud.bigquery.client.Client object at 0x12xxxx>)' does not allow setting timezone. We recommend making sure it's set to 'UTC'.                                                                                                     _connect.py:300
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                                         base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`prod_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'fct_tbl_info' AND table_schema = 'prod_schema'                                                                                                                                                                                                              
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                            base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`prod_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'int_table' AND table_schema = 'prod_schema'                                                                                                                                                                                                       
15:44:32 DEBUG    Running SQL (BigQuery): ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                  base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`dbt_pr_test_ci_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'int_table' AND table_schema = 'dbt_pr_test_ci_schema'                                                                                                                                                                                   
         DEBUG    Running SQL (BigQuery): ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                               base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`dbt_pr_test_ci_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'fct_tbl_info' AND table_schema = 'dbt_pr_test_ci_schema'                                                                                                                                                                                          
15:44:33 DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                            base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`prod_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'int_table' AND table_schema = 'prod_schema'                                                                                                                                                                                                       
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                                         base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`prod_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'fct_tbl_info' AND table_schema = 'prod_schema'                                                                                                                                                                                                              
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                             base.py:980
                  SELECT * FROM (SELECT TRIM(`sf_id`), TRIM(`col_name`), TRIM(`col_type`), TRIM(`col_mtd`), TRIM(`col_pl`) FROM `data-prod`.`prod_schema`.`int_table`) AS LIMITED_SELECT LIMIT 64                                                                                                                                                                                                                    
15:44:34 DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                                          base.py:980
                  SELECT * FROM (SELECT TRIM(`unit`) FROM `data-prod`.`prod_schema`.`fct_tbl_info`) AS LIMITED_SELECT LIMIT 64                                                                                                                                                                                                                                                                                                                               
 ..... Cut because text gets too long ....                                                                                                                                     
         DEBUG    Done collecting stats for table #2: ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                       joindiff_tables.py:306
         DEBUG    Testing for null keys: ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                       joindiff_tables.py:252
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                             base.py:980
                  SELECT `col_id` FROM `data-prod`.`prod_schema`.`int_table` WHERE (`col_id` IS NULL)                                                                                                                                                                                                                                                                                                                                              
         DEBUG    Done collecting stats for table #2: ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                   joindiff_tables.py:306
         DEBUG    Testing for null keys: ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                joindiff_tables.py:252
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                                         base.py:980
                  SELECT `col_id` FROM `data-prod`.`prod_schema`.`fct_tbl_info` WHERE (`col_id` IS NULL)                                                                                                                                                                                                                                                                                                                                                       
15:44:38 DEBUG    Running SQL (BigQuery): ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                   base.py:980
                  SELECT `col_id` FROM `data-prod`.`dbt_pr_test_ci_schema`.`int_table` WHERE (`col_id` IS NULL)                                                                                                                                                                                                                                                                                                                                    
         DEBUG    Running SQL (BigQuery): ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                               base.py:980
                  SELECT `col_id` FROM `data-prod`.`dbt_pr_test_ci_schema`.`fct_tbl_info` WHERE (`col_id` IS NULL)                                                                                                                                                                                                                                                                                                                                             
15:44:39 DEBUG    Counting exclusive rows: ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                    joindiff_tables.py:372
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                     base.py:980
                  SELECT count(*) FROM (SELECT * FROM (SELECT (`tmp2`.`col_id` IS NULL) AS `is_exclusive_a`, (`tmp1`.`col_id` IS NULL) AS `is_exclusive_b`, CASE WHEN `tmp1`.`col_id` is distinct from `tmp2`.`col_id` THEN 1 ELSE 0 END AS `is_diff_col_id`, CASE WHEN `tmp1`.`col_value` is distinct from `tmp2`.`col_value` THEN 1 ELSE 0 END AS `is_diff_col_value`, CASE WHEN `tmp1`.`org_id` is distinct from `tmp2`.`org_id` THEN 1 ELSE 0               
                  END AS `is_diff_org_id`, CASE WHEN `tmp1`.`col_combined_value` is distinct from `tmp2`.`col_combined_value` THEN 1 ELSE 0 END AS `is_diff_col_combined_value`, CASE WHEN `tmp1`.`col_mtd` is distinct from `tmp2`.`col_mtd` THEN 1 ELSE 0 END AS `is_diff_col_mtd`, CASE WHEN `tmp1`.`sf_id` is distinct from `tmp2`.`sf_id` THEN 1 ELSE 0 END AS `is_diff_sf_id`, CASE WHEN                                   
                  `tmp1`.`col_ch_prob` is distinct from `tmp2`.`col_ch_prob` THEN 1 ELSE 0 END AS `is_diff_col_ch_prob`, CASE WHEN `tmp1`.`col_name` is distinct from `tmp2`.`col_name` THEN 1 ELSE 0 END AS `is_diff_col_name`, CASE WHEN `tmp1`.`col_pl` is distinct from `tmp2`.`col_pl` THEN 1 ELSE 0 END AS `is_diff_col_pl`, CASE WHEN `tmp1`.`col_type` is distinct from `tmp2`.`col_type` THEN 1 ELSE 0 END AS                     
                  `is_diff_col_type`, cast(`tmp1`.`col_id` as string) AS `col_id_a`, cast(`tmp2`.`col_id` as string) AS `col_id_b`, format('%.11f', `tmp1`.`col_value`) AS `col_value_a`, format('%.11f', `tmp2`.`col_value`) AS `col_value_b`, cast(`tmp1`.`org_id` as string) AS `org_id_a`, cast(`tmp2`.`org_id` as string) AS `org_id_b`, format('%.11f', `tmp1`.`col_combined_value`) AS `col_combined_value_a`,             
                  format('%.11f', `tmp2`.`col_combined_value`) AS `col_combined_value_b`, cast(`tmp1`.`col_mtd` as string) AS `col_mtd_a`, cast(`tmp2`.`col_mtd` as string) AS `col_mtd_b`, cast(`tmp1`.`sf_id` as string) AS `sf_id_a`, cast(`tmp2`.`sf_id` as string) AS `sf_id_b`, format('%.11f', `tmp1`.`col_ch_prob`) AS `col_ch_prob_a`, format('%.11f', `tmp2`.`col_ch_prob`) AS             
                  `col_ch_prob_b`, cast(`tmp1`.`col_name` as string) AS `col_name_a`, cast(`tmp2`.`col_name` as string) AS `col_name_b`, cast(`tmp1`.`col_pl` as string) AS `col_pl_a`, cast(`tmp2`.`col_pl` as string) AS `col_pl_b`, cast(`tmp1`.`col_type` as string) AS `col_type_a`, cast(`tmp2`.`col_type` as string) AS `col_type_b` FROM                                                                                   
                  `data-prod`.`prod_schema`.`int_table` `tmp1` FULL OUTER JOIN `data-prod`.`dbt_pr_test_ci_schema`.`int_table` `tmp2` ON (`tmp1`.`col_id` = `tmp2`.`col_id`)) tmp3 WHERE ((`is_diff_col_id` = 1) OR (`is_diff_col_value` = 1) OR (`is_diff_org_id` = 1) OR (`is_diff_col_combined_value` = 1) OR (`is_diff_col_mtd` = 1) OR                
                  (`is_diff_sf_id` = 1) OR (`is_diff_col_ch_prob` = 1) OR (`is_diff_col_name` = 1) OR (`is_diff_col_pl` = 1) OR (`is_diff_col_type` = 1)) AND (`is_exclusive_a` OR `is_exclusive_b`)) tmp4                                                                                                                                                                                                                                                                       
         DEBUG    Counting exclusive rows: ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                              joindiff_tables.py:372
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                               base.py:980
                  SELECT count(*) FROM (SELECT * FROM (SELECT (`tmp2`.`col_id` IS NULL) AS `is_exclusive_a`, (`tmp1`.`col_id` IS NULL) AS `is_exclusive_b`, CASE WHEN `tmp1`.`col_id` is distinct from `tmp2`.`col_id` THEN 1 ELSE 0 END AS `is_diff_col_id`, CASE WHEN `tmp1`.`org_id` is distinct from `tmp2`.`org_id` THEN 1 ELSE 0 END AS `is_diff_org_id`, CASE WHEN `tmp1`.`is_error` is distinct from `tmp2`.`is_error` THEN 1 ELSE 0 END AS                     
                  `is_diff_is_error`, CASE WHEN `tmp1`.`is_inc_error` is distinct from `tmp2`.`is_inc_error` THEN 1 ELSE 0 END AS `is_diff_is_inc_error`, CASE WHEN `tmp1`.`is_non_inc_error` is distinct from `tmp2`.`is_non_inc_error` THEN 1 ELSE 0 END AS `is_diff_is_non_inc_error`, CASE WHEN `tmp1`.`created_at` is distinct from `tmp2`.`created_at` THEN 1 ELSE 0 END AS `is_diff_created_at`, CASE WHEN `tmp1`.`is_x` is distinct from                       
                  `tmp2`.`is_x` THEN 1 ELSE 0 END AS `is_diff_is_x`, CASE WHEN `tmp1`.`is_inc` is distinct from `tmp2`.`is_inc` THEN 1 ELSE 0 END AS `is_diff_is_inc`, CASE WHEN `tmp1`.`is_x_error` is distinct from `tmp2`.`is_x_error` THEN 1 ELSE 0 END AS `is_diff_is_x_error`, CASE WHEN `tmp1`.`unit` is distinct from `tmp2`.`unit` THEN 1 ELSE 0 END AS `is_diff_unit`, CASE WHEN                
                  `tmp1`.`is_missing_t` is distinct from `tmp2`.`is_missing_t` THEN 1 ELSE 0 END AS `is_diff_is_missing_t`, cast(`tmp1`.`col_id` as string) AS `col_id_a`, cast(`tmp2`.`col_id` as string) AS `col_id_b`, cast(`tmp1`.`org_id` as string) AS `org_id_a`, cast(`tmp2`.`org_id` as string) AS `org_id_b`, cast(cast(`tmp1`.`is_error` as int) as string) AS `is_error_a`, cast(cast(`tmp2`.`is_error` as              
                  int) as string) AS `is_error_b`, cast(cast(`tmp1`.`is_inc_error` as int) as string) AS `is_inc_error_a`, cast(cast(`tmp2`.`is_inc_error` as int) as string) AS `is_inc_error_b`, cast(cast(`tmp1`.`is_non_inc_error` as int) as string) AS `is_non_inc_error_a`, cast(cast(`tmp2`.`is_non_inc_error` as int) as string) AS `is_non_inc_error_b`, FORMAT_TIMESTAMP('%F %H:%M:%E6S', `tmp1`.`created_at`) AS `created_at_a`,                              
                  FORMAT_TIMESTAMP('%F %H:%M:%E6S', `tmp2`.`created_at`) AS `created_at_b`, cast(cast(`tmp1`.`is_x` as int) as string) AS `is_x_a`, cast(cast(`tmp2`.`is_x` as int) as string) AS `is_x_b`, cast(cast(`tmp1`.`is_inc` as int) as string) AS `is_inc_a`, cast(cast(`tmp2`.`is_inc` as int) as string) AS `is_inc_b`, cast(cast(`tmp1`.`is_x_error` as int) as string) AS                                      
                  `is_x_error_a`, cast(cast(`tmp2`.`is_x_error` as int) as string) AS `is_x_error_b`, cast(`tmp1`.`unit` as string) AS `unit_a`, cast(`tmp2`.`unit` as string) AS `unit_b`, cast(cast(`tmp1`.`is_missing_t` as int) as string) AS `is_missing_t_a`, cast(cast(`tmp2`.`is_missing_t` as int) as string) AS `is_missing_t_b` FROM                                                    
                  `data-prod`.`prod_schema`.`fct_tbl_info` `tmp1` FULL OUTER JOIN `data-prod`.`dbt_pr_test_ci_schema`.`fct_tbl_info` `tmp2` ON (`tmp1`.`col_id` = `tmp2`.`col_id`)) tmp3 WHERE ((`is_diff_col_id` = 1) OR (`is_diff_org_id` = 1) OR (`is_diff_is_error` = 1) OR (`is_diff_is_inc_error` = 1) OR (`is_diff_is_non_inc_error` = 1) OR (`is_diff_created_at` = 1)              
                  OR (`is_diff_is_x` = 1) OR (`is_diff_is_inc` = 1) OR (`is_diff_is_x_error` = 1) OR (`is_diff_unit` = 1) OR (`is_diff_is_missing_t` = 1)) AND (`is_exclusive_a` OR `is_exclusive_b`)) tmp4                                                                                                                                                                                                                                                      
15:44:40 DEBUG    Counting differences per column: ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                           joindiff_tables.py:346
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                    base.py:980
                  SELECT sum(`is_diff_col_id`), sum(`is_diff_col_value`), sum(`is_diff_org_id`), sum(`is_diff_col_combined_value`), sum(`is_diff_col_mtd`), sum(`is_diff_sf_id`), sum(`is_diff_col_ch_prob`), sum(`is_diff_col_name`), sum(`is_diff_col_pl`), sum(`is_diff_col_type`) FROM (SELECT (`tmp2`.`col_id` IS NULL) AS `is_exclusive_a`, (`tmp1`.`col_id` IS NULL) AS `is_exclusive_b`, CASE WHEN                              
                  `tmp1`.`col_id` is distinct from `tmp2`.`col_id` THEN 1 ELSE 0 END AS `is_diff_col_id`, CASE WHEN `tmp1`.`col_value` is distinct from `tmp2`.`col_value` THEN 1 ELSE 0 END AS `is_diff_col_value`, CASE WHEN `tmp1`.`org_id` is distinct from `tmp2`.`org_id` THEN 1 ELSE 0 END AS `is_diff_org_id`, CASE WHEN `tmp1`.`col_combined_value` is distinct from `tmp2`.`col_combined_value` THEN 1 ELSE 0 END AS                         
                  `is_diff_col_combined_value`, CASE WHEN `tmp1`.`col_mtd` is distinct from `tmp2`.`col_mtd` THEN 1 ELSE 0 END AS `is_diff_col_mtd`, CASE WHEN `tmp1`.`sf_id` is distinct from `tmp2`.`sf_id` THEN 1 ELSE 0 END AS `is_diff_sf_id`, CASE WHEN `tmp1`.`col_ch_prob` is distinct from `tmp2`.`col_ch_prob` THEN 1 ELSE 0 END AS `is_diff_col_ch_prob`, CASE WHEN `tmp1`.`col_name` is distinct             
                  from `tmp2`.`col_name` THEN 1 ELSE 0 END AS `is_diff_col_name`, CASE WHEN `tmp1`.`col_pl` is distinct from `tmp2`.`col_pl` THEN 1 ELSE 0 END AS `is_diff_col_pl`, CASE WHEN `tmp1`.`col_type` is distinct from `tmp2`.`col_type` THEN 1 ELSE 0 END AS `is_diff_col_type`, cast(`tmp1`.`col_id` as string) AS `col_id_a`, cast(`tmp2`.`col_id` as string) AS `col_id_b`, format('%.11f', `tmp1`.`col_value`) AS            
                  `col_value_a`, format('%.11f', `tmp2`.`col_value`) AS `col_value_b`, cast(`tmp1`.`org_id` as string) AS `org_id_a`, cast(`tmp2`.`org_id` as string) AS `org_id_b`, format('%.11f', `tmp1`.`col_combined_value`) AS `col_combined_value_a`, format('%.11f', `tmp2`.`col_combined_value`) AS `col_combined_value_b`, cast(`tmp1`.`col_mtd` as string) AS `col_mtd_a`,                                     
                  cast(`tmp2`.`col_mtd` as string) AS `col_mtd_b`, cast(`tmp1`.`sf_id` as string) AS `sf_id_a`, cast(`tmp2`.`sf_id` as string) AS `sf_id_b`, format('%.11f', `tmp1`.`col_ch_prob`) AS `col_ch_prob_a`, format('%.11f', `tmp2`.`col_ch_prob`) AS `col_ch_prob_b`, cast(`tmp1`.`col_name` as string) AS `col_name_a`, cast(`tmp2`.`col_name` as string) AS `col_name_b`,                        
                  cast(`tmp1`.`col_pl` as string) AS `col_pl_a`, cast(`tmp2`.`col_pl` as string) AS `col_pl_b`, cast(`tmp1`.`col_type` as string) AS `col_type_a`, cast(`tmp2`.`col_type` as string) AS `col_type_b` FROM `data-prod`.`prod_schema`.`int_table` `tmp1` FULL OUTER JOIN                                                                                                             
                  `data-prod`.`dbt_pr_test_ci_schema`.`int_table` `tmp2` ON (`tmp1`.`col_id` = `tmp2`.`col_id`)) tmp3 WHERE ((`is_diff_col_id` = 1) OR (`is_diff_col_value` = 1) OR (`is_diff_org_id` = 1) OR (`is_diff_col_combined_value` = 1) OR (`is_diff_col_mtd` = 1) OR (`is_diff_sf_id` = 1) OR (`is_diff_col_ch_prob` = 1) OR (`is_diff_col_name` = 1) OR                             
                  (`is_diff_col_pl` = 1) OR (`is_diff_col_type` = 1))                                                                                                                                                                                                                                                                                                                                                                                                                               
         DEBUG    Counting differences per column: ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                   joindiff_tables.py:346
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                            base.py:980
                  SELECT sum(`is_diff_col_id`), sum(`is_diff_org_id`), sum(`is_diff_is_error`), sum(`is_diff_is_inc_error`), sum(`is_diff_is_non_inc_error`), sum(`is_diff_created_at`), sum(`is_diff_is_x`), sum(`is_diff_is_inc`), sum(`is_diff_is_x_error`), sum(`is_diff_unit`), sum(`is_diff_is_missing_t`) FROM (SELECT (`tmp2`.`col_id` IS NULL) AS `is_exclusive_a`, (`tmp1`.`col_id` IS NULL) AS                            
                  `is_exclusive_b`, CASE WHEN `tmp1`.`col_id` is distinct from `tmp2`.`col_id` THEN 1 ELSE 0 END AS `is_diff_col_id`, CASE WHEN `tmp1`.`org_id` is distinct from `tmp2`.`org_id` THEN 1 ELSE 0 END AS `is_diff_org_id`, CASE WHEN `tmp1`.`is_error` is distinct from `tmp2`.`is_error` THEN 1 ELSE 0 END AS `is_diff_is_error`, CASE WHEN `tmp1`.`is_inc_error` is distinct from `tmp2`.`is_inc_error` THEN 1 ELSE 0 END AS                         
                  `is_diff_is_inc_error`, CASE WHEN `tmp1`.`is_non_inc_error` is distinct from `tmp2`.`is_non_inc_error` THEN 1 ELSE 0 END AS `is_diff_is_non_inc_error`, CASE WHEN `tmp1`.`created_at` is distinct from `tmp2`.`created_at` THEN 1 ELSE 0 END AS `is_diff_created_at`, CASE WHEN `tmp1`.`is_x` is distinct from `tmp2`.`is_x` THEN 1 ELSE 0 END AS `is_diff_is_x`, CASE WHEN `tmp1`.`is_inc` is distinct from                    
                  `tmp2`.`is_inc` THEN 1 ELSE 0 END AS `is_diff_is_inc`, CASE WHEN `tmp1`.`is_x_error` is distinct from `tmp2`.`is_x_error` THEN 1 ELSE 0 END AS `is_diff_is_x_error`, CASE WHEN `tmp1`.`unit` is distinct from `tmp2`.`unit` THEN 1 ELSE 0 END AS `is_diff_unit`, CASE WHEN `tmp1`.`is_missing_t` is distinct from `tmp2`.`is_missing_t` THEN 1 ELSE 0 END AS                                       
                  `is_diff_is_missing_t`, cast(`tmp1`.`col_id` as string) AS `col_id_a`, cast(`tmp2`.`col_id` as string) AS `col_id_b`, cast(`tmp1`.`org_id` as string) AS `org_id_a`, cast(`tmp2`.`org_id` as string) AS `org_id_b`, cast(cast(`tmp1`.`is_error` as int) as string) AS `is_error_a`, cast(cast(`tmp2`.`is_error` as int) as string) AS `is_error_b`, cast(cast(`tmp1`.`is_inc_error` as int) as string) AS                        
                  `is_inc_error_a`, cast(cast(`tmp2`.`is_inc_error` as int) as string) AS `is_inc_error_b`, cast(cast(`tmp1`.`is_non_inc_error` as int) as string) AS `is_non_inc_error_a`, cast(cast(`tmp2`.`is_non_inc_error` as int) as string) AS `is_non_inc_error_b`, FORMAT_TIMESTAMP('%F %H:%M:%E6S', `tmp1`.`created_at`) AS `created_at_a`, FORMAT_TIMESTAMP('%F %H:%M:%E6S', `tmp2`.`created_at`) AS `created_at_b`,                                                
                  cast(cast(`tmp1`.`is_x` as int) as string) AS `is_x_a`, cast(cast(`tmp2`.`is_x` as int) as string) AS `is_x_b`, cast(cast(`tmp1`.`is_inc` as int) as string) AS `is_inc_a`, cast(cast(`tmp2`.`is_inc` as int) as string) AS `is_inc_b`, cast(cast(`tmp1`.`is_x_error` as int) as string) AS `is_x_error_a`, cast(cast(`tmp2`.`is_x_error` as int) as string) AS                  
                  `is_x_error_b`, cast(`tmp1`.`unit` as string) AS `unit_a`, cast(`tmp2`.`unit` as string) AS `unit_b`, cast(cast(`tmp1`.`is_missing_t` as int) as string) AS `is_missing_t_a`, cast(cast(`tmp2`.`is_missing_t` as int) as string) AS `is_missing_t_b` FROM `data-prod`.`prod_schema`.`fct_tbl_info` `tmp1` FULL OUTER JOIN                                     
                  `data-prod`.`dbt_pr_test_ci_schema`.`fct_tbl_info` `tmp2` ON (`tmp1`.`col_id` = `tmp2`.`col_id`)) tmp3 WHERE ((`is_diff_col_id` = 1) OR (`is_diff_org_id` = 1) OR (`is_diff_is_error` = 1) OR (`is_diff_is_inc_error` = 1) OR (`is_diff_is_non_inc_error` = 1) OR (`is_diff_created_at` = 1) OR (`is_diff_is_x` = 1) OR (`is_diff_is_inc` = 1) OR (`is_diff_is_x_error` = 1)            
                  OR (`is_diff_unit` = 1) OR (`is_diff_is_missing_t` = 1))                                                                                                                                                                                                                                                                                                                                                                                                                      
15:44:41 INFO     Diffing complete: ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                      joindiff_tables.py:165
{"status": "success", "result": "identical", "model": "model.dbt.int_table", "dataset1": ["data-prod", "prod_schema", "int_table"], "dataset2": ["data-prod", "dbt_pr_test_ci_schema", "int_table"], "rows": {"exclusive": {"dataset1": [], "dataset2": []}, "diff": []}, "summary": {"rows": {"total": {"dataset1": 27346, "dataset2": 27346}, 
"exclusive": {"dataset1": 0, "dataset2": 0}, "updated": 0, "unchanged": 27346}, "stats": {"diffCounts": {"col_value": 0, "org_id": 0, "col_combined_value": 0, "col_mtd": 0, "sf_id": 0, "col_ch_prob": 0, "col_name": 0, "col_pl": 0, "col_type": 0}}}, "columns": {"dataset1": [{"name": "col_id", "type": "INT64", "kind": "integer"}, {"name": "org_id", "type": "INT64", "kind": "integer"}, {"name": "sf_id", "type": 
"STRING", "kind": "unsupported"}, {"name": "col_name", "type": "STRING", "kind": "unsupported"}, {"name": "col_type", "type": "STRING", "kind": "unsupported"}, {"name": "col_value", "type": "FLOAT64", "kind": "float"}, {"name": "col_combined_value", "type": "FLOAT64", "kind": "float"}, {"name": "col_mtd", "type": "STRING", "kind": "unsupported"}, {"name": "col_pl", "type": "STRING", "kind": "unsupported"}, {"name": "col_ch_prob", "type": "FLOAT64", 
"kind": "float"}], "dataset2": [{"name": "col_id", "type": "INT64", "kind": "integer"}, {"name": "org_id", "type": "INT64", "kind": "integer"}, {"name": "sf_id", "type": "STRING", "kind": "unsupported"}, {"name": "col_name", "type": "STRING", "kind": "unsupported"}, {"name": "col_type", "type": "STRING", "kind": "unsupported"}, {"name": "col_value", "type": "FLOAT64", "kind": "float"}, {"name": "col_combined_value", "type": "FLOAT64", "kind": "float"}, 
{"name": "col_mtd", "type": "STRING", "kind": "unsupported"}, {"name": "col_pl", "type": "STRING", "kind": "unsupported"}, {"name": "col_ch_prob", "type": "FLOAT64", "kind": "float"}], "primaryKey": ["col_id"], "exclusive": {"dataset1": [], "dataset2": []}, "typeChanged": []}, "version": "1.1.0"}
15:45:22 INFO     Diffing complete: ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                               joindiff_tables.py:165
⠼ 
In Progress fct_tbl_info
In Progress int_table
Diffing complete: ('data-prod', 'prod_schema', 'fct_tbl

The last line is also not fully shown: the output is cut off before the table name is complete.
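Since the JSON results are printed in between the log lines, a small helper can pull out whatever results were emitted before the process hangs. This is only a minimal sketch for inspecting captured output, not part of data-diff; the function name `extract_json_results` is hypothetical, and the keys used in the example (`status`, `result`, `model`) are taken from the output above.

```python
import json
from typing import Any, Dict, List


def extract_json_results(output: str) -> List[Dict[str, Any]]:
    """Return every top-level JSON object found in mixed log/JSON text."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    while True:
        # Look for the next candidate start of a JSON object.
        start = output.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start`
            # and returns the value plus the index where it ended.
            obj, end = decoder.raw_decode(output, start)
        except json.JSONDecodeError:
            # Not valid JSON here (e.g. a brace inside a log line); move on.
            pos = start + 1
            continue
        if isinstance(obj, dict):
            results.append(obj)
        pos = end
    return results


# Hypothetical captured output, shortened from the run above.
sample = (
    "15:44:41 INFO     Diffing complete\n"
    '{"status": "success", "result": "identical", "model": "model.dbt.int_table"}\n'
    "15:45:22 INFO     Diffing complete\n"
)

for res in extract_json_results(sample):
    print(res["model"], res["result"])
```

Because the decoder treats newlines between tokens as whitespace, a JSON object that was wrapped across several lines by the terminal is still parsed, as long as the wrap did not fall inside a string value.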

Describe the environment

I'm using macOS 14.3.1

@harikaduyu harikaduyu added the bug Something isn't working label Mar 7, 2024
@github-actions github-actions bot added the triage label Mar 7, 2024

Luttik commented Mar 11, 2024

I have exactly the same issue. Without --json it's super fast; with --json it seems to run forever.

This issue has been marked as stale because it has been open for 60 days with no activity. If you would like the issue to remain open, please comment on the issue and it will be added to the triage queue. Otherwise, it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues/PRs that have gone stale label May 11, 2024
@glebmezh
Contributor

Hi @harikaduyu,

I'm sorry for the delay in responding. Thank you for trying out data-diff and for opening this issue!

We made a hard decision to sunset the data-diff package and won't provide further development or support. Diffing functionality will continue to be available in Datafold Cloud. Feel free to take it for a trial or contact us at support@datafold.com if you have any questions.

-Gleb

@github-actions github-actions bot removed the stale Issues/PRs that have gone stale label May 17, 2024