Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] sla policy events failing due to duplicate values #144

Closed
1 task done
mark-ogrady opened this issue Feb 8, 2024 · 27 comments · Fixed by #146 or #154
Closed
1 task done

[Bug] sla policy events failing due to duplicate values #144

mark-ogrady opened this issue Feb 8, 2024 · 27 comments · Fixed by #146 or #154
Labels
error:unforced status:scoping Currently being scoped type:bug Something is broken or incorrect

Comments

@mark-ogrady
Copy link

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

I am getting a failed test on the sla policies table. Out of about 50,000 records in the final table, 100 are showing duplicate IDs.
I am not that familiar with the Zendesk data - is there a source table that could be causing these duplicate records?

Relevant error log or model output

14:17:31  Failure in test unique_zendesk__sla_policies_sla_event_id (models/zendesk.yml)
14:17:31
14:17:31    Got 272 results, configured to fail if != 0
14:17:31  
14:17:31
14:17:31    compiled Code at [target/compiled/zendesk/models/zendesk.yml/unique_zendesk__sla_policies_sla_event_id.sql](https://cloud.getdbt.com/api/v2/accounts/9087/runs/249801319/artifacts/compiled/zendesk/models/zendesk.yml/unique_zendesk__sla_policies_sla_event_id.sql)
14:17:31


### Expected behavior

Expecting model to fully build and pass all tests.

### dbt Project configurations

vars:
  # Zendesk vars set to False as we do not use them
  using_domain_names:         False         #Disable if you are not using domain names
  using_organization_tags:    False         #Disable if you are not using organization tags

### Package versions

package: fivetran/zendesk
version: [">=0.13.0", "<0.14.0"]

### What database are you using dbt with?

bigquery

### dbt Version

1.5.9

### Additional Context

_No response_

### Are you willing to open a PR to help address this issue?

- [ ] Yes.
- [X] Yes, but I will need assistance and will schedule time during our [office hours](https://calendly.com/fivetran-solutions-team/fivetran-solutions-team-office-hours) for guidance
- [ ] No.
@mark-ogrady mark-ogrady added the bug Something isn't working label Feb 8, 2024
@fivetran-joemarkiewicz
Copy link
Contributor

fivetran-joemarkiewicz commented Feb 8, 2024

Hi @mark-ogrady thanks for raising this issue. We actually have seen a few issues pop up recently around the SLA policies table and I believe we may have a workaround.

Would you be able to test the following WIP version of the package and let me know if you still see duplicates after using this version of the package:

packages:
  - git: https://github.com/fivetran/dbt_zendesk.git
    revision: bugfix/sla-investigations
    warn-unpinned: false

@mark-ogrady
Copy link
Author

mark-ogrady commented Feb 9, 2024

Thanks Joe,

Quick correction to the above:
git: https://github.com/fivetran/dbt_zendesk.git.

I managed to run this version, but the model failed in the dbt run stage, I got this Bigquery specific error:

17:07:11  Completed with 1 error and 0 warnings:
17:07:11  
17:07:11
17:07:11    Database Error in model int_zendesk__reply_time_business_hours (models/sla_policy/reply_time/int_zendesk__reply_time_business_hours.sql)
  Query exceeded resource limits. This query used 36943 CPU seconds but would charge only 40M Analysis bytes. This exceeds the ratio supported by the on-demand pricing model. Please consider moving this workload to the flat-rate reservation pricing model, which does not have this limit. 36943 CPU seconds were used, and this query must use less than 10200 CPU seconds.
  compiled Code at target/run/zendesk/models/sla_policy/reply_time/int_zendesk__reply_time_business_hours.sql

Suggests there might be some issue with one of the joins in the SLA table perhaps?

@fivetran-avinash
Copy link
Contributor

Hey @mark-ogrady ! I think you might be right, but we'll need to do some further investigating to confirm this. We'll get back to you next week!

@fivetran-avinash
Copy link
Contributor

Hi @mark-ogrady, would you be able to test this branch again? We made some updates to the underlying logic within the branch that we believe will mitigate the computational query load you faced last time.

Let us know how it goes!

@mark-ogrady
Copy link
Author

Hi Avinash,

Thanks - I am no longer getting a resource error, but have gone back to the failing test:

12:07:42 32 of 32 PASS unique_stg_zendesk__user_user_id ................................. [PASS in 1.52s]
12:07:43 29 of 32 PASS unique_stg_zendesk__ticket_comment_ticket_comment_id ............. [PASS in 2.35s]
12:07:43  
12:07:43
12:07:43  Finished running 32 tests, 1 hook in 0 hours 0 minutes and 5.67 seconds (5.67s).
12:07:43  
12:07:43
12:07:43  Completed with 1 error and 0 warnings:
12:07:43  
12:07:43
12:07:43  Failure in test unique_zendesk__sla_policies_sla_event_id (models/zendesk.yml)
12:07:43
12:07:43    Got 272 results, configured to fail if != 0
12:07:43  
12:07:43
12:07:43    compiled Code at [target/compiled/zendesk/models/zendesk.yml/unique_zendesk__sla_policies_sla_event_id.sql](https://cloud.getdbt.com/api/v2/accounts/9087/runs/251561555/artifacts/compiled/zendesk/models/zendesk.yml/unique_zendesk__sla_policies_sla_event_id.sql)
12:07:43  
12:07:43
12:07:43  Done. PASS=31 WARN=0 ERROR=1 SKIP=0 TOTAL=32

Is there any information I can get for you that will help further identify the error?

@fivetran-joemarkiewicz fivetran-joemarkiewicz added type:bug Something is broken or incorrect status:scoping Currently being scoped and removed bug Something isn't working labels Feb 13, 2024
@fivetran-avinash
Copy link
Contributor

Hi @mark-ogrady, could you share a few examples of duplicate records that are failing? If you could grab them from the end model and any source data with these records, we can try and locally reproduce the error and figure out where the issue is coming from.

@nbdaniels
Copy link

Hi @fivetran-avinash! I'm actually running into this issue as well after making the switch to using_schedules: true and I wasn't having this issue last week and earlier this week. Anything I can do to fix the failing tests for duplicate sla_event_id?

The duplicate events have the exact same values except for sla_breach_at and sla_elapsed_time, if that helps.

@fivetran-avinash
Copy link
Contributor

fivetran-avinash commented Feb 17, 2024

Hi @mark-ogrady ! What are the metric values by chance for these duplicates? Is it requester_wait_time, agent_work_time, something else entirely, or multiple values? That might help us hone in on what exactly might be happening in the underlying models to cause these test failures.

@nbdaniels
Copy link

Hey @fivetran-avinash ! I hope I’m not taking away from @mark-ogrady's ticket but I can add some information from my instance.

The metric values that I’m seeing for duplicate sla_event_id are all first_reply_time. These were SLAs applied between March 6th - March 11th 2023. Please let me know if I can provide any other details to apply a fix!

@fivetran-avinash
Copy link
Contributor

fivetran-avinash commented Feb 20, 2024

Hi @nbdaniels , thanks for your contributions!

It is interesting that you are only seeing these duplicates in one week of the SLA. I am wondering if it has to do with daylight savings time, as the week the SLAs were applied were the week before DST started in 2023.

Before we dive deeper, we want to make sure there is no issue in your source data. Can you run this query on tickets where you are seeing duplicate SLA values in the source? We just want to confirm this before we investigate within the package.

select 
    ticket_id,
    field_name, 
    value,
    updated
from [your zendesk schema].ticket_field_history
where field_name = 'first_reply_time'
    and ticket_id = <ticket_id>
order by updated

@mark-ogrady, also feel free to run this query too for where you are seeing SLA policy test duplicates!

@nbdaniels
Copy link

nbdaniels commented Feb 23, 2024

Hi @fivetran-avinash! Sorry for the delay here, I totally missed this notification.

I've run the above query for all duplicate SLA policies:

with duplicate_sla_tickets as (
    select
        sla_event_id,
        ticket_id,
        count(*)
    from zendesk.zendesk__sla_policies
    group by 1,2
    having count(*) > 1
)
select 
    ticket_id,
    field_name, 
    value,
    updated
from zendesk.ticket_field_history
where field_name = 'first_reply_time'
    and ticket_id in (select distinct ticket_id from duplicate_sla_tickets)
order by ticket_id, updated

There are a handful of these tickets that have multiple rows for applying first_reply_time SLA but they are all distinct rows with no duplicate values in the source data. Many of these tickets just have a single row in the results.

Please let me know if I can run anything else on my end!

@fivetran-catfritz
Copy link
Contributor

Hi @nbdaniels thank you giving that a run! Would you also be able to share the results of this query:

select *
from <schema>.int_zendesk__reply_time_combined
where ticket_id = <ticket_id>

This way we can see the records in the intermediate model that are coming through and resulting in duplicates. From there we can work to make the proper adjustments!

@nbdaniels
Copy link

Hi @fivetran-catfritz! Sorry I completely missed this!

Just ran the query and will share some non-sensitive information. High-level: It looks like differing schedules start / end times may be causing this issue.

  • Duplicate rows for ticket_id - sla_policy_name - metric - sla_applied_at - agent_reply_at - next_solved_at - first_sla_breach_at - current_time_check -
  • The duplicated rows have differing values for sum_lapsed_business_minutes, sla_schedule_start_at, sla_schedule_end_at, sla_breach_at, next_schedule_start, sum_lapsed_business_minutes_new, updated_sla_policy_starts_at (only one row is populated, the other is null), sla_elapsed_time
  • is_stale_sla_policy and is_sla_breached may be different depending on the values in the other columns mentioned above

If you need example CSVs or any other details, please let me know and we can figure out the best way to send over some information.

@fivetran-catfritz
Copy link
Contributor

Hi @nbdaniels thank you for giving that a run! It does seem that your findings align with the daylight savings issue, but it would indeed be beneficial if you could provide CSV files containing the duplicate rows. We'd like to check out the differing schedules and sample tickets with the duplicates. You can email the CSVs to us at solutions@fivetran.com, referencing this ticket. Alternatively, you could initiate a data share with us, since we also use BigQuery, so let me know if you prefer to go that route!

@laurits-inpay
Copy link

We are facing the same issue, and it seems also related to daylight saving time as all duplicates occur around these dates. @fivetran-catfritz Let me know if you need any more examples around the issue.

@nbdaniels
Copy link

@fivetran-catfritz Apologies again for the delay! I just emailed a CSV over to solutions@fivetran.com but please let me know if I can provide anything else in the meantime!

@fivetran-catfritz
Copy link
Contributor

Thank you @nbdaniels! I confirm we received it and will take a look soon.

@laurits-inpay If you would also be willing to share an output sample of the below query, that would be helpful. You can send it to solutions@fivetran.com or let me know if you prefer to do a data share if you use Snowflake or Bigquery.

select *
from <schema>.int_zendesk__reply_time_combined
where ticket_id = <ticket_id>

@fivetran-reneeli
Copy link
Contributor

Hey everyone! We'll be taking this on in our current sprint. Thanks for the data shares which will help tremendously! We'll post in this thread with any questions/updates.

@fivetran-reneeli
Copy link
Contributor

Hi all, we made changes in this following branch and invite you to test it out. Please let us know if this tackles the issue of duplicate sla event ids!

packages:
  - git: https://github.com/fivetran/zendesk.git
    revision: bugfix/duplicate_sla_event_ids_from_dst
    warn-unpinned: false

@nbdaniels
Copy link

@fivetran-reneeli I apologize for the delay! I was OOO but I am testing this week. Thank you for making the changes and I'll let you know if we run into any issues!

@nbdaniels
Copy link

@fivetran-reneeli It looks like we're still hitting the error when pulling from the branch you mentioned. Please see below:

22:23:47  Installing https://xxxxxxxxxxxxxxxxxxxxxx@github.com/fivetran/dbt_zendesk.git/
22:23:48  Installed from revision bugfix/duplicate_sla_event_ids_from_dst

FAIL 14 unique_zendesk__sla_policies_sla_event_id ................... [FAIL 14 in 1.26s]

Please let me know if I can provide any other details or data examples!

@solmazb
Copy link

solmazb commented Mar 22, 2024

Hi all,
We are facing this same issue of the test breaking for 2 duplicates. I did try the proposed temp fix but I think the URL there gives 404, I did try with this setting:

- git: https://github.com/fivetran/dbt_zendesk.git
   revision: bugfix/duplicate_sla_event_ids_from_dst
   warn-unpinned:  false

and the same test still fails, somehow it creates duplicate surrogate keys, We really appreciate if the fix can go out sooner as this is causing all our dbt runs get marked as failed.

@fivetran-joemarkiewicz
Copy link
Contributor

Hi @solmazb thanks for chiming in and sharing that you are also experiencing this issue. Unfortunately we do not have a fix available at the moment. The branch we shared above does not seem to accurately address the duplication issue.

We are struggling to find/apply a fix as we are unable to recreate the duplication issue which seems to be occurring during daylight saving schedule changes. @laurits-inpay and others in this thread have been helpful with sharing information around how the issue may be occurring. From this we have been able to identify that the duplication seems to be originating from the int_zendesk__reply_time_combined model; however, we are still unable to accurately recreate the issue and identify a fix at this moment.

If you would be available, it would be incredibly helpful to discuss live the duplicates you are seeing. This way we may possibly be able to understand the source of the duplication further. If you are interested in discussing live, feel free to schedule some time when we are both available. Thanks!

@nbdaniels
Copy link

Hi @fivetran-joemarkiewicz ! Sorry for the delay here -- I've been OOO and just getting back to this work. I'll set up some time at that link and we should be able to dig in live!

@fivetran-reneeli fivetran-reneeli linked a pull request Apr 17, 2024 that will close this issue
7 tasks
@fivetran-reneeli
Copy link
Contributor

Hi all, thanks for the patience and again we really appreciate you working with us to help address these duplicates!
We've just released v0.15.0 which addresses the duplicate SLAs resulting from the following root causes:

  • 2 versions of the same schedule due to daylight savings time
  • Overlapping schedule windows resulting from holidays that ran across more than 1 schedule week

We are aware that there are still a few duplicates remaining which we have whittled down to the following:

  • In the filtered_reply_times CTE in the int_zendesk__reply_time_combined model more than 1 condition is being met, therefore passing through multiple records for the same ticket_id/metric/sla_applied_at. We have a working solution but still need to go through testing. In the meantime, we wanted to get the rest of the updates out.

@TomaszE
Copy link

TomaszE commented May 15, 2024

Hi Guys,
I know I am coming in late in this thread, we installed the 0.15.0 version of dbt package after reporting different kind of issue - when and wanted to migrate my dbt run / dbt test to dbt build command for full dbt refresh - but unfortunately we can't as it's failing with same table

11:27:23 649 of 661 START test unique_zendesk__sla_policies_sla_event_id ................ [RUN]
11:27:24 649 of 661 FAIL 68 unique_zendesk__sla_policies_sla_event_id ................... [FAIL 68 in 1.05s]
11:30:21 Failure in test unique_zendesk__sla_policies_sla_event_id (models/zendesk.yml)
11:30:21   Got 68 results, configured to fail if != 0
11:30:21   compiled Code at target/compiled/zendesk/models/zendesk.yml/unique_zendesk__sla_policies_sla_event_id.sql

with details as follows:

11:27:23 Writing runtime sql for node "test.zendesk.unique_zendesk__sla_policies_sla_event_id.5daff4d2bd"
11:27:23 Using snowflake connection "test.zendesk.unique_zendesk__sla_policies_sla_event_id.5daff4d2bd"
11:27:23 On test.zendesk.unique_zendesk__sla_policies_sla_event_id.5daff4d2bd: /* {"app": "dbt", "dbt_version": "2024.5.151", "profile_name": "user", "target_name": "default", "node_id": "test.zendesk.unique_zendesk__sla_policies_sla_event_id.5daff4d2bd"} */
select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (
      
    
    

select
    sla_event_id as unique_field,
    count(*) as n_records

from ANALYTICS.dbt_teitner_ce.zendesk__sla_policies
where sla_event_id is not null
group by sla_event_id
having count(*) > 1



      
    ) dbt_internal_test
11:27:23 Opening a new connection, currently in state closed
11:27:24 SQL status: SUCCESS 1 in 1.0 seconds
11:27:24 On test.zendesk.unique_zendesk__sla_policies_sla_event_id.5daff4d2bd: Close
11:27:24 649 of 661 FAIL 68 unique_zendesk__sla_policies_sla_event_id ................... [FAIL 68 in 1.05s]
11:27:24 Finished running node test.zendesk.unique_zendesk__sla_policies_sla_event_id.5daff4d2bd
11:30:21 Failure in test unique_zendesk__sla_policies_sla_event_id (models/zendesk.yml)
11:30:21   Got 68 results, configured to fail if != 0
11:30:21   compiled Code at target/compiled/zendesk/models/zendesk.yml/unique_zendesk__sla_policies_sla_event_id.sql

I've obvsly run the sql and I am getting 68 duplicated event ids all with count value 2.

Let me know if with my limited knowledge, but quite large Zendesk data source I can help - do something to help you testing. Cheers

@fivetran-reneeli
Copy link
Contributor

Hey @TomaszE! Thanks for the message-- We are actually releasing the remainder of the fixes today! In the meantime, you can also feel free to test with the branch that we will be merging.

# - git: https://github.com/fivetran/dbt_zendesk.git
# revision: bugfix/remianing-sla-issues
# warn-unpinned: false

@fivetran-reneeli fivetran-reneeli linked a pull request May 15, 2024 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error:unforced status:scoping Currently being scoped type:bug Something is broken or incorrect
Projects
None yet
9 participants