Incremental sync flaw #109

Open
gnilrets opened this issue Jun 20, 2023 · 2 comments · May be fixed by #110

Comments

@gnilrets

I'm running this tap through Meltano. I've been running it since March 2023 and have noticed that it occasionally misses records. If I do a full re-sync, the missing records show up, so something about the incremental state is not working properly. Have you seen this before?

gnilrets changed the title from "Incremental sync not working as expected" to "Incremental sync flaw" on Jun 27, 2023
@gnilrets
Author

I'm pretty sure the issue is a flaw in how Jira pagination works. When we run a query fetching all issues updated since the last sync, there may be too many records for Jira to return at once, so it returns the first N and indicates that there are M more. However, Jira doesn't appear to have a way to cache the query results. So when we re-run the API call and request records starting with record N+1, the query is evaluated again, and the underlying data may have changed since the first page was returned.

For example, suppose there are 5 issues (ISSUE-1 through ISSUE-5) and we want to return all of them ordered by updated timestamp (ascending), with maxResults: 3. The first request uses startAt: 0 and returns:

| key     | updated             |
|---------|---------------------|
| ISSUE-1 | 2022-06-27 00:00:00 |
| ISSUE-2 | 2022-06-27 00:00:01 |
| ISSUE-3 | 2022-06-27 00:00:02 |

The API returns isLast: False and total: 5. I then submit a request with startAt: 3 and expect to get the remaining two issues (ISSUE-4 and ISSUE-5). However, suppose ISSUE-2 is updated right before this second request is made. Its new timestamp moves it to the end of the ordering, which shifts ISSUE-4 into the first 3 records, so the second request returns:

| key     | updated             |
|---------|---------------------|
| ISSUE-5 | 2022-06-27 00:00:04 |
| ISSUE-2 | 2022-06-27 01:23:01 |

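The consequence is that ISSUE-4 is never seen in the paginated results, while ISSUE-2 is returned twice. Presumably the saved state then records ISSUE-2's newer timestamp, so later incremental runs skip ISSUE-4 as well unless it is updated again, which would explain why the missing records only show up after a full re-sync.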
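The scenario can be reproduced with a small in-memory simulation. This is a sketch, not code from the tap: search_page is a hypothetical stand-in for one Jira search request ordered by updated ascending, and it re-sorts the data on every call to mimic the query being re-run for each page. The bookmark line assumes the stored state is the largest updated value among the emitted records.

```python
from typing import Dict, List

# Issue key -> "updated" timestamp; fixed-width strings sort chronologically.
issues: Dict[str, str] = {
    f"ISSUE-{n}": f"2022-06-27 00:00:0{n - 1}" for n in range(1, 6)
}


def search_page(start_at: int, max_results: int) -> List[str]:
    """Hypothetical stand-in for one search request: ORDER BY updated ASC.

    The ordering is recomputed on every call, mirroring the fact that each
    request re-runs the JQL query instead of reading a cached result set.
    """
    ordered = sorted(issues, key=lambda key: issues[key])
    return ordered[start_at:start_at + max_results]


max_results = 3
seen = search_page(start_at=0, max_results=max_results)

# ISSUE-2 is edited between the first and the second request.
issues["ISSUE-2"] = "2022-06-27 01:23:01"
seen += search_page(start_at=len(seen), max_results=max_results)

missed = sorted(set(issues) - set(seen))
# Assumed state behaviour: bookmark = largest "updated" among emitted records.
bookmark = max(issues[key] for key in seen)

print(seen)      # ['ISSUE-1', 'ISSUE-2', 'ISSUE-3', 'ISSUE-5', 'ISSUE-2']
print(missed)    # ['ISSUE-4']
print(bookmark)  # 2022-06-27 01:23:01
```

Because the bookmark ends up later than ISSUE-4's timestamp, a filter like updated >= bookmark on the next run would not return ISSUE-4 either.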

To fix this, I believe we'd have to stop using the Paginator and instead run subsequent queries with the maximum updated timestamp of the previous query.
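A minimal sketch of that keyset-style approach, under simplifying assumptions: search_updated_after is a hypothetical in-memory stand-in for a query like updated > X ORDER BY updated ASC, and it compares full-precision timestamps. Replaying the same edit to ISSUE-2, every issue is returned (ISSUE-2 twice) because each page is anchored to a timestamp rather than to an offset.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[str, str]  # (issue key, updated timestamp)

# Issue key -> "updated" timestamp; fixed-width strings sort chronologically.
issues: Dict[str, str] = {
    f"ISSUE-{n}": f"2022-06-27 00:00:0{n - 1}" for n in range(1, 6)
}


def search_updated_after(after: Optional[str], max_results: int) -> List[Record]:
    """Hypothetical request: updated > after ORDER BY updated ASC, one page.

    Assumes the filter compares full-precision timestamps.
    """
    rows = sorted(
        ((key, ts) for key, ts in issues.items() if after is None or ts > after),
        key=lambda row: row[1],
    )
    return rows[:max_results]


def sync_by_keyset(max_results: int, between_pages: Callable[[], None]) -> List[Record]:
    """Page by re-querying from the newest timestamp seen so far."""
    seen: List[Record] = []
    after: Optional[str] = None
    while True:
        page = search_updated_after(after, max_results)
        if not page:
            return seen
        seen.extend(page)
        after = page[-1][1]  # newest "updated" on this page (rows are ascending)
        between_pages()


def edit_issue_2() -> None:
    """Simulate ISSUE-2 being edited after a page is read (idempotent)."""
    issues["ISSUE-2"] = "2022-06-27 01:23:01"


records = sync_by_keyset(max_results=3, between_pages=edit_issue_2)
print([key for key, _ in records])
# ['ISSUE-1', 'ISSUE-2', 'ISSUE-3', 'ISSUE-4', 'ISSUE-5', 'ISSUE-2']
```

The strict > is the weak spot: if more issues share the boundary timestamp than fit on one page, the ones past the page boundary are skipped, and coarse timestamp precision in the filter makes such ties much more likely.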

@gnilrets
Author

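Using the maximum updated timestamp doesn't work either: JQL date filters only go down to the minute, and it would be easy to get more than a page of records that were all updated in the same minute. In that case a filter of updated > bookmark skips the rest of that minute, and updated >= bookmark keeps returning the same page. Instead, we can query the data in descending order. An issue edited mid-pagination then jumps to the front of the results, into a page that has already been read, and pushes the remaining results down by one, so the next page repeats a record instead of skipping one. The edited issue is newer than the bookmark taken from the first page, so the next incremental run picks it up, and repeated records are harmless for a target that deduplicates on the issue key.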
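A sketch of the descending-order approach, again using a hypothetical in-memory stand-in (search_page_desc) for a query like updated >= X ORDER BY updated DESC paged with startAt offsets. It assumes the bookmark is the newest updated value from the first page truncated to the minute, and that the target keeps the newest version of each issue key. With ISSUE-2 edited mid-run, the first run repeats ISSUE-3 rather than skipping an issue, and the second run returns the edited ISSUE-2.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (issue key, updated timestamp)

# Issue key -> "updated" timestamp; fixed-width strings sort chronologically.
issues: Dict[str, str] = {
    f"ISSUE-{n}": f"2022-06-27 00:00:0{n - 1}" for n in range(1, 6)
}


def search_page_desc(since: str, start_at: int, max_results: int) -> List[Record]:
    """Hypothetical request: updated >= since ORDER BY updated DESC, one page.

    `since` has minute precision ("YYYY-MM-DD HH:MM"), like a JQL date filter.
    It is a prefix of every timestamp inside that minute, so plain string
    comparison means "at or after the start of that minute".
    """
    rows = [(key, ts) for key, ts in issues.items() if ts >= since]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[start_at:start_at + max_results]


def run_sync(since: str, max_results: int, edit_mid_run: bool) -> List[Record]:
    """Offset pagination over the DESC-ordered query."""
    seen: List[Record] = []
    start_at = 0
    while True:
        page = search_page_desc(since, start_at, max_results)
        if not page:
            return seen
        seen.extend(page)
        start_at += len(page)
        if edit_mid_run:
            # ISSUE-2 is edited after a page is read; re-assigning is a no-op.
            issues["ISSUE-2"] = "2022-06-27 01:23:01"


first = run_sync("2022-06-27 00:00", max_results=3, edit_mid_run=True)
# Newest record comes first, so the bookmark is the first row's timestamp,
# truncated to the minute to match JQL's date precision.
bookmark = first[0][1][:16]
second = run_sync(bookmark, max_results=3, edit_mid_run=False)

# Assumed target behaviour: upsert by issue key, keeping the newest version.
latest: Dict[str, str] = {}
for key, ts in first + second:
    if ts > latest.get(key, ""):
        latest[key] = ts

print([key for key, _ in first])   # ['ISSUE-5', 'ISSUE-4', 'ISSUE-3', 'ISSUE-3', 'ISSUE-1']
print([key for key, _ in second])  # ['ISSUE-2', 'ISSUE-5', 'ISSUE-4', 'ISSUE-3', 'ISSUE-1']
print(latest["ISSUE-2"])           # 2022-06-27 01:23:01
```

The minute-truncated bookmark makes consecutive runs overlap, which costs some re-fetched records but never drops one in this scenario.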
