Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor track_deletes #1892

Merged
merged 13 commits into from Mar 25, 2024
Merged

Refactor track_deletes #1892

merged 13 commits into from Mar 25, 2024

Conversation

TBonnin
Copy link
Collaborator

@TBonnin TBonnin commented Mar 21, 2024

Describe your changes

Refactoring track_deletes logic so we don't need an extra snapshot tables.
The logic is now simple:

  • upsert adds/updates records (or set deletedAt), including updating the sync_job_id that serves as a generationId
  • when the sync is finish, if track_deletes == true then every records without a deletedAt that doesn't have the same sync_job_id as the current one is marked as deleted

Issue ticket number and link

https://linear.app/nango/issue/NAN-599/track-deletes-refactoring

Checklist before requesting a review (skip if just adding/editing APIs & templates)

  • I added tests, otherwise the reason is:
  • I added observability, otherwise the reason is:
  • I added analytics, otherwise the reason is:

Copy link

linear bot commented Mar 21, 2024

await configService.createProviderConfig({
unique_key: DEMO_GITHUB_CONFIG_KEY,
provider: 'github',
environment_id: env!.id,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to track_deletes: making environment id not optional, same from sync and syncJob below

@@ -1,14 +1,12 @@
export interface DataResponse {
id?: string;
[index: string]: unknown | undefined | string | number | boolean | Record<string, string | boolean | number | unknown>;
[index: string]: null | undefined | object | string | number | boolean | Record<string, string | boolean | number>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated but the linter was yelling at me

}

export interface UpsertSummary {
addedKeys: string[];
updatedKeys: string[];
deletedKeys?: string[];
affectedInternalIds: string[];
affectedExternalIds: string[];
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used anywhere

@@ -401,25 +401,17 @@ export default class SyncRun {
async finishSync(models: string[], syncStartDate: Date, version: string, totalRunTime: number, trackDeletes?: boolean): Promise<void> {
let i = 0;
for (const model of models) {
let deletedKeys: string[] = [];
if (!this.isWebhook && trackDeletes) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khaliqgant I have kept the condition here but I am not sure why !this.isWebhook is necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because a webhook and a sync with track deletes share the same config

@TBonnin TBonnin force-pushed the tbonnin/NAN-599/track-deletes-refactor branch from 37034c1 to 7fcf79b Compare March 21, 2024 20:16
Copy link
Contributor

@bodinsamuel bodinsamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well, minor comments 👌🏻
I have one test that I think failed, but I don't know enough the system to be sure it's not what we want.
Run a sync three times:

    1. Add record id: 1 ✅
    1. Delete record id: 1 ✅
    1. Add back record id: 1 ❌
      The record column external_deleted_at is still filled with a date

I guess the next steps after a few weeks is to drop the table and pending_delete column?

packages/shared/lib/services/sync/job.service.ts Outdated Show resolved Hide resolved
packages/shared/lib/services/sync/data/records.service.ts Outdated Show resolved Hide resolved
packages/shared/lib/services/sync/data/records.service.ts Outdated Show resolved Hide resolved
})
.whereNot({
sync_job_id: generation
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about the performance of this query. There is no index currently mapping those conditions, do you have a way to run an explain in prod with a large sync?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in Slack, it seems alright but If not, an index like this nango_connection_id, model, sync_id, job_id where external_is_deleted = false could work

@TBonnin
Copy link
Collaborator Author

TBonnin commented Mar 22, 2024

Works well, minor comments 👌🏻 I have one test that I think failed, but I don't know enough the system to be sure it's not what we want. Run a sync three times:

* 1. Add record id: 1 ✅

* 2. Delete record id: 1 ✅

* 3. Add back record id: 1 ❌
     The record column `external_deleted_at` is still filled with a date 

Good catch. will fix. Question though: should the creation date of a deleted records that is being brought back be reset or not. ie: should it be considered as added again or simply updated?

@TBonnin
Copy link
Collaborator Author

TBonnin commented Mar 22, 2024

I guess the next steps after a few weeks is to drop the table and pending_delete column?

The table will be dropped after the migration and the new table is not going to have a pending_delete column

@TBonnin TBonnin force-pushed the tbonnin/NAN-599/track-deletes-refactor branch 5 times, most recently from ac3033f to 643b245 Compare March 22, 2024 17:07
Copy link
Contributor

@bodinsamuel bodinsamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, very nice improvement 🙏🏻

END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way to have this logic in the code? I'm really not fond of sql logic, it's hard to debug and easy to forget

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not a huge fan of the fact it is easy to forget. The alternative would be to fetch all existing records and compare before merging

@TBonnin TBonnin force-pushed the tbonnin/NAN-599/track-deletes-refactor branch from 643b245 to 953152f Compare March 25, 2024 11:51
Copy link
Member

@khaliqgant khaliqgant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as expected from my tests, thanks!

@TBonnin TBonnin force-pushed the tbonnin/NAN-599/track-deletes-refactor branch from 953152f to f2e2ae7 Compare March 25, 2024 13:59
@TBonnin TBonnin enabled auto-merge (squash) March 25, 2024 14:00
@TBonnin TBonnin merged commit 4ddac7f into master Mar 25, 2024
18 checks passed
@TBonnin TBonnin deleted the tbonnin/NAN-599/track-deletes-refactor branch March 25, 2024 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants