Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to update object with sync mode incremental_deduped_history #70

Open
Santhin opened this issue Dec 13, 2023 · 14 comments
Open

Ability to update object with sync mode incremental_deduped_history #70

Santhin opened this issue Dec 13, 2023 · 14 comments

Comments

@Santhin
Copy link

Santhin commented Dec 13, 2023

When using sync mode incremental_deduped_history, connection updates are not possible. Recreating the connection is required to make changes.

Please do NOT include a primary key configuration for this stream.

Relates to:

@exactlyaaron
Copy link

exactlyaaron commented Jan 10, 2024

I think this may be related to one I opened as well: #31

@necsord
Copy link

necsord commented Jan 10, 2024

Hi @ThomasRooney, @terencecho, would you be able to help with this one? Provider is quite unusable in current state.

Looking at the provider implementation, features of speakeasyapi.dev it looks as if it wasn't possible to define conditional parameter for request to Airbyte API, meaning Airbyte API would need to be adjusted to accept or ignore primary key parameter instead of throwing error when source already defined it.

Anyone can confirm? Would that kind of change in Airbyte API be even possible?

@ThomasRooney
Copy link
Contributor

@necsord I can confirm your understanding. Currently Speakeasy's Provider generator assumes an API described in an OpenAPI spec is mostly idempotent. What this means in this case is that for all resource updates, we create an API request with the intersection of the full set of information that can be derived from resource state and is available in an UPDATE request (i.e. the operation with x-speakeasy-entity-operation: Connection#update in the OpenAPI spec). We do this because doing an UPDATE request lacking a property, usually is meaningful: often deleting that property from the remote resource.

I think what needs to happen here is either:

  1. The ConnectionPatchRequest in the OpenAPI spec is updated to no longer accept StreamConfiguration, but some form of StreamConfigurationPatch, with the primaryKey attribute dropped. If that happens, the provider generator would recognize that the lack of the primaryKey in the Connection Update request implies that a primaryKey is a CREATE-only attribute, and the connection would be recreated every time it changes (rather than attempted to patch a connection with one).
  2. Alternatively, the StreamConfigurationPatch could be delineated into multiple oneOf types that are distinct, to describe the different instantiations of stream configurations and avoid this problem whilst retaining behaviour: i.e. I don't know if the PK is needed when incremental_append is migrated to incremental_deduped_history in the ConnectionPatchRequest, or even if that's possible without recreating the connection.
  3. The API is updated to be idempotent.

If [1], or [2] is done with changes made to https://raw.githubusercontent.com/airbytehq/terraform-provider-airbyte/main/airbyte.yaml, make would regenerate the terraform provider with this behaviour adjusted.

I'm personally not sure what the best approach is. Speakeasy does its best to infer a terraform provider that matches the API as described by the OpenAPI spec: and where there is behaviour not really described by the OpenAPI specification (e.g. sensitive fields, hoisting, reconciliation of API requests/responses to a resource, runtime validations), to augment the OpenAPI spec with annotations that describe the desired behaviour. However we're not experts on the underlying API. Will defer to @terencecho to help guide us forward.

@necsord
Copy link

necsord commented Jan 11, 2024

Thanks @ThomasRooney for your input. Re-creating connection is not an option as it would re-trigger whole sync from scratch which takes hours to complete. I'm not sure if it's possible to implement option 2 as depending on source configuration there should be a different StreamConfigurationPatch which I recon is impossible to implement in yaml file.

Looking forward to @terencecho input 🙏🏻

We're also considering creating separate incremental resources configuration with dedicated StreamIncrementalConfigurationPatch-like object but that's a quick patch not a full fledged solution.

@ThomasRooney
Copy link
Contributor

I'm not sure if it's possible to implement option 2 as depending on source configuration there should be a different StreamConfigurationPatch which I recon is impossible to implement in yaml file

The canonical way of describing these is a discriminated oneOf with object children -- where each source configuration type would live in an independent JSON Schema, and the StreamConfiguration would be defined as oneOf [StreamIncrementalAppendConfiguration, StreamIncrementalDedupedHistory, ...] and StreamConfigurationPatch would be defined as oneOf [StreamIncrementalAppendConfigurationPatch, StreamIncrementalDedupedHistoryPatch, ...]. The only extra thing we'd need to make this work is a discriminator being defined on StreamConfiguration and StreamConfigurationPatch to allow the generator to reconcile StreamIncrementalAppendConfiguration with StreamIncrementalAppendConfigurationPatch.

It's a little more complicated than what exists now in the API, but I think that would work to describe the API better

@necsord
Copy link

necsord commented Feb 22, 2024

@ThomasRooney Thanks for your input, unfortunately I did not have time to check it up sooner.

Paths to the fields that will be used as primary key. This field is REQUIRED if destination_sync_mode is *_dedup unless it is already supplied by the source schema.

Source: PATCH /connections/{connectionId} API documentation

I'm not seeing a good discriminator field as I would need information about source, whether it supplies the primary_key. That would mean the only sane way is to update the API or re-do/update the terraform provider without SpeakEasy API.

Change with API would need to be raised separately within airbytehq/airbyte issues?

@nurikk-sa
Copy link

Same problem here, even with 0.4.1 provider version

@nurikk-sa
Copy link

For now can be workaround using lifecycle block

  lifecycle {
    # Workaround for https://github.com/airbytehq/terraform-provider-airbyte/issues/83
    # https://github.com/airbytehq/terraform-provider-airbyte/issues/70
    ignore_changes = [configurations.streams]
  }

@necsord
Copy link

necsord commented Feb 26, 2024

@nurikk-sa What would happen when you're modifying, adding new streams to existing connection?

@nurikk-sa
Copy link

@nurikk-sa What would happen when you're modifying, adding new streams to existing connection?

no drift will happen due to order change, at least it will give some time until this issue is fixed

@rshorser
Copy link

@nurikk-sa Brilliant workaround for the ordering changes. Adding/removing streams to an existing connection (issue #70) would still require a delete/recreate of the resource right?

@fabianboerner
Copy link

fabianboerner commented Feb 29, 2024

For now can be workaround using lifecycle block

  lifecycle {
    # Workaround for https://github.com/airbytehq/terraform-provider-airbyte/issues/83
    # https://github.com/airbytehq/terraform-provider-airbyte/issues/70
    ignore_changes = [configurations.streams]
  }

Did not work for me im still getting this error.

*edit

i just put primary to [] and it worked

@vincent-el
Copy link

vincent-el commented Mar 19, 2024

thank you @fabianboerner that was the only fix that worked for me!

to clarify, adjust the configurations like so, to make updates on the streams possible


resource "airbyte_connection" "hubspot_to_snowflake" {
  ...
  configurations = {
    streams = [
      {
        name = "contacts"
        sync_mode    = "incremental_deduped_history"
        primary_key = []
      },
      {
        name = "companies"
        sync_mode    = "incremental_deduped_history"
        primary_key = []
      },
    ]
  }
}

the downside is that every terraform plan will always show a change to the primary_key:
image

@fabianboerner
Copy link

fabianboerner commented Mar 19, 2024

thank you @fabianboerner that was the only fix that worked for me!

to clarify, adjust the configurations like so, to make updates on the streams possible


resource "airbyte_connection" "hubspot_to_snowflake" {
  ...
  configurations = {
    streams = [
      {
        name = "contacts"
        sync_mode    = "incremental_deduped_history"
        primary_key = []
      },
      {
        name = "companies"
        sync_mode    = "incremental_deduped_history"
        primary_key = []
      },
    ]
  }
}

Yes but if you want to setup a connection where you can’t define all streams because they maybe have variable collections in mongo db you will have the same issue again. The only solution I had here was to set the connection to be recreated always but gui user will lose their setup they maybe had done later.

It makes the whole thing sometimes tedious.

Also if you didn’t have had setup the streams via terraform it will not ignore configuration[“streams”] and the default value is always full overwrite for stream update type wich works within airbyte but not against the api that throws an error.

one thing I could life with was the changes terraform would show that are not applied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants