tap-marketo - static schema for leads if required by users to support… #90

Open
wants to merge 2 commits into
base: master

@guptaa3 guptaa3 commented Jan 9, 2024

… 1500 mb limit

Description of change

Adds the option of a static schema for the leads object when the user requires one. Our Marketo lead objects are too big, and we hit the 1500 MB limit while loading even a single day of data. To handle this, we have added functionality to enable a static schema for the leads object, allowing users to select and pull only the fields they need for analysis.

Manual QA steps

Tested loads both with and without the leads schema file; both work fine.

Risks

Rollback steps

  • revert this branch

@singer-bot

Hi @guptaa3, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

@singer-bot

You did it @guptaa3!

Thank you for signing the Singer Contribution License Agreement.

@Vi6hal
Member

Vi6hal commented Jan 9, 2024

Hello @guptaa3, thank you for your contribution.

To handle this we have added functionality to enable static schema for leads object allowing users to select and pull only the fields they need for analysis

The Singer tap supports field selection via the catalog file. Have you tried selecting or deselecting fields in the catalog?
ref: https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md#metadata

This function only requests the fields that are selected or have automatic inclusion type.

def get_or_create_export_for_leads(client, state, stream, export_start, config):
    export_id = bookmarks.get_bookmark(state, "leads", "export_id")
    # check if export is still valid
    if export_id is not None and not client.export_available("leads", export_id):
        singer.log_info("Export %s no longer available.", export_id)
        export_id = None

    if export_id is None:
        # Corona mode is required to query by "updatedAt", otherwise a full
        # sync is required using "createdAt".
        query_field = "updatedAt" if client.use_corona else "createdAt"
        max_export_days = int(config.get('max_export_days',
                                         MAX_EXPORT_DAYS))
        export_end = get_export_end(export_start,
                                    end_days=max_export_days)
        query = {query_field: {"startAt": export_start.isoformat(),
                               "endAt": export_end.isoformat()}}

        # Create the new export and store the id and end date in state.
        # Does not start the export (must POST to the "enqueue" endpoint).
        fields = []
        for entry in stream['metadata']:
            if len(entry['breadcrumb']) > 0 and (entry['metadata'].get('selected') or entry['metadata'].get('inclusion') == 'automatic'):
                fields.append(entry['breadcrumb'][-1])

        export_id = client.create_export("leads", fields, query)
        state = update_state_with_export_info(
            state, stream, export_id=export_id, export_end=export_end.isoformat())
    else:
        export_end = pendulum.parse(bookmarks.get_bookmark(state, "leads", "export_end"))

    return export_id, export_end

ref: https://github.com/singer-io/tap-marketo/blob/master/tap_marketo/sync.py#L156-L187
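As a rough illustration of the catalog-based approach, the sketch below marks only the wanted fields as selected in a stream's metadata and then applies the same filter as the function above to derive the export field list. The helper names `select_fields` and `export_fields`, and the sample field names `id`, `email`, and `company`, are assumptions made for this example; they are not part of tap-marketo.

```python
# Minimal sketch, assuming a Singer-style catalog stream entry.
# select_fields and export_fields are hypothetical helpers for illustration.

def select_fields(stream, wanted):
    """Return a copy of the stream whose field-level metadata selects only `wanted`."""
    new_metadata = []
    for entry in stream["metadata"]:
        meta = dict(entry["metadata"])
        if entry["breadcrumb"]:
            # Field-level entry: the breadcrumb looks like ["properties", "<field>"].
            meta["selected"] = entry["breadcrumb"][-1] in wanted
        else:
            # Stream-level entry (empty breadcrumb): select the stream itself.
            meta["selected"] = True
        new_metadata.append({"breadcrumb": list(entry["breadcrumb"]), "metadata": meta})
    return {**stream, "metadata": new_metadata}


def export_fields(stream):
    """Apply the same filter as the quoted function: selected or automatic fields."""
    fields = []
    for entry in stream["metadata"]:
        meta = entry["metadata"]
        if entry["breadcrumb"] and (meta.get("selected") or meta.get("inclusion") == "automatic"):
            fields.append(entry["breadcrumb"][-1])
    return fields


# Hypothetical, heavily trimmed "leads" stream entry from a discovered catalog.
leads_stream = {
    "tap_stream_id": "leads",
    "metadata": [
        {"breadcrumb": [], "metadata": {}},
        {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
        {"breadcrumb": ["properties", "email"], "metadata": {"inclusion": "available"}},
        {"breadcrumb": ["properties", "company"], "metadata": {"inclusion": "available"}},
    ],
}

trimmed = select_fields(leads_stream, wanted={"email"})
print(export_fields(trimmed))  # ['id', 'email']
```

Fields with automatic inclusion are still exported even when they are not explicitly selected, so the export size depends on both the selection and the automatic fields.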

@guptaa3
Author

guptaa3 commented Jan 24, 2024

Hi @Vi6hal, I am using Meltano to set up the configuration and run the tap. The issue is that Meltano runs discovery and then runs the job based on a catalog generated on the fly. I did consider keeping a static catalog and passing it to Meltano, but that seemed more cumbersome for users than having the catalog generated directly, as Meltano does by default.
