Failures in OpenlineageIn Function #192

Open
Kishor-Radhakrishnan opened this issue Mar 22, 2023 · 13 comments
Labels
bug (Something isn't working), vnext (Resolved in Next Release)

Comments

@Kishor-Radhakrishnan

We have implemented Purview in many Databricks workspaces, and some lineage is missing in the UI. While troubleshooting, we can see that the function app is failing repeatedly because of the Event Hub message size limit. We suspect this is causing the lineage gaps.

Result: Error in OpenLineageIn function: The message (id:9471c2ab-59dc-41e6-9b44-38705ac613b3, size:23347619 bytes) is larger than is currently allowed (1048576 bytes). (eventhubmaestroadbpct6)
Exception: Azure.Messaging.EventHubs.EventHubsException(MessageSizeExceeded): The message (id:9471c2ab-59dc-41e6-9b44-38705ac613b3, size:23347619 bytes) is larger than is currently allowed (1048576 bytes).

Is there any option to overcome this Event Hub limitation so that lineage events are not lost?

Kishor-Radhakrishnan added the bug label Mar 22, 2023
@wjohnson
Collaborator

@Kishor-Radhakrishnan I apologize for the delay in getting back to you.

You can enable a configuration setting that will remove the spark-plan if it exceeds a certain size.

https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/blob/release/2.3/docs/configuration.md#experimental-app-settings

Set maxQueryPlanSize to a value smaller than 1048576. The rest of the OpenLineage payload counts toward the limit as well, so don't set it to exactly 1048576; something like 1000000 works if you want to maximize how often you receive the spark plan in the properties of the databricks_notebook_task. (The plan is just raw JSON from Spark; no UI feature uses it and none is planned.)

If you don't care to see the spark plan text / JSON in your properties, you could set maxQueryPlanSize even smaller to ensure lineage events always get through, even when a large number of inputs takes up more bytes in the message going to Event Hub.
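
To make the effect of maxQueryPlanSize concrete, here is a minimal Python sketch of the idea, not the accelerator's actual code. It assumes the plan sits under `run.facets["spark.logicalPlan"]` in the OpenLineage event, and the function name `strip_large_plan` is hypothetical.

```python
import copy
import json

# Assumption: the Spark plan is carried in this run facet of the event.
PLAN_FACET = "spark.logicalPlan"


def strip_large_plan(event: dict, max_query_plan_size: int) -> dict:
    """Return a copy of the event, without the plan facet if the plan is too big.

    The plan is measured as UTF-8 encoded JSON, since the limit is in bytes.
    """
    trimmed = copy.deepcopy(event)
    facets = trimmed.get("run", {}).get("facets", {})
    plan = facets.get(PLAN_FACET)
    if plan is not None:
        plan_bytes = len(json.dumps(plan).encode("utf-8"))
        if plan_bytes > max_query_plan_size:
            # Drop only the plan; inputs, outputs and other facets are kept.
            del facets[PLAN_FACET]
    return trimmed
```

Note that this only bounds the plan itself. Inputs, outputs, and other facets still count toward the Event Hub message size, which is why a value below 1048576 is suggested.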

@Kishor-Radhakrishnan
Author

@wjohnson I tried setting it to a much lower value. We are still seeing failures.

I set the value to 10000.

@hmoazam
Contributor

hmoazam commented Apr 8, 2023

Just for the sake of testing, can you make it a much smaller size @Kishor-Radhakrishnan? Try setting it to 50 and let us know the outcome.

@Kishor-Radhakrishnan
Author

I tried that as well. We are still seeing many errors.

@wjohnson
Collaborator

> @wjohnson I tried setting it to a much lower value. We are still seeing failures.
>
> I set the value to 10000.

Would you be able to share the latest logs after adding this setting? You should see something like this in the OpenLineageIn logs: `Query Plan size exceeded maximum. Removing query plan from OpenLineage Event`

@Kishor-Radhakrishnan
Author

Yes, I am seeing that in the logs. But it still looks like many events are exceeding the Event Hub limit.

@Kishor-Radhakrishnan
Author

Screenshot 2023-04-17 at 7 12 28 PM

@Kishor-Radhakrishnan
Author

query_data.csv.zip

Latest exception logs

@wjohnson
Collaborator

@Kishor-Radhakrishnan thank you for your patience! These last logs helped us identify an error in the OpenLineageIn code that removed the spark plan in one variable but failed to remove the spark plan in another variable. That other variable was the one actually sending data to Event Hub!

I've put the changes in this branch: https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/tree/hotfix/maxQueryPlanOLIn

Would you be able to build this branch, deploy it to your environment, and confirm that maxQueryPlanSize is being respected for OpenLineageIn and PurviewOut?

Thank you again for all of your patience.
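
For readers unfamiliar with this class of bug, the following Python sketch illustrates it in miniature. It is not the accelerator's code; the function names and the `spark.logicalPlan` facet location are assumptions made for illustration. The plan is removed from a parsed copy, but the untouched original string is what gets sent.

```python
import json

PLAN_FACET = "spark.logicalPlan"  # assumed facet name


def _drop_plan(parsed: dict) -> None:
    """Remove the plan facet from a parsed event, if present."""
    parsed.get("run", {}).get("facets", {}).pop(PLAN_FACET, None)


def build_outgoing_buggy(raw_body: str) -> str:
    parsed = json.loads(raw_body)
    _drop_plan(parsed)
    # Bug: the trimmed object is discarded and the original,
    # oversized string is returned for sending.
    return raw_body


def build_outgoing_fixed(raw_body: str) -> str:
    parsed = json.loads(raw_body)
    _drop_plan(parsed)
    # Fix: serialize the trimmed object so the sender sees the smaller payload.
    return json.dumps(parsed)
```

In the buggy variant a log line can truthfully report that the plan was removed while the message on the wire is unchanged, which matches the symptoms reported above.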

@Kishor-Radhakrishnan
Author

Kishor-Radhakrishnan commented Apr 18, 2023

Screenshot 2023-04-18 at 4 06 32 PM

I have deployed the latest fix. Let's monitor the failures further.
It looks like the plan is getting omitted now. See the screenshot of the latest logs after the fix.

@Kishor-Radhakrishnan
Author

Kishor-Radhakrishnan commented Apr 20, 2023

Unfortunately, we still have many failures with the same issue, although the failure count appears to have gone down.

4/19/2023, 3:51:51.4891988 PM (Local time)

Result: Error in OpenLineageIn function: The message (id:232abc39-61f0-45c6-8644-f53d68c84ecd, size:36280210 bytes) is larger than is currently allowed (1048576 bytes). (eventhubmaestroadbpct6)
Exception: Azure.Messaging.EventHubs.EventHubsException(MessageSizeExceeded): The message (id:232abc39-61f0-45c6-8644-f53d68c84ecd, size:36280210 bytes) is larger than is currently allowed (1048576 bytes). (eventhubmaestroadbpct6)

@rabbyn

rabbyn commented Oct 20, 2023

Hi @Kishor-Radhakrishnan, we are facing a similar issue with Spark jobs in my organization. Did you manage to make it work? If so, how? Thanks.
Cc: @wjohnson

wjohnson added the vnext label Dec 30, 2023
@wjohnson
Collaborator

This will be fixed in the next release, which removes the spark plan and then, if the payload is still larger than the 1 MB limit, the column lineage information. Reducing mount points, as in #219, will be considered in the future.

It's still possible that there will be sections of the payload that result in too much information such as:

  1. Mount Points on the cluster
  2. Too many inputs
  3. Too many outputs

But only the mount points issue has been encountered so far. It still needs to be determined how to solve the mount point issue.
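
The staged trimming described above (drop the spark plan first, then column lineage only if the payload is still too large) could look roughly like this Python sketch. It is an illustration under assumptions, not the released implementation: it assumes the plan lives in `run.facets["spark.logicalPlan"]` and column lineage in each output's `facets["columnLineage"]`, and all function names are hypothetical.

```python
import copy
import json

# Limit taken from the error messages in this thread.
EVENT_HUB_LIMIT_BYTES = 1_048_576


def payload_size(event: dict) -> int:
    """Size of the event as UTF-8 encoded JSON, in bytes."""
    return len(json.dumps(event).encode("utf-8"))


def drop_spark_plan(event: dict) -> None:
    # Assumed location of the plan facet.
    event.get("run", {}).get("facets", {}).pop("spark.logicalPlan", None)


def drop_column_lineage(event: dict) -> None:
    # Assumed location of column lineage: a facet on each output dataset.
    for output in event.get("outputs", []):
        output.get("facets", {}).pop("columnLineage", None)


def shrink_to_fit(event: dict, limit: int = EVENT_HUB_LIMIT_BYTES):
    """Apply trimming steps in order, stopping as soon as the event fits.

    Returns (trimmed_event, fits). fits is False when the event is still
    too large after every step, e.g. because of mount points or a very
    large number of inputs or outputs.
    """
    trimmed = copy.deepcopy(event)
    for step in (drop_spark_plan, drop_column_lineage):
        if payload_size(trimmed) <= limit:
            break
        step(trimmed)
    return trimmed, payload_size(trimmed) <= limit
```

The JSON size measured here ignores any envelope overhead the Event Hubs client adds, so a real check would likely leave some headroom below the limit. When `fits` comes back `False`, the caller still has to decide what to do with the event, which is the open mount-point question noted above.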
