
Lineage not getting displayed for all tables #203

Open
mithun1979 opened this issue May 24, 2023 · 1 comment
Labels
bug Something isn't working


@mithun1979

Lineage became visible for a table on the first run.
However, it's no longer changing or updating after I added tables from additional notebooks.
The code does a simple CTAS (CREATE TABLE AS SELECT):

    CREATE TABLE <TABLE_NAME> USING DELTA AS
    SELECT * FROM <SOURCE_TABLE_NAME>

The source table is in ADLS Gen2. The target table is a managed table in DBFS (the Databricks default database).

Expected behavior
New lineage information should show up in Purview.
Logs
PurviewOut.log

OpenLineageIn.log

In PurviewOut.log, there is an error:

    2023-05-24 10:00:39.049  Error        Error Loading to Purview JSON Entiitesto Purview: Return Code: BadRequest - Reason:Bad Request
    2023-05-24 10:00:39.049  Error        Purview Publish Entity Metadata Error : Error :{"requestId":"fc68faa4-73c4-4808-a77b-2fe96f65546e","errorCode":"ATLAS-400-00-036","errorMessage":"invalid relationshipDef: process_dataset_outputs: end type 1: databricks_process, end type 2: databricks_notebook"}
    2023-05-24 10:00:40.128  Information  Executed 'Functions.PurviewOut' (Succeeded, Id=0783df86-0011-480e-90c2-1c3660514b4d, Duration=4766ms)
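
The ATLAS-400-00-036 message appears to say that Purview rejected the process-to-output relationship because of the entity types at its two ends. A minimal Python sketch of that kind of end-type check follows. The type hierarchy, the required end types (Process for end 1, DataSet for end 2), and the helper names `is_a` and `validate_outputs_relationship` are simplified assumptions for illustration, not Purview's actual type definitions or code.

```python
# Simplified, hypothetical type model: each type maps to the flattened set
# of all its supertypes. Real Purview/Atlas type definitions may differ.
SUPERTYPES = {
    "databricks_process": {"Process"},
    "databricks_notebook": set(),  # assumption: not a DataSet subtype here
    "hive_table": {"DataSet"},
}

# Assumed end types for the process_dataset_outputs relationship.
REQUIRED_ENDS = ("Process", "DataSet")


def is_a(type_name: str, required: str) -> bool:
    """True if type_name is `required` or lists it among its supertypes."""
    return type_name == required or required in SUPERTYPES.get(type_name, set())


def validate_outputs_relationship(end1: str, end2: str) -> None:
    """Raise ValueError when the ends don't fit process_dataset_outputs."""
    req1, req2 = REQUIRED_ENDS
    if not (is_a(end1, req1) and is_a(end2, req2)):
        raise ValueError(
            "invalid relationshipDef: process_dataset_outputs: "
            f"end type 1: {end1}, end type 2: {end2}"
        )


validate_outputs_relationship("databricks_process", "hive_table")  # accepted
try:
    validate_outputs_relationship("databricks_process", "databricks_notebook")
except ValueError as err:
    print(err)  # same wording as the errorMessage in the log
```

In this toy model, linking the process to a Hive table passes, while linking it to a notebook entity reproduces the rejection seen in the log.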

Screenshots
NA

Environment:

  • OS: Windows
  • OpenLineage Version: openlineage-spark-0.18.0.jar
  • Databricks Runtime Version: 11.3
  • Cluster Type: Interactive
  • Cluster Mode: No Isolation Shared
  • Using Credential Passthrough: No

Additional context
The lineage data showed up the first time, so the setup seems to be correct. There appears to be an Atlas error in PurviewOut.log.

mithun1979 added the bug label on May 24, 2023
@wjohnson
Collaborator

@mithun1979 It looks like the input is okay, but the output points to /user/hive/warehouse/test_call_center_schema_chnaged and is not mapping correctly to the Hive metastore. Unfortunately, the lookup finds a databricks_notebook in the search results and tries to map the Hive table to the first object that Purview search returns as a match.
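
To illustrate the failure mode described above, here is a hypothetical Python sketch contrasting a naive resolver that takes the first search hit with one that only accepts hits of the expected entity type. The hit shape (`typeName`, `name`), the helper names `first_match` and `typed_match`, and the sample data are all assumptions; this is not the solution's actual lookup code.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical search hit: only the fields this sketch needs.
Hit = Dict[str, str]


def first_match(hits: List[Hit]) -> Optional[Hit]:
    """Naive resolution: trust whatever the search ranked first."""
    return hits[0] if hits else None


def typed_match(hits: List[Hit], expected_types: Iterable[str]) -> Optional[Hit]:
    """Return the first hit whose typeName is one of expected_types."""
    allowed = set(expected_types)
    for hit in hits:
        if hit["typeName"] in allowed:
            return hit
    return None  # no entity of the right type; caller must handle this


# Illustrative search results for the output path; values are placeholders.
hits: List[Hit] = [
    {"typeName": "databricks_notebook", "name": "test_call_center_notebook"},
    {"typeName": "hive_table", "name": "test_call_center_schema_chnaged"},
]

print(first_match(hits)["typeName"])                  # databricks_notebook
print(typed_match(hits, {"hive_table"})["typeName"])  # hive_table
```

With type filtering, a missing Hive table yields no match instead of a wrong one, which fails more visibly than silently linking the process to a notebook.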

Support for Delta is limited, but we will try to improve it.
