Lineage missing in purview #194
Comments
Hi @Kishor-Radhakrishnan, was this also fixed by the changes you made to fix #193, or do you still need us to look into it?
@hmoazam Yes. This error is different, and we are seeing it frequently on DBR 10.4 LTS: 23/04/05 08:27:02 ERROR EventEmitter: Could not emit lineage w/ exception java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at
Hi Kishor, could you please share what data sources you're reading from and writing to when you're getting this error?
@Kishor-Radhakrishnan would you please try this updated branch to help us collect more information on this issue? https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/tree/hotfix/maxQueryPlanOLIn Could you build and install this version, then run a notebook that still fails to send lineage even after our last fix? If you monitor the logs in the OpenLineageIn function, we should see:
If you could provide us with that full OpenLineage payload and the error message, we can troubleshoot further and find a fix for this issue.
@wjohnson I'm attaching the log from the monitor that contains this error.
@Kishor-Radhakrishnan thank you so much for this! Would you be able to share any snippet of code on how you're reading the In this case, it appears to not recognize the partitions of the I think there are two approaches to solving this.
@wjohnson I am adding a snippet of the code where the write occurs. `def write_dataframe(df,tgt_path,part_col): except Exception as exception: def full_load(tgt_tbl,src_data_path,cntrl_tbl,tgt_delta_path,table_type,delete_flag,part_col,opt_flg): `
We also face this issue in many different notebooks, so we need to see why the partition is not identified properly. Maybe we can try the short-term solution of skipping column lineage to see whether it captures the whole dataset-level lineage.
We are adding the column lineage removal as part of the next release.
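For readers following along, the short-term idea of skipping column lineage can be pictured as dropping the `columnLineage` facet from each output dataset of an OpenLineage event while keeping the dataset-level inputs and outputs. The following is only a minimal sketch under that assumption, not the accelerator's actual implementation; the function name `strip_column_lineage` and the sample event are hypothetical and heavily trimmed.

```python
import copy
from typing import Any, Dict


def strip_column_lineage(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an OpenLineage event without columnLineage facets.

    Hypothetical helper for illustration; the input event is left unchanged.
    """
    cleaned = copy.deepcopy(event)
    for dataset in cleaned.get("outputs", []):
        facets = dataset.get("facets")
        if isinstance(facets, dict):
            # Drop only the column-level facet; schema and other facets stay.
            facets.pop("columnLineage", None)
    return cleaned


# Hypothetical, trimmed event used only to exercise the helper.
sample_event = {
    "eventType": "COMPLETE",
    "inputs": [{"namespace": "dbfs", "name": "/mnt/src"}],
    "outputs": [
        {
            "namespace": "dbfs",
            "name": "/mnt/tgt",
            "facets": {
                "schema": {"fields": [{"name": "id", "type": "long"}]},
                "columnLineage": {"fields": {"id": {"inputFields": []}}},
            },
        }
    ],
}

slim_event = strip_column_lineage(sample_event)
```

With the facet removed, the event still names its input and output datasets, which is the information dataset-level lineage depends on.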
We have many cases where lineage is missing in Purview. We will keep this issue open and update it with logs to investigate.
23/04/05 08:27:02 ERROR EventEmitter: Could not emit lineage w/ exception
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:476)
    at sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:470)
    at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
    at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1369)
    at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
    at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:978)
    at io.openlineage.spark.shaded.org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at io.openlineage.spark.shaded.org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at io.openlineage.spark.shaded.org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
    at io.openlineage.spark.shaded.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at io.openlineage.spark.shaded.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at io.openlineage.spark.shaded.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.jav
[log4j-active (7).txt](https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/files/11166751/log4j-active.7.txt)
log4j-active (8).txt
Attached are two logs from Databricks with errors. Please investigate.