RenatoDM
New Contributor

Spark Data Lineage

I see that the OpenLineage libraries are included by default as built-in libraries in Spark. When a notebook reads from and writes to OneLake, does it emit lineage events automatically? According to Copilot it does, and lineage visualization in Purview is optional. Where are those events stored? I see a SparkLineage folder in OneLake, but it is always empty. I have not been able to find clear documentation on this topic. I would appreciate any comments. Thank you.

1 ACCEPTED SOLUTION
nilendraFabric
Honored Contributor

Hi @RenatoDM 

 

The `SparkLineage` folder in OneLake is not populated by default. Its presence suggests compatibility with OpenLineage standards, but explicit configuration is required.
• To emit granular OpenLineage events (e.g., column-level lineage), you must:
  • Implement a SparkListener to intercept Spark execution plans.
  • Configure diagnostic emitters to route logs to Azure Storage or Log Analytics.
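
For the diagnostic emitter step, here is a minimal sketch of the Spark properties involved. It is not an official Fabric recipe. The property names follow the pattern I recall from the Synapse diagnostic emitter documentation linked in this reply, and whether Fabric accepts the same keys unchanged is an assumption to verify there. The helper `build_emitter_conf`, the destination name `MyDestination`, and the placeholder URI and key are all hypothetical. Note that these settings route Spark logs, event logs, and metrics; on their own they do not produce OpenLineage events.

```python
def build_emitter_conf(name, storage_uri, secret):
    """Return Spark properties for one Azure Storage diagnostic emitter.

    Key names are assumed from the Synapse diagnostic emitter docs;
    check them against the current documentation before relying on them.
    """
    prefix = f"spark.synapse.diagnostic.emitter.{name}"
    return {
        # Comma-separated list of emitter destination names (one here).
        "spark.synapse.diagnostic.emitters": name,
        f"{prefix}.type": "AzureStorage",
        # Which diagnostic categories to route to the destination.
        f"{prefix}.categories": "Log,EventLog,Metrics",
        f"{prefix}.uri": storage_uri,
        f"{prefix}.auth": "AccessKey",
        f"{prefix}.secret": secret,
    }


# Placeholder values only; substitute your own storage account details.
conf = build_emitter_conf(
    "MyDestination",
    "https://<account>.blob.core.windows.net/<container>/<folder>",
    "<access-key>",
)

# Print the property names in "key value" form for pasting into a Spark config.
for key in conf:
    print(key, "<value>" if key.endswith(".secret") else conf[key])
```

In practice you would keep the access key in a secret store rather than in plain configuration text.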

 

 

Native Purview integration captures basic item-level lineage (e.g., notebook → Lakehouse table) but doesn't populate `SparkLineage`.
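
To make the difference between item-level and column-level lineage concrete, here is a small sketch that reads the `columnLineage` facet from an OpenLineage run event. The event below is hand-written for illustration, not output captured from Fabric, and the dataset names (`raw_orders`, `clean_orders`) and the helper `column_lineage` are hypothetical. The facet layout follows the OpenLineage spec as I understand it.

```python
def column_lineage(event):
    """Return (output_dataset, output_column, input_dataset, input_column) tuples."""
    rows = []
    for output in event.get("outputs", []):
        facet = output.get("facets", {}).get("columnLineage", {})
        for column, detail in facet.get("fields", {}).items():
            for src in detail.get("inputFields", []):
                rows.append((output["name"], column, src["name"], src["field"]))
    return rows


# Hand-written sample event (hypothetical names, trimmed to the relevant keys).
sample_event = {
    "eventType": "COMPLETE",
    "job": {"namespace": "example", "name": "notebook_job"},
    "inputs": [{"namespace": "onelake", "name": "raw_orders"}],
    "outputs": [
        {
            "namespace": "onelake",
            "name": "clean_orders",
            "facets": {
                "columnLineage": {
                    "fields": {
                        "total": {
                            "inputFields": [
                                {"namespace": "onelake", "name": "raw_orders", "field": "price"},
                                {"namespace": "onelake", "name": "raw_orders", "field": "quantity"},
                            ]
                        }
                    }
                }
            },
        }
    ],
}

for out_ds, out_col, in_ds, in_col in column_lineage(sample_event):
    print(f"{in_ds}.{in_col} -> {out_ds}.{out_col}")
```

Item-level lineage would only record that the notebook wrote `clean_orders`; the facet above additionally records which input columns feed each output column.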

 

https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/azure-synapse-diagnostic-emitters-az...

 

 

 

 
