Roadmap

Spark column-level lineage

Is your feature request related to a problem? Please describe.

The DataHub Spark integration does not support column-level lineage.

Describe the solution you'd like

It would be great if the schema and column-level lineage could be extracted automatically from the Spark logical plan, as the OpenLineage implementation does.

Describe alternatives you've considered

The information could also be ingested manually, following this example: https://datahubproject.io/docs/generated/metamodel/entities/dataset/#fine-grained-lineage
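To make the manual route concrete, here is a minimal sketch of the payload shape as I understand it from the linked page. It only builds plain dictionaries and does not call the DataHub SDK; the table and column names (`sales.orders`, `sales.order_summary`, `price`, `quantity`, `total`) and the helper functions are made up for illustration, and the field names should be checked against the docs before use.

```python
import json


def dataset_urn(platform, name, env="PROD"):
    """Build a DataHub dataset URN (hypothetical helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def field_urn(ds_urn, field_path):
    """Build a schemaField URN pointing at one column of a dataset."""
    return f"urn:li:schemaField:({ds_urn},{field_path})"


def fine_grained_lineage(upstream_fields, downstream_field):
    """One column-level edge: a set of upstream columns feeding one column."""
    return {
        "upstreamType": "FIELD_SET",
        "upstreams": list(upstream_fields),
        "downstreamType": "FIELD",
        "downstreams": [downstream_field],
    }


# Hypothetical tables: total = price * quantity.
orders = dataset_urn("hive", "sales.orders")
summary = dataset_urn("hive", "sales.order_summary")

# Assumed shape of the lineage aspect attached to the downstream dataset.
upstream_lineage = {
    "upstreams": [{"dataset": orders, "type": "TRANSFORMED"}],
    "fineGrainedLineages": [
        fine_grained_lineage(
            [field_urn(orders, "price"), field_urn(orders, "quantity")],
            field_urn(summary, "total"),
        )
    ],
}

print(json.dumps(upstream_lineage, indent=2))
```

Doing this by hand for every Spark job is exactly the work the feature request would remove.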

Another option is to use OpenLineage/Marquez as a source with a custom integration.
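As a rough idea of what such a custom integration would have to do, the sketch below pulls column-level edges out of an OpenLineage run event, assuming the output dataset carries a `columnLineage` facet. The `column_edges` function and the sample event are hypothetical, the event is trimmed to the fields used here, and mapping the resulting edges onto DataHub entities is left out.

```python
def column_edges(event):
    """Return (upstream, downstream) column pairs from an OpenLineage event.

    Each side is a (namespace, dataset name, column) tuple.
    """
    edges = []
    for output in event.get("outputs", []):
        facet = output.get("facets", {}).get("columnLineage", {})
        for column, info in facet.get("fields", {}).items():
            for src in info.get("inputFields", []):
                edges.append(
                    (
                        (src["namespace"], src["name"], src["field"]),
                        (output["namespace"], output["name"], column),
                    )
                )
    return edges


# Hypothetical, trimmed-down event as a Spark job might emit it.
event = {
    "outputs": [
        {
            "namespace": "hive",
            "name": "sales.order_summary",
            "facets": {
                "columnLineage": {
                    "fields": {
                        "total": {
                            "inputFields": [
                                {"namespace": "hive", "name": "sales.orders", "field": "price"},
                                {"namespace": "hive", "name": "sales.orders", "field": "quantity"},
                            ]
                        }
                    }
                }
            },
        }
    ]
}

for upstream, downstream in column_edges(event):
    print(upstream, "->", downstream)
```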

Additional context

Related feature request:

https://feature-requests.datahubproject.io/p/not-able-to-see-schema-of-data-ingested-using-spark-lineage