Allow configuration of Google project where BQ queries are submitted

Is your feature request related to a problem? Please describe.

It’s currently not possible to ingest metadata for BigQuery datasets when the user lacks permission to submit jobs in the project where the datasets live, even if that user has permission to submit jobs in another project.

It seems that the project_id used to submit queries through SQLAlchemy is derived from the project_id whose metadata is being ingested: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/bigquery.py#L337

Describe the solution you'd like

Allow the user to specify a “working_project_id” under which all queries are submitted, while the datasets being ingested are still those in the project specified under “project_id”.
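
To make the request concrete, here is a minimal sketch of how the two settings could relate. The class name BigQueryProjects and the job_project property are hypothetical and only for illustration; they are not DataHub's actual config class, and the project names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BigQueryProjects:
    """Hypothetical holder for the two settings; not DataHub's real config."""

    project_id: str  # project whose datasets are ingested
    working_project_id: Optional[str] = None  # proposed: project where jobs run

    @property
    def job_project(self) -> str:
        # Fall back to project_id so configs without the new field behave as today.
        return self.working_project_id or self.project_id


cfg = BigQueryProjects(
    project_id="bigquery-public-data",
    working_project_id="my-billing-project",  # placeholder name
)
print(cfg.job_project)
```

Falling back to project_id when working_project_id is unset would keep existing recipes working unchanged.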

Additional context

It’s a common pattern in BQ to grant somebody access to a given dataset without allowing them to submit jobs in the project where the dataset lives.
The user then accesses that dataset by submitting a job in a project under their own control. This mostly has to do with the way billing works in GCP: query jobs are billed to the project in which they run.

As an example, it’s currently not possible to ingest metadata for the public datasets because of this limitation: https://cloud.google.com/bigquery/public-data
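
As a rough illustration of the pattern, the sketch below builds a metadata query that names the data project explicitly, so that the job itself could be submitted from a different project. The helper tables_query and the project name my-billing-project are assumptions made for this example; whether this exact query is what the ingestion source should run is not something I have verified.

```python
def tables_query(data_project: str, dataset: str) -> str:
    """Build a query listing the tables of `dataset` in `data_project`.

    Hypothetical helper: the table reference is fully qualified, so the
    query does not depend on which project the job is submitted from.
    """
    return (
        "SELECT table_name, table_type "
        f"FROM `{data_project}`.{dataset}.INFORMATION_SCHEMA.TABLES"
    )


job_project = "my-billing-project"  # placeholder: project where the user may run jobs
sql = tables_query("bigquery-public-data", "samples")
print(f"submit from {job_project}: {sql}")
```

With the google-cloud-bigquery client, a query like this would presumably be submitted as bigquery.Client(project=job_project).query(sql), which runs the job in job_project while reading from the public project.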

Thanks,

Gabriele