Power BI Provider #39243

Closed
wants to merge 4 commits into from


ambika-garg
Contributor

@ambika-garg ambika-garg commented Apr 24, 2024

Apache Airflow Provider for Power BI.

Operators

PowerBIDatasetRefreshOperator

The operator triggers a Power BI dataset refresh and pushes the refresh details to XCom. It accepts the following parameters:

  • dataset_id: The dataset ID.
  • group_id: The workspace ID.
  • wait_for_termination: (Default value: True) Whether to wait until the pre-existing or newly triggered refresh completes before exiting.
  • force_refresh: When enabled, forces a new refresh of the dataset after a pre-existing, ongoing refresh request has terminated.
  • timeout: Time in seconds to wait for a dataset to reach a terminal status for non-asynchronous waits. Used only if wait_for_termination is True.
  • check_interval: Number of seconds to wait before rechecking the refresh status.
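How timeout and check_interval might interact when wait_for_termination is True can be sketched as a plain polling loop. This is a minimal illustration, not the operator's actual implementation: the function name wait_for_refresh, the RefreshTimeout exception, the injected get_status callable, and the default values are all assumptions made for the example. The terminal statuses are taken from the status list in this description.

```python
import time
from typing import Callable

# Statuses treated as terminal here; "Unknown" means the refresh may still be running.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}


class RefreshTimeout(Exception):
    """Hypothetical error raised when no terminal status is reached in time."""


def wait_for_refresh(
    get_status: Callable[[], str],
    timeout: float = 3600,        # assumed default, not taken from the PR
    check_interval: float = 60,   # assumed default, not taken from the PR
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next check would land after the deadline.
        if clock() + check_interval > deadline:
            raise RefreshTimeout(f"refresh still {status!r} after {timeout} seconds")
        sleep(check_interval)
```

In a real operator, get_status would call the Power BI REST API through the hook; sleep and clock are injectable only so the loop can be exercised without waiting.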

Hooks

PowerBI Hook

A hook to interact with Power BI.

  • powerbi_conn_id: Airflow Connection ID that contains the connection information for the Power BI account used for authentication.

Features

  • XCom Integration: The Power BI dataset refresh operator pushes the following fields to XCom for downstream tasks:

  1. powerbi_dataset_refresh_id: Request Id of the Dataset Refresh.
  2. powerbi_dataset_refresh_status: Refresh Status.
    • Unknown: Refresh state is unknown or a refresh is in progress.
    • Completed: Refresh successfully completed.
    • Failed: Refresh failed (details in powerbi_dataset_refresh_error).
    • Disabled: Refresh is disabled by a selective refresh.
  3. powerbi_dataset_refresh_end_time: The end date and time of the refresh (may be None if a refresh is in progress)
  4. powerbi_dataset_refresh_error: Failure error code in JSON format (None if no error)
  • External Monitoring link: The operator conveniently provides a redirect link to the Power BI UI for monitoring refreshes.
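A downstream task could consume these XCom fields roughly as follows. This is a sketch under assumptions: check_refresh_result is a hypothetical helper, and xcom_pull is passed in as a callable that takes a key, so the logic does not depend on Airflow being installed. Only the four key names come from the list above.

```python
from typing import Any, Callable


def check_refresh_result(xcom_pull: Callable[[str], Any]) -> str:
    """Summarize a dataset refresh from the XCom keys listed above.

    Raises RuntimeError if the refresh failed, so a downstream task fails too.
    """
    refresh_id = xcom_pull("powerbi_dataset_refresh_id")
    status = xcom_pull("powerbi_dataset_refresh_status")

    if status == "Failed":
        error = xcom_pull("powerbi_dataset_refresh_error")
        raise RuntimeError(f"Refresh {refresh_id} failed: {error}")

    if status == "Completed":
        end_time = xcom_pull("powerbi_dataset_refresh_end_time")
        return f"Refresh {refresh_id} completed at {end_time}"

    # "Unknown" or "Disabled": report the status without an end time.
    return f"Refresh {refresh_id} has status {status}"
```

Inside an Airflow task, the callable could be something like `lambda key: ti.xcom_pull(task_ids="refresh_in_given_workspace", key=key)`, where ti is the task instance from the task context.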

Sample DAG to use the plugin.

Check out the sample DAG code below:

from datetime import datetime

from airflow import DAG
# Import path as used by this plugin; adjust it to where the operator lives in your project.
from operators.powerbi_refresh_dataset_operator import PowerBIDatasetRefreshOperator


with DAG(
    dag_id='refresh_dataset_powerbi',
    schedule_interval=None,  # no schedule: the DAG runs only when triggered
    start_date=datetime(2023, 8, 7),
    catchup=False,
    concurrency=20,
    tags=['powerbi', 'dataset', 'refresh'],
) as dag:

    # Trigger the refresh and return immediately instead of waiting for it to finish.
    refresh_in_given_workspace = PowerBIDatasetRefreshOperator(
        task_id="refresh_in_given_workspace",
        dataset_id="<dataset_id>",
        group_id="<workspace_id>",
        force_refresh=False,
        wait_for_termination=False,
    )

    refresh_in_given_workspace

* PowerBIDatasetRefreshOperator: Refreshes the Dataset
* PowerBI Hook: A class to interact with Power BI
* Unit tests
@potiuk
Member

potiuk commented Apr 25, 2024

Just a kind reminder that a proposal to add a new provider should be announced with justification: why you think the provider cannot be released and maintained outside of the community set of providers. There should be a thread on the devlist, and consensus reached by the community that we want it.

See https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers for details.

Example where you could see discussion and proposal about new providers (but you can search for others):

@Joffreybvn
Contributor

A Power BI / Microsoft Fabric provider would be really nice! We (Infrabel) started to work on that, via @dabla's MsGraph Operators.

@ambika-garg I contacted you on Airflow's Slack. I'd like to discuss the further plans for this provider, and possibly how we can collaborate.

@dabla
Contributor

dabla commented Apr 29, 2024

> A Power BI / Microsoft Fabric provider would be really nice! We (Infrabel) started to work on that, via @dabla's MsGraph Operators.
>
> @ambika-garg I contacted you on Airflow's Slack. I'd like to discuss the further plans for this provider, and possibly how we can collaborate.

This provider is a specialized operator for refreshing Power BI datasets. The MSGraphAsyncOperator (with its Trigger and Sensor) lets you achieve the same result without a dedicated operator, but you then need to combine several of them. Nonetheless, this could be a handy addition, as it combines triggering the dataset refresh and polling its status in a single operator. I agree with @Joffreybvn that this would be a nice opportunity to collaborate and make sure this operator reuses as much common code as possible with the MSGraphAsyncOperator (for example, the KiotaRequestAdapterHook could be shared). The polling, for example, could then be done asynchronously, so that we don't block the Airflow workers while waiting for a response from the Power BI REST API.
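To illustrate the non-blocking polling idea, here is a minimal sketch using plain asyncio rather than Airflow's Trigger classes or the MSGraphAsyncOperator. The function name poll_refresh_async and the injected get_status coroutine are hypothetical. In a real deferrable operator, this loop would live inside a trigger and get_status would call the Power BI REST API.

```python
import asyncio
from typing import Awaitable, Callable

# Statuses treated as terminal, following the status list in the PR description.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}


async def poll_refresh_async(
    get_status: Callable[[], Awaitable[str]],
    check_interval: float,
    timeout: float,
) -> str:
    """Await a terminal refresh status without blocking the event loop."""

    async def _poll() -> str:
        while True:
            status = await get_status()
            if status in TERMINAL_STATUSES:
                return status
            # Yield control while waiting, so other coroutines can run.
            await asyncio.sleep(check_interval)

    # asyncio.wait_for cancels the polling and raises a timeout error
    # if no terminal status arrives within `timeout` seconds.
    return await asyncio.wait_for(_poll(), timeout=timeout)
```

Because the wait happens in asyncio.sleep rather than time.sleep, many such polls can share one event loop instead of each occupying a worker slot.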

@ambika-garg
Contributor Author

After discussing with @Joffreybvn, I'm considering integrating this operator into Azure Data Factory using a generic connection type. Given that Power BI seems to be integrated into Microsoft Fabric, I plan to develop a Fabric Provider in the future, allowing us to consolidate these operators within Fabric. What do you think about this strategy?

@potiuk
Member

potiuk commented May 19, 2024

Why not just add it to microsoft.azure then? Adding a new provider to manage is quite an overhead (and, to remind again, it would require a devlist discussion). Adding just an operator to microsoft.azure seems preferable, unless of course you think a Fabric provider deserves to be separate for some reason.

Also, I think starting a devlist discussion is a good way to see how Microsoft might support their providers via system tests, similarly to Amazon and (soon) Google. Both the Amazon and Google teams developed and maintain, and publish (Google soon), dashboards with system tests / example DAGs run in their systems, which give the community (and the AWS/Google teams) the opportunity to see and fix any problems that arise with real services. That might be a great opportunity for Microsoft to become more visible as a stakeholder and follow the good practices of others.

@ambika-garg
Contributor Author

ambika-garg commented May 20, 2024

Agreed, I'll submit this PR for the Power BI operator under microsoft.azure (initially, I mistakenly mentioned Azure Data Factory). So, let's go ahead and close this PR for now.
