feat(ingestion/powerbi): Usage stats ingestion #10500

shubhamjagtap639 · 2024-05-14T14:35:10Z

Description

This PR adds code to extract usage stats for PowerBI reports and dashboards. Usage stats for report pages could not be extracted, because the page identifier is in the format <report_id>.<page_name> rather than the page display name, e.g. '1902c899-3a0c-4b82-ab02-a4635178af59.ReportSection03fa40d30cea0f236e95', and the usage metrics metadata contains a proper page id.
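The page identifier format described above can be illustrated with a small parser. This is a hypothetical helper (`split_page_identifier` is not part of the PR), and it assumes the report id is a GUID containing hyphens but no dots, so splitting on the first `.` separates it from the page name.

```python
from typing import Tuple


def split_page_identifier(page_identifier: str) -> Tuple[str, str]:
    """Split a '<report_id>.<page_name>' identifier into its two parts.

    Hypothetical helper for illustration only. Assumes the report id is a
    GUID (hyphens, no dots), so the first '.' is the separator.
    """
    report_id, sep, page_name = page_identifier.partition(".")
    if not sep or not report_id or not page_name:
        raise ValueError(f"Unexpected page identifier: {page_identifier!r}")
    return report_id, page_name


report_id, page_name = split_page_identifier(
    "1902c899-3a0c-4b82-ab02-a4635178af59.ReportSection03fa40d30cea0f236e95"
)
print(report_id)  # 1902c899-3a0c-4b82-ab02-a4635178af59
print(page_name)  # ReportSection03fa40d30cea0f236e95
```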

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

mayurinehate

Changes look logical. I added a few comments to improve readability, plus one comment about the change in URN casing.

mayurinehate · 2024-05-29T08:04:51Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/config.py

+    @root_validator(skip_on_failure=True)
+    def validate_extract_usage_stats_for_interval(cls, values: Dict) -> Dict:
+        if values["extract_usage_stats_for_interval"] > 30:
+            raise ValueError("Usage stats for last 30 days only can be extracted.")


Suggested change

-            raise ValueError("Usage stats for last 30 days only can be extracted.")
+            raise ValueError("Usage stats older than last 30 days can not be extracted.")
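A minimal standalone sketch of the check with the suggested message applied. In the PR this logic lives in a pydantic `@root_validator(skip_on_failure=True)` on the config class; here it is written as a plain function of the same name so it runs without pydantic. The diff excerpt above stops before the end of the method; a pydantic v1 root validator is expected to return the values dict, which this sketch does. The constant name is an illustrative assumption.

```python
from typing import Any, Dict

# Limit taken from the validator in the PR; the constant name is hypothetical.
MAX_USAGE_STATS_INTERVAL_DAYS = 30


def validate_extract_usage_stats_for_interval(values: Dict[str, Any]) -> Dict[str, Any]:
    """Plain-function version of the config root validator's check."""
    if values["extract_usage_stats_for_interval"] > MAX_USAGE_STATS_INTERVAL_DAYS:
        raise ValueError("Usage stats older than last 30 days can not be extracted.")
    # A root validator must hand the (possibly modified) values back.
    return values


print(validate_extract_usage_stats_for_interval({"extract_usage_stats_for_interval": 7}))
```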

mayurinehate · 2024-05-29T08:14:06Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_classes.py

 @dataclass
 class Page:
    id: str
    displayName: str
    name: str
    order: int
+    usageStats: Optional[Dict[str, UsageStat]]  # date as key


Suggested change

-    usageStats: Optional[Dict[str, UsageStat]]  # date as key
+    usageStats: Optional[Dict[datetime, UsageStat]]

Can we use a floored datetime as the key here, so readers don't have to guess that the key is a date?
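A sketch of what a floored-datetime key could look like. `UsageStat` and `UserUsageStat` below are hypothetical stand-ins (the PR's real `UsageStat` is only visible through `userUsageStats[user_id].viewsCount`), and `floor_to_day` is an illustrative helper, not an existing DataHub function.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class UserUsageStat:  # hypothetical stand-in
    viewsCount: int = 0


@dataclass
class UsageStat:  # hypothetical stand-in; real fields may differ
    userUsageStats: Dict[str, UserUsageStat] = field(default_factory=dict)


@dataclass
class Page:
    id: str
    displayName: str
    name: str
    order: int
    usageStats: Optional[Dict[datetime, UsageStat]]  # key: day floored to midnight


def floor_to_day(ts: datetime) -> datetime:
    """Truncate a timestamp to midnight so all events of one day share a key."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


usage: Dict[datetime, UsageStat] = {}
page = Page(
    id="1902c899-3a0c-4b82-ab02-a4635178af59.ReportSection03fa40d30cea0f236e95",
    displayName="Overview",
    name="ReportSection03fa40d30cea0f236e95",
    order=0,
    usageStats=usage,
)
usage.setdefault(floor_to_day(datetime(2024, 5, 14, 14, 35, 10)), UsageStat())
print(list(usage))  # [datetime.datetime(2024, 5, 14, 0, 0)]
```

The key's type now states that it is a date, so no `# date as key` comment is needed.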

mayurinehate · 2024-05-29T08:14:28Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_classes.py

@@ -207,6 +219,7 @@ class Report:
    embedUrl: str
    description: str
    dataset: Optional["PowerBIDataset"]
+    usageStats: Optional[Dict[str, UsageStat]]  # date as key


Same here: use datetime instead of str.

mayurinehate · 2024-05-29T08:14:39Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_classes.py

@@ -244,6 +257,7 @@ class Dashboard:
    isReadOnly: Any
    workspace_id: str
    workspace_name: str
+    usageStats: Optional[Dict[str, UsageStat]]  # date as key


Same here: use datetime instead.

mayurinehate · 2024-05-29T08:22:49Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_resolver.py

+        self, results: List[Dict], user_stats_key_as_guid: bool
+    ) -> Dict[str, Dict[str, Dict[str, UsageStat]]]:
+        """
+        Return sub entity level usage metrics as Dict[<entity_id>, Dict[<sub_entity_id>, Dict[<date>, UsageStat]]].


What do you mean by sub-entity level? The current entity and the entities within it?

So this applies only to reports and their pages.
Do you foresee this method being reused for any other entity/sub-entity combinations?

I think this method would be more readable if we defined dataclasses as:

DateWiseUsage = Dict[datetime, UsageStat]

@dataclass
class PowerBiEntityUsage:
    overall_usage: DateWiseUsage
    sub_entity_usage: Dict[str, DateWiseUsage]  # key is sub-entity id

and return Dict[str, PowerBiEntityUsage] from this method, keyed by entity id.

Also, can we use setdefault to reduce the if..else branching here?

The same structure can be reused for parse_entity_level_usage_metrics_result and all callers of this method.
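The dataclass structure proposed in this comment, combined with `setdefault`, could look roughly like the sketch below. The row keys, `add_views`, and `parse_usage_rows` are hypothetical, as are the `UsageStat`/`UserUsageStat` stand-ins; the sketch also assumes that a row carrying a sub-entity id (a page) counts only toward `sub_entity_usage`, which the review does not specify.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class UserUsageStat:  # hypothetical stand-in
    viewsCount: int = 0


@dataclass
class UsageStat:  # hypothetical stand-in
    userUsageStats: Dict[str, UserUsageStat] = field(default_factory=dict)


DateWiseUsage = Dict[datetime, UsageStat]


@dataclass
class PowerBiEntityUsage:
    overall_usage: DateWiseUsage = field(default_factory=dict)
    sub_entity_usage: Dict[str, DateWiseUsage] = field(default_factory=dict)  # key: sub-entity id


def add_views(usage: DateWiseUsage, day: datetime, user_id: str, views: int) -> None:
    # setdefault creates each missing nesting level in place, replacing if/else checks.
    stat = usage.setdefault(day, UsageStat())
    user_stat = stat.userUsageStats.setdefault(user_id, UserUsageStat())
    user_stat.viewsCount += views


def parse_usage_rows(rows: List[dict]) -> Dict[str, PowerBiEntityUsage]:
    """Group rows into Dict[<entity_id>, PowerBiEntityUsage] (row keys are hypothetical)."""
    result: Dict[str, PowerBiEntityUsage] = {}
    for row in rows:
        entity_usage = result.setdefault(row["entity_id"], PowerBiEntityUsage())
        sub_entity_id: Optional[str] = row.get("sub_entity_id")
        if sub_entity_id is None:
            target = entity_usage.overall_usage
        else:
            target = entity_usage.sub_entity_usage.setdefault(sub_entity_id, {})
        add_views(target, row["date"], row["user_id"], row["views"])
    return result


day = datetime(2024, 5, 14)
rows = [
    {"entity_id": "r1", "date": day, "user_id": "u1", "views": 2},
    {"entity_id": "r1", "sub_entity_id": "r1.ReportSection1", "date": day, "user_id": "u1", "views": 1},
    {"entity_id": "r1", "sub_entity_id": "r1.ReportSection1", "date": day, "user_id": "u1", "views": 3},
]
usage = parse_usage_rows(rows)
print(usage["r1"].sub_entity_usage["r1.ReportSection1"][day].userUsageStats["u1"].viewsCount)  # 4
```

Note that `setdefault` evaluates its default argument on every call, so a fresh empty object is built even when the key already exists; that is cheap here but worth keeping in mind for heavier defaults.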

mayurinehate · 2024-05-29T09:40:28Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_resolver.py

+                usage_stats[entity_id][date].userUsageStats[user_id].viewsCount = (
+                    usage_stats[entity_id][date].userUsageStats[user_id].viewsCount
+                    + views_count
+                )


Is it possible to have multiple rows for the same (entity, user, date) combination? Could we push this down into the DAX query so that it returns a single value per (entity, user, date)?
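The invariant asked for here is one value per (entity, user, date) key. The sketch below performs that aggregation client-side over hypothetical row dicts (the field names are assumptions, not the PR's DAX columns). If the DAX query already groups by these columns (for example with `SUMMARIZECOLUMNS`), every key appears once and this step has nothing left to merge. The same helper covers the (entity, sub-entity, user, date) case by passing four key fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sum_views_by_key(rows: Iterable[dict], key_fields: List[str]) -> Dict[Tuple, int]:
    """Collapse rows to one view total per key tuple, e.g. (entity, user, date)."""
    totals: Dict[Tuple, int] = defaultdict(int)
    for row in rows:
        totals[tuple(row[f] for f in key_fields)] += row["views"]
    return dict(totals)


rows = [
    {"entity": "r1", "user": "u1", "date": "2024-05-14", "views": 2},
    {"entity": "r1", "user": "u1", "date": "2024-05-14", "views": 3},
    {"entity": "r1", "user": "u2", "date": "2024-05-14", "views": 1},
]
print(sum_views_by_key(rows, ["entity", "user", "date"]))
# {('r1', 'u1', '2024-05-14'): 5, ('r1', 'u2', '2024-05-14'): 1}
```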

mayurinehate · 2024-05-29T09:50:55Z

metadata-ingestion/tests/integration/powerbi/golden_test_admin_access_not_allowed.json

@@ -275,7 +275,7 @@
 },
 {
    "entityType": "dashboard",
-    "entityUrn": "urn:li:dashboard:(powerbi,dashboards.7D668CAD-7FFC-4505-9215-655BCA5BEBAE)",
+    "entityUrn": "urn:li:dashboard:(powerbi,dashboards.7d668cad-7ffc-4505-9215-655bca5bebae)",


Is this change in casing only due to a change in the test inputs, or do we explicitly lowercase URNs/IDs anywhere in the codebase?
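The two golden-file URNs above differ only in letter case, which is what the question hinges on. The sketch below makes that concrete with a hypothetical helper, `make_powerbi_dashboard_urn`; its `lowercase` switch only illustrates where an explicit lowercasing step would change the emitted URN and is not a claim about what the PR does.

```python
def make_powerbi_dashboard_urn(dashboard_id: str, lowercase: bool = False) -> str:
    """Build a dashboard URN in the format seen in the golden file (hypothetical helper)."""
    entity_id = dashboard_id.lower() if lowercase else dashboard_id
    return f"urn:li:dashboard:(powerbi,dashboards.{entity_id})"


before = "urn:li:dashboard:(powerbi,dashboards.7D668CAD-7FFC-4505-9215-655BCA5BEBAE)"
after = "urn:li:dashboard:(powerbi,dashboards.7d668cad-7ffc-4505-9215-655bca5bebae)"

# Same dashboard id, different casing: the URNs only match after lowercasing.
print(before == after)                  # False
print(before.lower() == after.lower())  # True
```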

mayurinehate · 2024-05-29T10:03:21Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_resolver.py

+                    .userUsageStats[user_id]
+                    .viewsCount
+                    + views_count
+                )


What does this represent? Is it possible to have multiple rows for the same (entity, subentity, user, date) combination? Could we push this down into the DAX query so that it returns a single value per (entity, subentity, user, date)?

shubhamjagtap639 added 2 commits May 6, 2024 11:18

powerbi poc code to get usage details of report and dashboard

a1123a4

Add PowerBI dashboard usage ingestion code

3638464

github-actions bot added the ingestion label May 14, 2024

vercel bot deployed to Preview May 14, 2024 14:49

hsheth2 reviewed May 15, 2024

shubhamjagtap639 added 3 commits May 20, 2024 19:46

Address review comments and add report usage code

fc0fed7

Fix lint error

f31cb2e

Merge branch 'master' of https://github.com/shubhamjagtap639/datahub into PowerBI-Usage-Ingestion

8967980

vercel bot deployed to Preview May 20, 2024 14:44

Add Report pages usage ingestion code

aace682

vercel bot deployed to Preview May 22, 2024 09:57

mayurinehate suggested changes May 29, 2024
