-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion/powerbi): Usage stats ingestion #10500
base: master
Are you sure you want to change the base?
feat(ingestion/powerbi): Usage stats ingestion #10500
Conversation
metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_resolver.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_resolver.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_classes.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/powerbi/config.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/data_resolver.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/powerbi/rest_api_wrapper/powerbi_api.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes seem logical. Added a few comments to improve readability. One comment about change in URNs casing.
@root_validator(skip_on_failure=True) | ||
def validate_extract_usage_stats_for_interval(cls, values: Dict) -> Dict: | ||
if values["extract_usage_stats_for_interval"] > 30: | ||
raise ValueError("Usage stats for last 30 days only can be extracted.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError("Usage stats for last 30 days only can be extracted.") | |
raise ValueError("Usage stats older than last 30 days can not be extracted.") |
@dataclass | ||
class Page: | ||
id: str | ||
displayName: str | ||
name: str | ||
order: int | ||
usageStats: Optional[Dict[str, UsageStat]] # date as key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usageStats: Optional[Dict[str, UsageStat]] # date as key | |
usageStats: Optional[Dict[datetime, UsageStat]] |
Can we use floored date time as key here to avoid need for guessing that key is a date.
@@ -207,6 +219,7 @@ class Report: | |||
embedUrl: str | |||
description: str | |||
dataset: Optional["PowerBIDataset"] | |||
usageStats: Optional[Dict[str, UsageStat]] # date as key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for this. use datetime instead of str
@@ -244,6 +257,7 @@ class Dashboard: | |||
isReadOnly: Any | |||
workspace_id: str | |||
workspace_name: str | |||
usageStats: Optional[Dict[str, UsageStat]] # date as key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, use datetime instead
self, results: List[Dict], user_stats_key_as_guid: bool | ||
) -> Dict[str, Dict[str, Dict[str, UsageStat]]]: | ||
""" | ||
Return sub entity level usage metrics as Dict[<entity_id>, Dict[<sub_entity_id>, Dict[<date>, UsageStat]]]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by sub entity level ? current entity and entities within this entity ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is applicable for only for reports and its pages.
Do you foresee that this method will be reused for any other entity/subentity combinations ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, this methods would be more readable if we define dataclasses as
DateWiseUsage = Dict[datetime, UsageStat]
PowerBiEntityUsage:
overall_usage: DateWiseUsage
sub_entity_usage: Dict[str, DateWiseUsage] # key is subentity id
and return Dict[str, PowerBiEntityUsage]
as output of this method where key is entity id.
Also can we use setdefault
to reduce if..else
here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be reused for parse_entity_level_usage_metrics_result
and all callers of this method.
usage_stats[entity_id][date].userUsageStats[user_id].viewsCount = ( | ||
usage_stats[entity_id][date].userUsageStats[user_id].viewsCount | ||
+ views_count | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have multiple rows for same (entity, user, date) combination ? Can we not push this down in Dax query so that it returns single value per (entity, user, date) ?
@@ -275,7 +275,7 @@ | |||
}, | |||
{ | |||
"entityType": "dashboard", | |||
"entityUrn": "urn:li:dashboard:(powerbi,dashboards.7D668CAD-7FFC-4505-9215-655BCA5BEBAE)", | |||
"entityUrn": "urn:li:dashboard:(powerbi,dashboards.7d668cad-7ffc-4505-9215-655bca5bebae)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change in case only due to change in test inputs or do we explicitly lowercase the urns/ids anywhere in codebase ?
.userUsageStats[user_id] | ||
.viewsCount | ||
+ views_count | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this represent ? Is it possible to have multiple rows for same (entity, subentity, user, date) combination ? Can we not push this down in Dax query so that it returns single value per (entity, subentity, user, date) ?
Description
This PR contains code to extract the PowerBI Reports and Dashboards Usage stats. Usage stats for report pages were unable to extract, as identifier for pages is in format <report_id>.<page_name>. Not page display name. Eg: '1902c899-3a0c-4b82-ab02-a4635178af59.ReportSection03fa40d30cea0f236e95’ and usage metrics metadata contain a proper page id.
Checklist