Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingestion/powerbi): Usage stats ingestion #10500

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

shubhamjagtap639
Copy link
Contributor

@shubhamjagtap639 shubhamjagtap639 commented May 14, 2024

Description

This PR contains code to extract the PowerBI Reports and Dashboards Usage stats. Usage stats for report pages were unable to extract, as identifier for pages is in format <report_id>.<page_name>. Not page display name. Eg: '1902c899-3a0c-4b82-ab02-a4635178af59.ReportSection03fa40d30cea0f236e95’ and usage metrics metadata contain a proper page id.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label May 14, 2024
Copy link
Collaborator

@mayurinehate mayurinehate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes seem logical. Added a few comments to improve readability. One comment about change in URNs casing.

@root_validator(skip_on_failure=True)
def validate_extract_usage_stats_for_interval(cls, values: Dict) -> Dict:
if values["extract_usage_stats_for_interval"] > 30:
raise ValueError("Usage stats for last 30 days only can be extracted.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("Usage stats for last 30 days only can be extracted.")
raise ValueError("Usage stats older than last 30 days can not be extracted.")

@dataclass
class Page:
id: str
displayName: str
name: str
order: int
usageStats: Optional[Dict[str, UsageStat]] # date as key
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
usageStats: Optional[Dict[str, UsageStat]] # date as key
usageStats: Optional[Dict[datetime, UsageStat]]

Can we use floored date time as key here to avoid need for guessing that key is a date.

@@ -207,6 +219,7 @@ class Report:
embedUrl: str
description: str
dataset: Optional["PowerBIDataset"]
usageStats: Optional[Dict[str, UsageStat]] # date as key
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for this. use datetime instead of str

@@ -244,6 +257,7 @@ class Dashboard:
isReadOnly: Any
workspace_id: str
workspace_name: str
usageStats: Optional[Dict[str, UsageStat]] # date as key
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, use datetime instead

self, results: List[Dict], user_stats_key_as_guid: bool
) -> Dict[str, Dict[str, Dict[str, UsageStat]]]:
"""
Return sub entity level usage metrics as Dict[<entity_id>, Dict[<sub_entity_id>, Dict[<date>, UsageStat]]].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by sub entity level ? current entity and entities within this entity ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is applicable for only for reports and its pages.
Do you foresee that this method will be reused for any other entity/subentity combinations ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this methods would be more readable if we define dataclasses as

DateWiseUsage = Dict[datetime, UsageStat]


PowerBiEntityUsage:
    overall_usage: DateWiseUsage
    sub_entity_usage: Dict[str, DateWiseUsage] # key is subentity id

and return Dict[str, PowerBiEntityUsage] as output of this method where key is entity id.

Also can we use setdefault to reduce if..else here ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be reused for parse_entity_level_usage_metrics_result and all callers of this method.

usage_stats[entity_id][date].userUsageStats[user_id].viewsCount = (
usage_stats[entity_id][date].userUsageStats[user_id].viewsCount
+ views_count
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have multiple rows for same (entity, user, date) combination ? Can we not push this down in Dax query so that it returns single value per (entity, user, date) ?

@@ -275,7 +275,7 @@
},
{
"entityType": "dashboard",
"entityUrn": "urn:li:dashboard:(powerbi,dashboards.7D668CAD-7FFC-4505-9215-655BCA5BEBAE)",
"entityUrn": "urn:li:dashboard:(powerbi,dashboards.7d668cad-7ffc-4505-9215-655bca5bebae)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change in case only due to change in test inputs or do we explicitly lowercase the urns/ids anywhere in codebase ?

.userUsageStats[user_id]
.viewsCount
+ views_count
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this represent ? Is it possible to have multiple rows for same (entity, subentity, user, date) combination ? Can we not push this down in Dax query so that it returns single value per (entity, subentity, user, date) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants