-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion/tableau): optionally ingest multiple sites and create site containers #10498
base: master
Are you sure you want to change the base?
feat(ingestion/tableau): optionally ingest multiple sites and create site containers #10498
Conversation
…e method instead of manually switching site
@@ -762,8 +809,9 @@ def _populate_projects_registry(self) -> None: | |||
|
|||
def _authenticate(self) -> None: | |||
try: | |||
self.server = self.config.make_tableau_client() | |||
logger.info("Authenticated to Tableau server") | |||
site_content_url = self.current_site.content_url if self.current_site else self.config.site |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done because I saw there is some code that does re-authentication:
self._authenticate() |
With this we always re-authenticate with the current site or fallback to the site property provided in the config (when authenticating the first time there's no
current_site
set).
|
||
# We rely on automatic browse paths (v2) when creating containers. That's why we need to sort the projects here. | ||
# Otherwise, nested projects will not have the correct browse paths if not created in correct order / hierarchy. | ||
self.tableau_project_registry = OrderedDict(sorted(projects_to_ingest.items(), key=lambda item: len(item[1].path))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to a bug we were experiencing when dealing with nested projects. If nested projects were added before their parent the browse path (v2) would be incorrect (wrong hierarchy). Let me know if there is a better way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works great, and is a good catch
if site.state != 'Active' or not self.config.site_name_pattern.allowed(site.name): | ||
continue | ||
self.current_site = site | ||
self.server.auth.switch_site(site) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed, that the documentation states that this method is not allowed with Tableau Cloud: https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_ref_authentication.htm#switch_site
Unfortunately, I'm not able to test it with Tableau Cloud. I had a solution in a previous commit that did the switch manually (by just re-authenticating). We could go back to this if needed. WDYT? Do we need to support Tableau Cloud?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we do need support for tableau cloud, so it would be better to manually switch
One other thing I need to test on my end - I want to double check that listing sites is supported with tableau cloud
I tried to fix the tests but for some reason when I run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some comments about code structuring, and we'll probably need a test for the ingest_multiple_sites
and add_site_container
functionality
add_site_container: bool = Field( | ||
False, | ||
description="When enabled, sites are added as containers and therefore visible in the folder structure within Datahub.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eventually it will make sense to make ingest_multiple_sites
and add_site_container
default to enabled, right
@@ -536,6 +560,7 @@ def __init__( | |||
self.tableau_project_registry: Dict[str, TableauProject] = {} | |||
self.workbook_project_map: Dict[str, str] = {} | |||
self.datasource_project_map: Dict[str, str] = {} | |||
self.current_site: Optional[SiteItem] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call _init_ingestion_variables
here
I'd prefer to have all variable initialization in a single spot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
broadly, resetting initialization variables is usually a sign that the class isn't specific enough. I think it makes sense to have a TableauSiteSource
class that is site-specific, and the TableauSource
simply lists the sites and then calls TableauSiteSource.get_workunits_internal()
Most of the existing methods would move to the TableauSiteSource, so this would be a pretty small refactoring. The state management would also become a lot easier to follow and structurally guarantee that we don't forget to reset some variable, and would mean that we need fewer tests for this code
if ( | ||
site.state != "Active" | ||
or not self.config.site_name_pattern.allowed(site.name) | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to log these as skipped
if site.state != 'Active' or not self.config.site_name_pattern.allowed(site.name): | ||
continue | ||
self.current_site = site | ||
self.server.auth.switch_site(site) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we do need support for tableau cloud, so it would be better to manually switch
One other thing I need to test on my end - I want to double check that listing sites is supported with tableau cloud
Description
With this PR, the Tableau ingestion can be configured to ingest multiple sites at once and to add the sites as containers. We have around 26 sites in the company, and it would be pretty cumbersome to manage 26 ingestions separately. Additionally, because we have so many sites and projects, we need the sites to show up as containers in Datahub.
To achieve this I introduced some new optional config properties:
ingest_multiple_sites
,add_site_container
andsite_name_pattern
. All existing recipes should continue to work as-is.A recipe could look something like this:
And that's how it looks like after ingesting:
I already saw this asked a few times in the Slack channel as well:
Please let me know what you think and I will proceed with adding some documentation and tests to this.
Many thanks.
Checklist