
Deployment of the Purview ADB Lineage Solution Accelerator Demo

Apart from the application identity (service principal) described in the prerequisites, no additional setup is necessary: the demo environment is set up for you, including Azure Databricks, Purview, Azure Data Lake Storage (ADLS), and example data sources and notebooks.

Services Installed

[Figure: high-level architecture of the installed services]

Deployment Steps

Choose your subscription and create resource group

From the Azure Portal

  1. At the top of the page, click the Cloud Shell icon

  2. Make sure “Bash” is selected from the dropdown menu located at the left corner of the terminal.

    a. Click “Confirm” if the “Switch to Bash in Cloud Shell” pop up appears.

  3. Run az account set --subscription "<SubscriptionID>" to select the Azure subscription you want to use.

    Note: If your Cloud Shell session disconnects, rerun this command to make sure the correct subscription is selected.

  4. Create a resource group for the demo deployment by running:

    az group create --location <ResourceGroupLocation> --resource-group <ResourceGroupName>

    Note: Save the name of this resource group. You will need it later when you configure settings.sh.
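The subscription selection may need to be reapplied after Cloud Shell reconnects. For that reason, the two commands above can be wrapped so the active subscription is checked before the resource group is created. This is a minimal sketch, not part of the accelerator. The function names are hypothetical, and it relies only on standard Azure CLI commands (az account show, az account set, az group create).

```shell
# Minimal sketch (hypothetical helper names): select the subscription only
# when needed, then create the resource group using the placeholders above.

# Print the ID of the currently selected subscription.
current_subscription() {
  az account show --query id --output tsv
}

# Switch to the wanted subscription if a different one is active.
# If a subscription name is passed instead of an ID, the comparison fails
# and az account set simply runs anyway.
ensure_subscription() {
  local wanted="$1"
  if [ "$(current_subscription)" != "$wanted" ]; then
    az account set --subscription "$wanted"
  fi
}

# Select the subscription, then create the resource group.
prepare_resource_group() {
  local subscription_id="$1" location="$2" group_name="$3"
  ensure_subscription "$subscription_id" || return 1
  az group create --location "$location" --resource-group "$group_name" --output none
}

# Usage (replace the placeholders first):
# prepare_resource_group "<SubscriptionID>" "<ResourceGroupLocation>" "<ResourceGroupName>"
```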

Clone the repository into Azure Cloud Shell

  1. Change to the Cloud Shell storage directory (clouddrive):

    cd ~/clouddrive

  2. Clone this repository into the clouddrive directory using the latest release tag (e.g. 2.x.x):

    git clone -b <release_tag> https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator.git

    Note:
    We highly recommend cloning from one of the repository's release tags.

    Clone the main branch only if you want a nightly build (the latest commit on main). Nightly builds give you access to newer or experimental features, but those features may change before the next official release. If you are testing a deployment for production, clone using a release tag.
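If you would rather not look up the release tag by hand, you can list the tags with git ls-remote and pick the highest version. This is a minimal sketch with a hypothetical latest_tag helper. It assumes GNU sort (for version ordering with -V) and does not skip pre-release tags, so confirm the result against the repository's published releases before cloning.

```shell
# Minimal sketch (hypothetical helper): pick the highest tag by version order.
# Reads `git ls-remote --tags` output ("<sha><TAB>refs/tags/<tag>") on stdin.
# Requires GNU sort for -V; pre-release tags are not filtered out.
latest_tag() {
  awk '{ print $2 }' \
    | sed -e 's#^refs/tags/##' -e 's#\^{}$##' \
    | sort -u -V \
    | tail -n 1
}

repo_url="https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator.git"

# Usage, from ~/clouddrive:
# tag=$(git ls-remote --tags --refs "$repo_url" | latest_tag)
# echo "Cloning release tag: $tag"
# git clone -b "$tag" "$repo_url"
```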

Configure application settings file

  1. After cloning, click the "Upload/download" icon and select "Manage file share".

  2. Navigate to Purview-ADB-Lineage-Solution-Accelerator/deployment/infra/settings.sh, click "…", and select "Edit".

  3. Enter values for:

    • Resource group
    • Prefix (this is added to service names)
    • Client ID and client secret (from the app registration created as a prerequisite)
    • Tenant ID
    • Purview location
    • Resource Tags (optional, in the following format: {"Name":"Value","Name2":"Value2"})
      • NOTE: If you are not using any Resource Tags, enter an empty pair of double quotes ("").
  4. Click the Save icon to save your changes.
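As an alternative to the file share editor, you can set these values from the terminal. The sketch below assumes settings.sh consists of KEY=value lines. The key names in the usage lines (resource_group, resource_tags) are hypothetical, so use the names that actually appear in your copy of the file. Values are written in single quotes so that JSON resource tags keep their double quotes when the file is sourced. An empty value is written as '', which the shell treats the same as "".

```shell
# Minimal sketch (hypothetical helper): set KEY to VALUE in a file of KEY=...
# lines, e.g. settings.sh. VALUE is single-quoted so JSON survives `source`.
set_setting() {
  local file="$1" key="$2" value="$3" quoted tmp
  # Wrap in single quotes, turning each embedded ' into '\''
  quoted="'$(printf '%s' "$value" | sed "s/'/'\\\\''/g")'"
  tmp="${file}.tmp"
  # Rewrite every line that starts with KEY=; exit non-zero if none was found.
  if KEY="$key" QUOTED="$quoted" awk '
      index($0, ENVIRON["KEY"] "=") == 1 { print ENVIRON["KEY"] "=" ENVIRON["QUOTED"]; found = 1; next }
      { print }
      END { exit !found }
    ' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    echo "set_setting: $key not found in $file" >&2
    return 1
  fi
}

# Usage (key names are placeholders; check settings.sh for the real ones):
# set_setting settings.sh resource_group "<ResourceGroupName>"
# set_setting settings.sh resource_tags '{"Name":"Value","Name2":"Value2"}'
```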

Run the installation script

Note: Running this script creates all the services noted above, including Azure Databricks and an Azure Databricks cluster that starts after deployment. The cluster is configured to auto-terminate after 15 minutes of inactivity, but some Azure charges will still accrue.

  1. Navigate to the deployment directory:

    cd ~/clouddrive/Purview-ADB-Lineage-Solution-Accelerator/deployment/infra

    Note:
    If your organization requires private endpoints for Azure Storage and Azure Event Hubs, you may need to follow the private endpoint guidance and modify the provided ARM template.

  2. Run ./openlineage-deployment.sh

  3. (Manual configuration) After the initial deployment, the script pauses and asks you to add the service principal to the Data Curator role in the Purview account. Follow the "Set up authentication using a service principal" documentation, using the application identity you created as a prerequisite to installation.

  4. Once your service principal has been added, return to the Bash terminal and press Enter.

  5. The script then deploys the Purview types, and the deployment finishes.
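Before running ./openlineage-deployment.sh, it can help to confirm that none of the required settings were left blank. The hypothetical missing_settings helper below sources the settings file in a subshell and prints the name of each listed variable that is empty or unset. The variable names you pass are assumptions, so take them from settings.sh, and leave out optional ones such as the resource tags.

```shell
# Minimal sketch (hypothetical helper): print the name of each listed variable
# that is unset or empty after sourcing the settings file in a subshell.
missing_settings() {
  local settings_file="$1"
  shift
  (
    . "$settings_file"
    for name in "$@"; do
      # Only accept plain shell variable names before using eval.
      case "$name" in
        ''|[0-9]*|*[!A-Za-z0-9_]*) printf '%s (invalid name)\n' "$name"; continue ;;
      esac
      eval "_ms_value=\${$name:-}"
      [ -n "$_ms_value" ] || printf '%s\n' "$name"
    done
  )
}

# Usage from the deployment/infra directory (variable names are placeholders):
# missing=$(missing_settings ./settings.sh resource_group prefix tenant_id)
# if [ -n "$missing" ]; then echo "Fill in settings.sh first: $missing"; else ./openlineage-deployment.sh; fi
```

Passing ./settings.sh with an explicit path matters because some shells only search PATH when the file given to "." has no slash.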

Note:
At this point, confirm that the resources deployed successfully. In particular, open the Azure Function and check its Functions tab: you should see both an OpenLineageIn and a PurviewOut function. If you see an error like "Microsoft.Azure.WebJobs.Extensions.FunctionMetadataLoader: The file 'C:\home\site\wwwroot\worker.config.json' was not found", restart the Function App (or stop and start it) to resolve the issue.

Lastly, open the Azure Function's Configuration tab and check that every Key Vault referenced app setting has a green checkmark. If not, wait an additional 2-5 minutes and refresh the page. If the Key Vault references are still not all green, check that the Key Vault has an access policy referencing the Azure Function.
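The Functions tab check can also be done from the terminal. This sketch assumes your Azure CLI version includes az functionapp function list and that it returns names in the form <function-app>/<function>. The missing_functions helper is hypothetical and compares only the part after the last slash.

```shell
# Minimal sketch (hypothetical helper): read function names, one per line, on
# stdin and print each expected function that is missing. Anything up to the
# last "/" is stripped, in case names come back as "<function-app>/<function>".
missing_functions() {
  local names
  names=$(sed 's#.*/##')
  for expected in OpenLineageIn PurviewOut; do
    printf '%s\n' "$names" | grep -qx "$expected" || printf '%s\n' "$expected"
  done
}

# Usage (placeholders; assumes your Azure CLI includes `az functionapp function list`):
# az functionapp function list --name <FunctionAppName> --resource-group <ResourceGroupName> \
#   --query "[].name" --output tsv | missing_functions
# If functions are missing, restart the function app:
# az functionapp restart --name <FunctionAppName> --resource-group <ResourceGroupName>
```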

  1. Finally, run the Databricks notebook provided in your new workspace. Once it has finished running all cells, observe the lineage in Microsoft Purview.

  2. If you do not see any lineage please follow the steps in the troubleshooting guide.

  3. If you are interested in demonstrating lineage from Databricks jobs, follow the steps in the connector-only deployment.

Post Installation

Note: If your original Bash session closes while you are completing the manual installation steps above, you can run the final part of the installation yourself. Run the following from a Cloud Shell Bash session in the same subscription context:

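# Run from the directory that contains Custom_Types.json
purview_endpoint="https://<enter_purview_account_name>.purview.azure.com"
TENANT_ID="<TENANT_ID>"
CLIENT_ID="<CLIENT_ID>"
CLIENT_SECRET="<CLIENT_SECRET>"

# Request an access token for Purview with the client-credentials flow.
# --data-urlencode keeps secrets containing characters such as + or & intact.
acc_purview_token=$(curl -s "https://login.microsoftonline.com/$TENANT_ID/oauth2/token" \
        --data-urlencode "resource=https://purview.azure.net" \
        --data-urlencode "client_id=$CLIENT_ID" \
        --data-urlencode "client_secret=$CLIENT_SECRET" \
        --data-urlencode "grant_type=client_credentials" | jq -r '.access_token')

# Upload the custom Atlas type definitions used by the accelerator.
purview_type_resp_custom_type=$(curl -s -X POST "$purview_endpoint/catalog/api/atlas/v2/types/typedefs" \
        -H "Authorization: Bearer $acc_purview_token" \
        -H "Content-Type: application/json" \
        -d @Custom_Types.json)

echo "$purview_type_resp_custom_type"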
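A failed token request (for example, because of a wrong client secret) does not stop the commands above. jq -r '.access_token' prints the string null instead of failing, so the typedefs call then fails authorization. A small hypothetical guard can stop the script earlier. This is a sketch, not part of the accelerator.

```shell
# Minimal sketch (hypothetical helper): fail fast if the token request did not
# return an access token. `jq -r` prints "null" for a missing field.
require_token() {
  case "$1" in
    ''|null)
      echo "No access token: check TENANT_ID, CLIENT_ID and CLIENT_SECRET" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage, between the token request and the typedefs upload above:
# require_token "$acc_purview_token" || exit 1
```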

If you need a PowerShell alternative, see the docs.

You should now be able to run your demo notebook and receive lineage.