wjohnson edited this page Sep 10, 2021 · 6 revisions

Welcome to the pyapacheatlas wiki!

The purpose of this package is to make it easy to work with the Apache Atlas REST API without having to learn all of its nuances. It also provides a way to read an Excel file and extract entities, lineage, column mappings, and type definitions, so you can get metadata into your data catalog quickly.

The package is broken up into several submodules:

  • auth

    • Provides azure-identity (Managed Identity, Azure CLI), ServicePrincipal, and Basic authentication (for Apache Atlas) support.
  • core

    • Provides an AtlasClient or PurviewClient for connecting to your Apache Atlas-backed service.
    • Provides AtlasEntity and AtlasProcess classes to make it easier to work with Entity and Process types.
    • Provides Entity and Relationship TypeDef support.
    • Provides a "What If" validator to help check if your entities are valid against a provided set of type defs.
  • readers

    • A reader aids in extracting entities and types from standardized formats. Currently, the ExcelReader is the only provided reader, but the Reader base class can be extended to support other formats you need.
    • A reader has a few standardized methods that take in a template you have filled in and produce a batch of entities, custom lineage, column mappings, or type definitions.
    • The parse_update_lineage function reads an Excel file's UpdateLineage tab, extracts your Process types, and prepares the metadata to be uploaded to Atlas or Purview.
    • The parse_bulk_entities function lets you define entities with attributes and their relationship to other entities (e.g. define a table, columns, and the connection between them).
    • The parse_entity_defs and parse_classification_defs functions extract entity and classification definitions, respectively.
    • You can generate an Excel template with the required headers by running python -m pyapacheatlas --make-template ./template.xlsx on the terminal.
  • scaffolding

    • Creates a type definition "payload" that provides the table, column, table lineage process, column lineage process, table-to-column relationship, and table-lineage-to-column-lineage relationship. Import it with from pyapacheatlas.scaffolding import column_lineage_scaffolding.
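
To make the entity and process ideas above more concrete, here is a minimal sketch of the kind of JSON an Apache Atlas backed service generally expects for a batch of entities with lineage. It uses plain dictionaries and the standard library only; it is not the package's implementation. The helper names (make_entity, as_reference, make_process) and the example qualified names are hypothetical and exist only for illustration. The sketch assumes the built-in Atlas DataSet and Process types and uses negative guids as placeholders for entities that have not been created yet.

```python
import json


def make_entity(name, type_name, qualified_name, guid):
    """Build a minimal Atlas entity as a plain dict (hypothetical helper)."""
    return {
        "typeName": type_name,
        "guid": guid,
        "attributes": {"name": name, "qualifiedName": qualified_name},
    }


def as_reference(entity):
    """Reduce an entity to a small object reference (hypothetical helper)."""
    return {"guid": entity["guid"], "typeName": entity["typeName"]}


def make_process(name, type_name, qualified_name, guid, inputs, outputs):
    """Build a process entity whose inputs and outputs point at other entities."""
    process = make_entity(name, type_name, qualified_name, guid)
    process["attributes"]["inputs"] = [as_reference(e) for e in inputs]
    process["attributes"]["outputs"] = [as_reference(e) for e in outputs]
    return process


# Negative guids act as placeholders for entities that do not exist yet.
source = make_entity("raw_sales", "DataSet", "example://raw_sales", "-1")
target = make_entity("clean_sales", "DataSet", "example://clean_sales", "-2")
job = make_process(
    "clean_job", "Process", "example://clean_job", "-3", [source], [target]
)

# A batch wraps every entity, including the process, in one "entities" list.
payload = {"entities": [source, target, job]}
print(json.dumps(payload, indent=2))
```

The AtlasEntity and AtlasProcess classes, together with the client's upload support, are meant to spare you from assembling and sending this structure by hand. The sketch only shows the shape of the data involved.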

Thank you for your interest in using PyApacheAtlas! Please be sure to take a look at the more detailed pages in the wiki to get more specific information on the Excel Reader and Azure Purview Tips.