
Diligence Doer

Prevent Breaking Changes. Due Diligence for Data Teams. (Atlassian Jira app)

Codegeist Hackathon 2021

See the Diligence Doer in action! Watch demo video on YouTube



Overview

Diligence Doer is an Atlassian Forge app for Jira. It parses the summary of a Jira Issue for database table or column names, then displays the other resources where those tables and fields are being used.
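
The parsing step can be pictured as a pattern match over the summary text. Here is a minimal sketch, assuming tables are written as dotted identifiers like ods.customers; the regex and function name are illustrative, not the app's actual code:

```python
import re

# Illustrative only: pull dotted identifiers such as 'ods.customers' or
# 'ods.customers.address1' out of a Jira Issue summary.
TABLE_PATTERN = re.compile(r"\b[A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*){1,2}\b")

def extract_table_references(summary: str) -> list[str]:
    return TABLE_PATTERN.findall(summary)

print(extract_table_references("Combine address1 and address2 fields in ods.customers"))
# ['ods.customers']
```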

Currently, those resources can come from two places: GitHub and Tableau.

GitHub

  • Given a GitHub repository and authentication token, Diligence Doer will return the name of, and link to, the file(s) that contain the database table(s) found in the summary of the Jira Issue (see the sketch after this list).
  • In the app, these files are marked with the 📄 emoji.
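
As a rough illustration of the GitHub lookup, the sketch below uses PyGithub to walk a repository tree and flag files that mention a table name; the function name and extension filter here are assumptions, not the repository's exact code.

```python
from github import Github  # PyGithub

def find_files_mentioning(token: str, repo_name: str, table: str) -> list[str]:
    """Illustrative sketch: paths of .sql/.yml files whose contents mention a table."""
    repo = Github(token).get_repo(repo_name)
    tree = repo.get_git_tree(repo.default_branch, recursive=True)
    matches = []
    for element in tree.tree:
        if element.type != "blob" or not element.path.endswith((".sql", ".yml")):
            continue
        file_text = repo.get_contents(element.path).decoded_content.decode("utf-8")
        if table in file_text:
            matches.append(element.path)  # the ContentFile's html_url gives the link
    return matches
```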

Tableau

  • Given a Tableau Server and authentication token, Diligence Doer will return the name of, and link to, the dashboard(s) whose datasources contain the database table(s) or field(s) found in the summary of the Jira Issue (see the sketch after this list).
  • In the app, these dashboards are marked with the 📈 emoji.
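
On the Tableau side, a comparable lookup can be sketched with the tableauserverclient (TSC) library. Matching the table against connection datasource names is a simplification here; the app's real inspection of datasources may go deeper.

```python
import tableauserverclient as TSC

def find_dashboards_using(server_url, token_name, token_value, site_id, table):
    """Illustrative sketch: workbooks whose connections reference a table name."""
    auth = TSC.PersonalAccessTokenAuth(token_name, token_value, site_id=site_id)
    server = TSC.Server(server_url, use_server_version=True)
    matches = []
    with server.auth.sign_in(auth):
        for workbook in TSC.Pager(server.workbooks):
            server.workbooks.populate_connections(workbook)
            if any(table in (c.datasource_name or "") for c in workbook.connections):
                matches.append((workbook.name, workbook.webpage_url))
    return matches
```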

Usage

The information displayed by Diligence Doer can be seen directly in a Jira Issue underneath the description...

In Sprint

and in other places an Issue may exist, like the Backlog...

In Backlog

If the database table mentioned in the ticket is not referenced by any other resources, Diligence Doer lets you know that, too!

No References


Getting Started

View the SETUP.md documentation for an in-depth walkthrough of the cloud deployment.

This project was built for the Atlassian Codegeist Hackathon 2021. If you would like to learn more about building apps with Atlassian Forge, here are some notes I took that will help you get started!

Atlassian Forge


Use Case Specific Caveats

Action may be required to customize this tool for your specific use case. In this section, I identify use cases that would require you to make code or configuration changes to this project and point you toward the appropriate files in this repository for making those changes.

Using a GitHub Enterprise account

You will need to change the URL endpoint to access the API of a GitHub Enterprise Server. Edit the authenticate_github() function in the authentication.py file to point to your Enterprise account; the change you need to make is described in the function's docstring.
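
For reference, PyGithub's documented way to target an Enterprise instance is the base_url keyword (the hostname below is a placeholder; the docstring has the exact edit for this project):

```python
from github import Github

# github.com (the default public endpoint)
github = Github("access_token")

# GitHub Enterprise Server: point at your instance's /api/v3 endpoint
github = Github(base_url="https://github.your-company.com/api/v3",
                login_or_token="access_token")
```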

Identifying SQL commands in YML files

Not all data pipelines or orchestrators use YML, and certainly not all of those that do use it in the same way. The functionality to look for SQL in YML files will be useful for teams that use AWS Data Pipeline or Dataduct, but may be a noisy feature for those that do not.

If you DO NOT have YML files that contain SQL in your repo... you can disable this feature by changing the following line in the get_files_containing_sql() function of the get_repository.py file (shown in context in the sketch after this list).

  • Change from: if len(split_name) >= 2 and split_name[1] in ['sql', 'yml']:
  • Change to: if len(split_name) >= 2 and split_name[1] == 'sql':
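
In context, the filter sits in a loop over the repository's file names, roughly like this simplified sketch (not the file's exact code):

```python
def get_files_containing_sql(file_names):
    """Simplified sketch: keep only files whose extension marks them as SQL-bearing."""
    matches = []
    for name in file_names:
        split_name = name.split(".")
        # Drop 'yml' from this list to disable YML parsing, as described above.
        if len(split_name) >= 2 and split_name[1] in ['sql', 'yml']:
            matches.append(name)
    return matches
```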

If you DO have YML files that contain SQL in your repo...

  • The YML parser is hardcoded to read the keys used by Dataduct. It parses all steps entries whose step_type is sql_command.
  • If you want to parse your YML files but do not use Dataduct, you may need to adjust the YML keys and their properties in the parse_yml.py file to match your YML structure (a rough sketch follows this list).
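
As a rough guide, walking a Dataduct-style definition looks like the sketch below; the command key holding the SQL text is an assumption, so adjust the key names to your structure:

```python
import yaml

def extract_sql_commands(yml_text: str) -> list[str]:
    """Illustrative sketch: collect SQL from 'steps' whose step_type is 'sql_command'."""
    document = yaml.safe_load(yml_text)
    return [
        step.get("command", "")
        for step in document.get("steps", [])
        if step.get("step_type") == "sql_command"
    ]

example = """
steps:
  - step_type: sql_command
    command: SELECT * FROM ods.customers;
  - step_type: transform
    script: scripts/clean.py
"""
print(extract_sql_commands(example))  # ['SELECT * FROM ods.customers;']
```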

Handling environment variables in YML files

The use of environment variables in YML files may introduce characters (%, {{, }}) that break the PyYAML parser. Rather than letting these fail silently, which would result in none of the YML file being parsed, we have provided two solutions for this case.

If you DO NOT have YML files that contain environment variables pertinent to this project, such as schema, table, or field names, you can opt to skip environment variable handling altogether.

  • To prevent the parser from finding and replacing environment variables, simply pass an empty string '' as the env_var_file_path in the parse_yml() function found in the parse_yml.py file.
  • Alternatively, in the read_bytestream_to_yml() function in the same file, you could comment out the line replace_yml_env_vars(linted_yml_file, replace_dict) and change the input of the yaml.safe_load() function to stream=linted_yml_file

If you DO have YML files that contain environment variables pertinent to this project, such as schema, table, or field names, you can have the parser find and replace those environment variables using their key-value pairs.

  • To find and replace environment variables, provide the path to the specific YML file in the repository that contains the environment variables as key-value pairs in the function mentioned above (see the sketch below).
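
A minimal sketch of that find-and-replace step, assuming the env var file is itself a flat YML of key-value pairs and placeholders look like {{ KEY }} (see replace_yml_env_vars() in parse_yml.py for the real implementation):

```python
import yaml

def replace_yml_env_vars(yml_text: str, env_var_file_path: str) -> str:
    """Illustrative sketch: swap '{{ KEY }}' placeholders for values before parsing."""
    if not env_var_file_path:  # empty path: skip the replacement step entirely
        return yml_text
    with open(env_var_file_path) as f:
        replace_dict = yaml.safe_load(f) or {}
    for key, value in replace_dict.items():
        yml_text = yml_text.replace("{{ " + key + " }}", str(value))
    return yml_text

# usage: yaml.safe_load(replace_yml_env_vars(raw_text, "config/env_vars.yml"))
```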

Project To Do's:

  • Add support for parsing Spark SQL and DataFrames
  • Add support for other code repository hosts (BitBucket, Gitlab) and BI Tools (Looker, Power BI)
  • Add support for matching individual fields
    • Ingest and parse fields from Github and Tableau
    • Determine best way to shape the data for this use case
    • Determine best way to identify a field name separate from database table in a Jira Issue Summary
      • For instance, consider the issue summary: "Combine address1 and address2 fields in ods.customers"
      • If we add databases as a source (Snowflake, Redshift, BigQuery), we could then check each word in the summary against the actual schema for the database table, as sketched after this list:
      "ods.customers.Combine"
      "ods.customers.address1"
      "ods.customers.and"
      "ods.customers.address2"
      "ods.customers.fields"
      "ods.customers.in"
      

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT license

Authors



Buy us coffee