This project contains various developer helper scripts that simplify everyday tasks related to Apache Hadoop YARN development.
- gitpython - GitPython is a Python library used to interact with git repositories, high-level like git-porcelain or low-level like git-plumbing.
- tabulate - python-tabulate: Pretty-print tabular data in Python, a library and a command-line utility.
- bs4 - Beautiful Soup is a Python library for pulling data out of HTML and XML files.
TODO: Missing dependencies
TODO
- Szilard Nemeth - Initial work
TODO
TODO
In order to use this tool, you need to have at least Python 3.8 installed.
If you don't want to tinker with the source code, you can install yarn-dev-tools from PyPI as well.
This is probably the easiest way to use it.
You don't need to install anything manually as I created a script that performs the installation automatically.
The script has a `setup-vars` function at the beginning that defines some environment variables:

- `YARNDEVTOOLS_ROOT`: Specifies the directory where the Python virtualenv will be created; yarn-dev-tools will be installed into this virtualenv.
- `HADOOP_DEV_DIR`: Should be set to the upstream Hadoop repository root, e.g. "~/development/apache/hadoop/"
- `CLOUDERA_HADOOP_ROOT`: Should be set to the downstream Hadoop repository root, e.g. "~/development/cloudera/hadoop/"
The latter two environment variables are best added to your .bashrc / .zshrc file (depending on which shell you use) so that they persist across shell sessions.
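As a quick sketch, the exports could look like this (the paths below are illustrative values, not defaults mandated by the tool; adjust them to your own checkout locations):

```shell
# Illustrative values -- adjust the paths to match your own setup
export YARNDEVTOOLS_ROOT="$HOME/.yarndevtools"                  # where the virtualenv will be created
export HADOOP_DEV_DIR="$HOME/development/apache/hadoop"         # upstream Hadoop repository root
export CLOUDERA_HADOOP_ROOT="$HOME/development/cloudera/hadoop" # downstream Hadoop repository root
```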
If you want to use yarn-dev-tools from source, first you need to install its dependencies.
The project root contains a pyproject.toml file that has all the dependencies listed.
The project uses Poetry to resolve the dependencies so you need to install poetry as well.
Simply go to the root of this project and execute `poetry install --without localdev`.
Alternatively, you can run `make` from the root of the project.
If you completed the installation (either by source or by package), you may want to define some shell aliases to use the tool more easily.
On my system, I have the aliases below.
Please make sure to source this script so that the command `yarndevtools` will be available, since it's defined as a shell function.
It is important to specify `HADOOP_DEV_DIR` and `CLOUDERA_HADOOP_ROOT`, as mentioned above, before sourcing the script.
After these steps, you will have a basic set of aliases that is enough to get you started.
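For illustration only, a minimal version of such a script might look like the sketch below. The function body and the virtualenv path are assumptions on my part, not the project's actual alias script; source the script shipped with the repository for the real setup.

```shell
# Hypothetical sketch -- the real alias script in the repository is authoritative.
# The virtualenv location and the module invocation below are assumptions.
export YARNDEVTOOLS_ROOT="${YARNDEVTOOLS_ROOT:-$HOME/.yarndevtools}"

# 'yarndevtools' is defined as a function (not a plain alias),
# which is why the script must be sourced rather than executed.
yarndevtools() {
  "$YARNDEVTOOLS_ROOT/venv/bin/python" -m yarndevtools "$@"
}
```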
- Upload the initial setup scripts to the CDSW files, to the root directory (/home/cdsw).
- Create a new CDSW session. Wait for the session to launch, then open a terminal by clicking "Terminal access" on the top menu bar.
- Execute this command: `~/initial-cdsw-setup.sh user cloudera`
The script performs the following actions:
1. Downloads the scripts that clone the upstream and downstream Hadoop repositories and install yarndevtools itself as a Python module. The download location is `/home/cdsw/scripts`. Please note that the files will be downloaded from the GitHub master branch of this repository!
2. Executes the scripts downloaded in the previous step. This can take some time, especially cloning Hadoop. Note: the individual CDSW jobs should make sure for themselves that the repositories are cloned.
3. Copies the Python-based job configs for all jobs to `/home/cdsw/jobs`.
All you have to do in CDSW is to set up the projects and their starter scripts like this:
| Project | Starter script location | Arguments for script |
|---|---|---|
| Jira umbrella data fetcher (Formerly: Jira umbrella checker reporting) | scripts/start_job.py | jira-umbrella-data-fetcher |
| Unit test result aggregator | scripts/start_job.py | unit-test-result-aggregator |
| Unit test result fetcher (Formerly: Unit test result reporting) | scripts/start_job.py | unit-test-result-fetcher |
| Branch comparator (Formerly: Downstream branchdiff reporting) | scripts/start_job.py | branch-comparator |
| Review sheet backport updater | scripts/start_job.py | review-sheet-backport-updater |
| Reviewsync | scripts/start_job.py | reviewsync |
To backport YARN-6221 to 2 branches, run these commands:

```
yarn-backport YARN-6221 COMPX-6664 cdpd-master
yarn-backport YARN-6221 COMPX-6664 CDH-7.1-maint --no-fetch
```

- The first argument is the upstream Jira ID.
- The second argument is the downstream Jira ID.
- The third argument is the downstream branch.
- The `--no-fetch` option skips `git fetch` on both repositories.
- Go to the Gerrit UI and download the patch. For example:
  `git fetch "https://gerrit.sjc.cloudera.com/cdh/hadoop" refs/changes/29/156429/5 && git checkout FETCH_HEAD`
- Check out a new branch:
  `git checkout -b my-relation-chain`
- Run the backporter with:
  `yarn-backport YARN-10314 COMPX-7855 CDH-7.1.7.1000 --no-fetch --downstream_base_ref my-relation-chain`

where:
- The first argument is the upstream Jira ID.
- The second argument is the downstream Jira ID.
- The third argument is the downstream branch.
- The `--no-fetch` option skips `git fetch` on both repositories.
- The `--downstream_base_ref <local-branch>` option bases the backport on a local branch, so the Git remote name won't be prepended.
Finally, I set up two aliases for pushing the changes to the downstream repo:

```
alias git-push-to-cdpdmaster="git push <REMOTE> HEAD:refs/for/cdpd-master%<REVIEWER_LIST>"
alias git-push-to-cdh71maint="git push <REMOTE> HEAD:refs/for/CDH-7.1-maint%<REVIEWER_LIST>"
```

where REVIEWER_LIST is in this format: `r=user1,r=user2,r=user3,...`
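To make the substitution concrete, here is how the first alias would expand with a placeholder remote and reviewer list (the remote name `gerrit` and the user names are hypothetical examples, not values from this project):

```shell
# Placeholder values -- substitute your real Gerrit remote name and reviewers
REMOTE="gerrit"
REVIEWER_LIST="r=alice,r=bob"

# Print the command the git-push-to-cdpdmaster alias would run
# (echoed here instead of executed, since this is just an illustration):
echo "git push $REMOTE HEAD:refs/for/cdpd-master%$REVIEWER_LIST"
```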
Configure pre-commit as described in this blog post.
Commands:
- Install pre-commit: `pip install pre-commit`
- Make sure to add pre-commit to your PATH. For example, on a Mac system, pre-commit is installed here: `$HOME/Library/Python/3.8/bin/pre-commit`
- Execute `pre-commit install` to install the git hooks in your `.git/` directory.
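For example, the user-level install directory can be added to PATH like this (the Python version segment matches the macOS example above; check where pip actually installed pre-commit on your machine):

```shell
# Adjust the Python version segment to match your installation
export PATH="$HOME/Library/Python/3.8/bin:$PATH"
```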
TODO
In case you're facing a similar issue:

```
An error has occurred: InvalidManifestError:
=====> /<userhome>/.cache/pre-commit/repoBP08UH/.pre-commit-hooks.yaml does not exist
Check the log at /<userhome>/.cache/pre-commit/pre-commit.log
```

please run `pre-commit autoupdate`.
More info can be found here.