Python command line tools to convert files between XML and YAML, preserving attributes and comments (with minor corrections). The default file encoding for both types is UTF-8 without a BOM. Now includes more console entry points to grep or sort interesting YAML files (eg, lists of rules found in the SCAP Security Guide) and support for more input file types to ingest SSG and other upstream data, eg, NIST oscal-content.
Available console commands and scripts:
- ymltoxml: YAML / XML round-trip conversion and cleanup
- yasort: sort large lists in YAML files
- yagrep: grep for keys/values in YAML files
- oscal (WIP): ingest NIST 800-53 content in multiple formats
- analyze_control_ids.py (experimental): analyze control ID sets
This package is not yet published on PyPI, thus use one of the following to install ymltoxml on any platform. Install from the main branch:
$ pip install https://github.com/sarnold/ymltoxml/archive/refs/heads/main.tar.gz
or use this command to install a specific release version:
$ pip install https://github.com/sarnold/ymltoxml/releases/download/0.3.0/ymltoxml-0.3.0.tar.gz
The full package provides the ymltoxml.py executable as well as a reference configuration file with defaults for all values.
If you'd rather work from the source repository, it supports the common idiom to install it on your system in a virtual env after cloning:
$ python3 -m venv env
$ source env/bin/activate
$ pip install .
$ ymltoxml --version
$ ymltoxml --dump-config
$ deactivate
The alternative to python venv is the tox test driver. If you have it installed already, see the example tox commands below.
The current version supports minimal command options; if no options are provided, the only required arguments are one or more files of a single type:
$ ymltoxml
usage: ymltoxml [-h] [--version] [-v] [-d] [-s] [-i [FILE]] [-o [FILE]]
[FILE ...]
Transform YAML to XML and XML to YAML
positional arguments:
FILE Process input file(s) to target extension (default:
None)
options:
-h, --help show this help message and exit
--version show program's version number and exit
-v, --verbose Display more processing info (default: False)
-d, --dump-config Dump default configuration file to stdout (default:
False)
-s, --save-config save active config to default filename (.ymltoxml.yml)
and exit (default: False)
-i [FILE], --infile [FILE]
Path to single input file (use with --outfile)
(default: None)
-o [FILE], --outfile [FILE]
Path to single output file (use with --infile)
(default: None)
- for processing individual files/paths, use the --infile option, either with or without the --outfile option
- for processing multiple files, pass all files as arguments (paths can be relative or absolute)
- when passing input files as arguments, the output file names/paths are the same as the input files but with the (new) output extension
By default it processes one or more input files given as command arguments, typically in the current directory. The --infile option, however, processes only a single file path, optionally with an output file path, and takes no extra (file) arguments.
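The naming rule for files passed as arguments can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the helper name target_path and the .xml/.yaml extension pair are assumptions made for the example.

```python
from pathlib import Path

# Assumed extension pair, for illustration only.
XML_EXT = ".xml"
YML_EXT = ".yaml"


def target_path(src):
    """Return the input path with its extension swapped to the other type."""
    path = Path(src)
    if path.suffix == XML_EXT:
        return path.with_suffix(YML_EXT)
    if path.suffix in (YML_EXT, ".yml"):
        return path.with_suffix(XML_EXT)
    raise ValueError(f"unsupported file type: {path.suffix!r}")


# Relative and absolute paths are handled the same way.
for name in ["dialects/common.xml", "/tmp/paparazzi.yaml"]:
    print(name, "->", target_path(name))
```

Only the extension changes; the directory part of each path is kept as given.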
The main processing tweaks for yml/xml output formatting are specified in the default configuration file; if you need to change something, you can use your own config file in the working directory. Note the local copy must be named .ymltoxml.yaml. To get a copy of the default configuration file, do:
$ cd path/to/work/dir/
$ ymltoxml --save-config
$ $EDITOR .ymltoxml.yaml
A new helper script is now included for searching keys or values in YAML files. The yagrep script also has its own built-in config file, which can be copied and edited as shown above. In this case the script is intended to feel more-or-less like grep, so the default config should Just Work. That said, the script uses the dpath Python library, so you may need to change the default "path" separator if your data has keys containing forward slashes (see the upstream docs for details).
General usage guidelines:
- use the -f (filter) arg to search for a value string
- follow the (json) output from above to find the key name
- then use the -l (lookup) arg to extract the values for the key
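The two-step filter/lookup workflow can be pictured with a small pure-Python sketch. It does not use dpath and is not how yagrep is implemented; the function names and the sample data are hypothetical.

```python
def filter_values(data, text):
    """Yield (path, value) pairs for leaf values containing the text."""
    def walk(node, path):
        if isinstance(node, dict):
            for key, val in node.items():
                yield from walk(val, path + [str(key)])
        elif isinstance(node, list):
            for idx, val in enumerate(node):
                yield from walk(val, path + [str(idx)])
        elif text in str(node):
            yield "/".join(path), node
    yield from walk(data, [])


def lookup_key(data, key):
    """Return every value stored under the given key name, at any depth."""
    found = []
    if isinstance(data, dict):
        for name, val in data.items():
            if name == key:
                found.append(val)
            found.extend(lookup_key(val, key))
    elif isinstance(data, list):
        for item in data:
            found.extend(lookup_key(item, key))
    return found


# Hypothetical control-file fragment.
sample = {
    "controls": [
        {"id": "AC-1", "rules": ["sshd_set_idle_timeout"]},
        {"id": "AC-2", "rules": ["account_disable_post_pw_expiration"]},
    ]
}

# Step 1: search for a value string and note the key name in the path.
print(list(filter_values(sample, "sshd")))
# Step 2: extract all values for that key.
print(lookup_key(sample, "rules"))
```

The first call shows where the string lives (here under a "rules" key); the second collects every value stored under that key.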
Useful yagrep config file settings:
- default_separator: change the path separator to something like ; if data has forward slashes
- output_format: set the output format to raw for unformatted output
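Why the separator matters is easiest to see with a toy path walker. The sketch below is a stand-in written for this example (it is not dpath's API), and the sample data is made up.

```python
def get_by_path(data, path, separator="/"):
    """Walk nested dicts using a separator-delimited path string."""
    node = data
    for part in path.split(separator):
        node = node[part]
    return node


# Hypothetical data where one key contains forward slashes.
data = {"files": {"/etc/ssh/sshd_config": {"mode": "0600"}}}

# With "/" as the separator the key itself is split into pieces and the
# lookup fails; with ";" the key survives intact.
try:
    get_by_path(data, "files//etc/ssh/sshd_config/mode")
except KeyError as exc:
    print("lookup failed on path segment:", exc)

print(get_by_path(data, "files;/etc/ssh/sshd_config;mode", separator=";"))
```

Any character that never appears in your keys works as a separator; the semicolon is just the suggestion from the list above.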
$ yagrep -h
usage: yagrep [-h] [--version] [-v] [-d] [-s] [-f | -l] TEXT FILE [FILE ...]
Search in YAML files for keys and values.
positional arguments:
TEXT Text string to look for (one-only, required) (default:
None)
FILE Look in file(s) for text string (at least one, required)
(default: None)
options:
-h, --help show this help message and exit
--version show program's version number and exit
-v, --verbose Display more processing info (default: False)
-d, --dump-config Dump default configuration file to stdout (default:
False)
-s, --save-config save active config to default filename (.yagrep.yml) and
exit (default: False)
-f, --filter Filter out data not matching input string (no paths)
(default: False)
-l, --lookup Lookup by key and return list of values for any matches
(default: False)
Yet another helper script is included for sorting large (YAML) lists. The yasort script also uses its own configuration file, creatively named .yasort.yaml. The above applies equally to this config file.
$ yasort -h
usage: yasort [-h] [--version] [-v] [-d] [-s] [FILE ...]
Sort YAML lists and write new files.
positional arguments:
FILE Process input file(s) to target directory (default: None)
options:
-h, --help show this help message and exit
--version show program's version number and exit
-v, --verbose Display more processing info (default: False)
-d, --dump-config Dump default configuration file to stdout (default:
False)
-s, --save-config save active config to default filename (.yasort.yml) and
exit (default: False)
All of the optional arguments for yasort are essentially orthogonal to sorting, thus the only required argument for normal usage is one or more input files. All of the user settings are in the default configuration file shown below; use the --save-config option to create your own config file.
Default yasort.yaml:
---
# comments should be preserved
file_encoding: 'utf-8'
default_yml_ext: '.yaml'
output_dirname: 'sorted-out'
default_parent_key: 'controls'
default_sort_key: 'rules'
has_parent_key: true
preserve_quotes: true
process_comments: false
mapping: 4
sequence: 6
offset: 4
We mainly test on mavlink XML message definitions and NIST/SSG YAML files, so round-trip conversion may not work at all on arbitrarily complex XML files with namespaces, etc. The current round-trip is not exact, due to the following:
- missing encoding is added to version tag
- leading/trailing whitespace in text elements and comments is not preserved
- XML - elements with self-closing tags are converted to full closing tags
- XML - empty elements on more than one line are not preserved
For the files tested (eg, mavlink) the end result is cleaner/shinier XML.
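Two of the listed differences (whitespace trimming and expanded self-closing tags) can be reproduced with the standard library alone. The sketch below uses xml.etree.ElementTree purely for illustration; it is not the parser ymltoxml uses, and the sample element names are made up.

```python
import xml.etree.ElementTree as ET

source = "<message><description>  heartbeat  </description><field/></message>"
root = ET.fromstring(source)

# Strip leading/trailing whitespace from element text.
for elem in root.iter():
    if elem.text is not None:
        elem.text = elem.text.strip()

# Write self-closing tags as full open/close pairs.
result = ET.tostring(root, encoding="unicode", short_empty_elements=False)
print(result)
```

The output is still equivalent XML for most consumers, but a byte-for-byte comparison with the input will fail, which is what "not exact" means here.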
The following covers two types of workflows, one for tool usage in other (external) projects, and one for (internal) tool development.
The ymltoxml tools are intended to be part of a larger workflow, ie, developing custom mavlink message dialects and generating/deploying the resulting mavlink language interfaces. To be more specific, for this example we use a mavlink-compatible component running on a micro-controller, thus the target language bindings are C and C++.
Tool requirements for the full mavlink workflow:
- initially just recent pymavlink, Python, and Tox
Both mavlink and pymavlink require a (host) GCC toolchain for full builds; however, the basic workflow to generate mavlink library headers requires only Git, Python, and Tox.
The yasort/yagrep tools are also intended to be part of a larger workflow, mainly working with SCAP content, ie, the scap-security-guide source files (or just content). They are currently used to sort profiles with large numbers of rules, as well as create control files and analyze existing controls.
The yasort configuration file defaults are based on existing yaml structure, but feel free to change them for another use case. To adjust how the sorting works, make a local config file (see above) and edit as needed the following options:
- output_dirname: directory for output file(s)
- default_parent_key: parent key if sort target is sublist
- default_sort_key: the key you want to sort
- has_parent_key: set true if sorting a sublist
- default_yml_ext: change the output file extension
The rest of the options are for YAML formatting/flow style (see the ruamel documentation for formatting details).
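How the parent/sort key options interact can be sketched in plain Python. This is an assumption about the data layout (a list of items under the parent key, each holding its own list under the sort key), not yasort's actual code; the function name and sample data are hypothetical.

```python
def sort_lists(data, sort_key="rules", parent_key="controls", has_parent_key=True):
    """Sort the target list(s) in place and return the data."""
    if has_parent_key:
        # The sort target is a sublist of each item under the parent key.
        for item in data[parent_key]:
            item[sort_key].sort()
    else:
        # The sort target is a top-level list.
        data[sort_key].sort()
    return data


# Hypothetical control file, already loaded from YAML into Python objects.
control_file = {
    "controls": [
        {"id": "AC-2", "rules": ["service_sshd_enabled", "accounts_tmout"]},
        {"id": "AU-3", "rules": ["package_audit_installed", "auditd_freq"]},
    ]
}
sort_lists(control_file)
print(control_file["controls"][0]["rules"])
```

For a file whose list sits at the top level, the equivalent of has_parent_key: false with a different default_sort_key would be a call such as sort_lists(data, sort_key="selections", has_parent_key=False), where "selections" is again only an example key.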
As long as you have git and at least Python 3.6, then the "easy" dev workflow is to clone this repository and install Tox via your system package manager, eg:
$ sudo apt-get update
$ sudo apt-get install tox
After cloning this repository, you can run the repo checks with the tox command. It will build a virtual python environment with all the dependencies and run the specified commands, eg:
$ git clone https://github.com/sarnold/ymltoxml
$ cd ymltoxml/
$ tox -e py
The above will run the tests using your (default) system Python; to specify the Python version and host OS type, run something like:
$ tox -e py39-linux
Additional tox commands:
- tox -e changes: (re)generate the changelog file
- tox -e conv: round-trip conversion test on mavlink dialect
- tox -e dev: pip "developer" install
- tox -e style: run flake8 style checks
- tox -e lint: run pylint (somewhat less permissive than PEP8/flake8 checks)
- tox -e mypy: run mypy import and type checking
- tox -e isort: run isort import checks
- tox -e clean: remove temporary test files
To build/lint the api docs, use the following tox commands:
- tox -e docs: build the documentation using sphinx and the api-doc plugin
- tox -e docs-lint: build the docs and run the sphinx link checking
We use the gitchangelog action to generate our changelog file and GH Release page, as well as the gitchangelog commit message prefix "tag" modifiers to help it categorize/filter commits for a tidier changelog. Please use the appropriate ACTION modifiers in any Pull Requests. Some examples of commit message summary "tags" are shown in the .gitchangelog.rc file and reproduced below:
new: usr: support of bazaar implemented
chg: re-indentend some lines !cosmetic
new: dev: updated code to be compatible with last version of killer lib.
fix: pkg: updated year of licence coverage.
new: test: added a bunch of test around user usability of feature X.
fix: typo in spelling my name in comment. !minor
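The structure of these summaries (an action, an optional audience, the text, and optional trailing ! tags) can be checked with a short script. The pattern below is a simplified assumption written for this example, not the regexes shipped in .gitchangelog.rc, and parse_summary is a hypothetical helper.

```python
import re

# Simplified pattern (an assumption, not copied from .gitchangelog.rc):
# ACTION: [AUDIENCE:] summary text [!tag ...]
SUMMARY_RE = re.compile(
    r"^(?P<action>new|chg|fix)\s*:\s*"
    r"(?:(?P<audience>dev|usr|pkg|test|doc)\s*:\s*)?"
    r"(?P<summary>.*?)"
    r"(?P<tags>(?:\s+![a-z]+)*)$"
)


def parse_summary(line):
    """Split a commit summary into its parts, or return None if malformed."""
    match = SUMMARY_RE.match(line)
    if match is None:
        return None
    return {
        "action": match["action"],
        "audience": match["audience"],  # None when no audience is given
        "summary": match["summary"],
        "tags": match["tags"].split(),
    }


for line in [
    "new: usr: support of bazaar implemented",
    "fix: typo in spelling my name in comment. !minor",
    "updated some stuff",
]:
    print(parse_summary(line))
```

A summary without a recognized action prefix returns None, which is a reasonable signal that the commit message needs a modifier before it goes into a Pull Request.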
See the following docs page (or generate-changelog on GitHub) for more details.
This repo is also pre-commit enabled for various linting and format checks. The checks run automatically on commit and will fail the commit (if not clean) with some checks performing simple file corrections.
If other checks fail on commit, the failure display should explain the error types and line numbers. Note you must fix any fatal errors for the commit to succeed; some errors should be fixed automatically (use git status and git diff to review any changes).
See the following pages for more information on gitchangelog and pre-commit.
You will need to install pre-commit before contributing any changes; installing it using your system's package manager is recommended, otherwise install with pip into your usual virtual environment using something like:
$ sudo emerge pre-commit --or--
$ pip install pre-commit
then install it into the repo you just cloned:
$ git clone https://github.com/sarnold/ymltoxml
$ cd ymltoxml/
$ pre-commit install
It's usually a good idea to update the hooks to the latest version:
$ pre-commit autoupdate