alto2txt2fixture

alto2txt2fixture is a standalone tool to convert alto2txt XML output and other related datasets into JSON (and where feasible CSV) data with corresponding relational IDs to ease general use and ingestion into a relational database.

The JSON output is intended for import into lwmdb, a database built with the Django Python web framework, and follows Django's database fixture structure.
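For readers unfamiliar with Django fixtures, the sketch below shows the general shape of a JSON fixture: a list of records, each with a `model` label, a `pk`, and a `fields` mapping in which related records are referenced by ID. The model labels (`newspapers.newspaper`, `newspapers.issue`) and field names are hypothetical, chosen for illustration; they are not taken from lwmdb or from this tool's actual output.

```python
import json


def make_fixture_record(model, pk, fields):
    """Return one record in Django's fixture layout: model label, primary key, fields."""
    return {"model": model, "pk": pk, "fields": fields}


# Hypothetical model labels and field names, for illustration only.
newspaper = make_fixture_record(
    "newspapers.newspaper", 1, {"title": "Example Gazette"}
)
issue = make_fixture_record(
    "newspapers.issue",
    1,
    # The relational ID: the issue points at its newspaper by primary key.
    {"issue_date": "1850-01-01", "newspaper": newspaper["pk"]},
)

# A fixture file is a JSON list of such records.
fixture_json = json.dumps([newspaper, issue], indent=2)
print(fixture_json)
```

Storing the foreign key as a plain integer is what lets a relational database rebuild the link between the two records on import.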

Installation and simple use

We provide a command line interface to process alto2txt XML files stored locally (or mounted via Azure blobfuse). Additional public data is downloaded automatically.

Installation

We recommend downloading a copy of the repository or using git clone. From a local copy, use poetry to install dependencies:

$ cd alto2txt2fixture
$ poetry install

If you would like to run tests, render documentation and/or contribute to the code, include the dev dependencies in a local install:

$ poetry install --with dev

Simple use

To process newspaper metadata with a local copy of alto2txt XML results, it's easiest to have that data in the same folder as your alto2txt2fixture checkout and poetry install. Once arranged, you should be able to begin the JSON conversion with:

$ poetry run a2t2f-news

To generate related data in JSON and CSV form, you need an internet connection and access to a living-with-machines Azure account. The following command downloads the related data into JSON and CSV files. The JSON results should be consistent with lwmdb tables for ease of import.

$ poetry run a2t2f-adj
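Before importing generated JSON, it can help to confirm that each file has the fixture shape Django expects. The following is a minimal sketch of such a check, not part of alto2txt2fixture itself: the function name `check_fixture`, the requirement that every record carries `model`, `pk` and `fields`, and the demo file name are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Assumption: every record should carry all three keys.
REQUIRED_KEYS = {"model", "pk", "fields"}


def check_fixture(path):
    """Return a list of problems found in a Django-style JSON fixture file."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        return ["top-level JSON value is not a list"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index}: not an object")
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append(f"record {index}: missing {sorted(missing)}")
    return problems


# Demo on a throwaway file: one well-formed record and one incomplete record.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo-fixture.json"  # hypothetical file name
    demo_path.write_text(
        json.dumps(
            [
                {"model": "app.example", "pk": 1, "fields": {"name": "ok"}},
                {"model": "app.example"},
            ]
        ),
        encoding="utf-8",
    )
    problems = check_fixture(demo_path)

print(problems)
```

An empty list means the file passed this structural check. Django's `loaddata` management command is the usual way to load fixture files. How lwmdb expects them to be imported is not covered here.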

Documentation

More detailed documentation is available at https://living-with-machines.github.io/alto2txt2fixture/
