
impresso/impresso-text-acquisition


Impresso Text Importer


The Impresso TextImporter is a library and a collection of scripts for importing newspaper data from a variety of formats (e.g. Olive XML and various flavors of METS/ALTO XML) into Impresso’s JSON format.

Please refer to the documentation for further information on this library.

Installation

With pip:

pip install impresso-text-importer

License

The second project, 'impresso - Media Monitoring of the Past II. Beyond Borders: Connecting Historical Newspapers and Radio', is funded by the Swiss National Science Foundation (SNSF) under grant number CRSII5_213585 and by the Luxembourg National Research Fund under grant number 17498891.

Aiming to develop and consolidate tools to process and explore large-scale collections of historical newspapers and radio archives, and to study the impact of this tooling on historical research practices, Impresso II builds upon the first project – 'impresso - Media Monitoring of the Past' (grant number CRSII5_173719, Sinergia program). More information at https://impresso-project.ch.

Copyright (C) 2024 The impresso team (contributors to this program: Matteo Romanello, Maud Ehrmann, Alex Flückinger, Edoardo Tarek Hölzl, Pauline Conti).

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU Affero General Public License for more details.