Geoparsing Tutorial Notebook

Jupyter notebook for geoparsing historical encyclopedia texts in French using the PERDIDO Geoparser.

This notebook is provided by L. Moncla (INSA Lyon) and K. McDonough (The Alan Turing Institute) as part of the GEODE project.

Overview

In this tutorial, we demonstrate how to use a custom version of the Perdido geoparser Python library developed in the GEODE project. We use texts from Diderot and d’Alembert’s Encyclopédie as a case study for querying a corpus and wrangling geoparsed data. We also compare Perdido’s NER annotations (i.e., its output) with the results of two other well-known Python NER libraries, spaCy and Stanza.
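As a rough illustration of what comparing NER annotations from two tools can involve, the sketch below scores one set of entity spans against another using exact-match precision, recall, and F1. The `Entity` tuple, the `compare_annotations` helper, and the sample spans are hypothetical and are not taken from the notebook or from the Perdido, spaCy, or Stanza APIs. It also assumes that the labels of both tools have already been mapped to a shared scheme.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A hypothetical, tool-independent entity span."""
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    label: str   # entity type after mapping to a shared scheme, e.g. "LOC"


def compare_annotations(reference, candidate):
    """Exact-match precision, recall and F1 of `candidate` against `reference`.

    Both arguments are sets of Entity tuples; a span only counts as matched
    when its start, end and label are all identical.
    """
    matched = reference & candidate
    precision = len(matched) / len(candidate) if candidate else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    if matched:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative spans only: these are not real outputs of any tool.
text = "Lyon est une ville de France, sur le Rhône."
tool_a = {Entity(0, 4, "LOC"), Entity(22, 28, "LOC"), Entity(37, 42, "LOC")}
tool_b = {Entity(0, 4, "LOC"), Entity(22, 28, "LOC")}

scores = compare_annotations(reference=tool_a, candidate=tool_b)
print(scores)
```

Exact matching is the strictest choice. A more lenient comparison could count overlapping spans as partial matches, which is often useful when tools disagree on entity boundaries.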

This tutorial covers how to:

  • Load data from TEI-XML files into a Python dataframe
  • Use a Python dataframe for simple data analysis
  • Test the PERDIDO API for preprocessing French texts (part-of-speech tagging)
  • Test the PERDIDO API for geoparsing (geotagging + geocoding) Encyclopédie articles
  • Display custom geotagging results (PERDIDO TEI-XML) with the displaCy Named Entity Visualizer
  • Display geocoding results on a map
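The first step in the list, loading TEI-XML into tabular form, can be sketched with the standard library alone. The sample document, its `div type="article"` layout, and the `tei_to_records` helper are assumptions made for illustration. The actual structure of the Encyclopédie files and of the PERDIDO TEI-XML output may differ, so the XPath expressions would need adjusting.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is standard; the document layout below is assumed.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="article" n="1">
        <head>LYON</head>
        <p>Grande ville de <placeName>France</placeName>.</p>
      </div>
      <div type="article" n="2">
        <head>GENEVE</head>
        <p>Ville située sur le <placeName>Rhône</placeName>.</p>
      </div>
    </body>
  </text>
</TEI>"""


def tei_to_records(xml_string):
    """Return one dict per article: id, headword, plain text, tagged places."""
    root = ET.fromstring(xml_string)
    records = []
    for div in root.iterfind(".//tei:div[@type='article']", TEI_NS):
        head = div.findtext("tei:head", default="", namespaces=TEI_NS)
        # itertext() flattens inline markup so the paragraph reads as plain text.
        paragraphs = [
            "".join(p.itertext()).strip() for p in div.iterfind("tei:p", TEI_NS)
        ]
        places = [el.text for el in div.iterfind(".//tei:placeName", TEI_NS)]
        records.append(
            {
                "id": div.get("n"),
                "head": head,
                "text": " ".join(paragraphs),
                "places": places,
            }
        )
    return records


records = tei_to_records(SAMPLE_TEI)
for record in records:
    print(record)
```

Because the result is a list of dictionaries with identical keys, passing it to `pandas.DataFrame(records)` yields one row per article, which is a convenient starting point for the simple analyses mentioned above.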

Open the notebook in the cloud

You can open this notebook in a remote, executable environment with Binder or Google Colab.

Set up a Python environment

Clone this GitHub repository

git clone https://github.com/GEODE-project/perdido-geoparsing-notebook.git

Configure the environment with all dependencies

  • Create a new environment called tutorial-geoparsing-py39
conda create -n tutorial-geoparsing-py39 python=3.9
  • Activate the environment
conda activate tutorial-geoparsing-py39
  • Install the fiona package with conda (this avoids an installation issue with pip)
conda install fiona==1.8.21
  • Install dependencies with pip
pip install -r requirements.txt
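After the steps above, a short check run inside the activated environment can confirm which interpreter and package versions are in use. This is only a sketch: the `installed_version` and `environment_report` helpers are hypothetical, and the package list is an assumption that should be adapted to the contents of requirements.txt.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages):
    """Map each package name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


# The steps above create the environment with Python 3.9.
print("Python", ".".join(str(part) for part in sys.version_info[:3]))

# Assumed package list; extend it with the entries of requirements.txt.
report = environment_report(["fiona"])
for name, version in report.items():
    print(name, version if version is not None else "not installed")
```

If fiona is reported as not installed, the most likely cause is that the conda environment was not activated before running the install commands.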

Launch the Jupyter server

jupyter notebook

Acknowledgement

Data courtesy of the ARTFL Encyclopédie Project, University of Chicago.

The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).