
mcooper/village-names


village-names

I want to classify village names in West Africa based on n-gram similarity and spatial distance.
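Here is a minimal Python sketch of the two ingredients. It assumes that names are compared by the Jaccard similarity of their character 3-gram sets and that distance means great-circle (haversine) distance in kilometres. Neither the Jaccard measure nor the helper names (`char_ngrams`, `name_similarity`, `haversine_km`) come from the original scripts. They are hypothetical stand-ins.

```python
import math


def char_ngrams(name: str, n: int = 3) -> set[str]:
    """Set of overlapping character n-grams in a lowercased name."""
    s = name.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def name_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of two names' n-gram sets (0 = nothing shared, 1 = same set)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing sqrt(a) just above 1.
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical names: "bugula" and "bugu" share 2 of their 4 distinct 3-grams.
print(name_similarity("bugula", "bugu"))  # 0.5
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # 111.2
```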

About a year ago, I started fooling around with a dataset of toponyms from geonames.org. I found lots of cool patterns in toponym prefixes and suffixes that I was already familiar with, and I would like to use machine learning to classify all of the villages, which I expect would reveal ethnic group boundaries and affinities. To see the results of my original project, check out the Shiny app I made and a blog post about why I think this kind of work could be useful.

In the hope that I’ll pick this project up again and work on it some more, I’m dumping all my old scripts into a GitHub repo. They might not be very well sorted.

The workflow is:

  1. Normalize village names to remove differences between French and English spelling conventions, e.g. ‘gn’ -> ‘ny’ or ‘bougou’ -> ‘bugu’.

  2. Find all n-grams (in this case 3-grams) that show significant spatial clustering.

  3. Using a dataset with one row per village (latitude, longitude, and a binary presence/absence variable for each n-gram), classify all of the village names using machine learning.
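Step 1 might look something like this minimal sketch. Only the two rewrite rules (‘gn’ -> ‘ny’ and ‘bougou’ -> ‘bugu’) come from this README. The accent stripping, the lowercasing, the `SPELLING_RULES` list and the `normalize_name` helper are assumptions. A real rule list would be longer, and the order of the rules matters.

```python
import re
import unicodedata

# Ordered (old, new) substitutions. Only these two come from the README;
# a full rule set would be longer and must be ordered with care.
SPELLING_RULES = [
    ("bougou", "bugu"),
    ("gn", "ny"),
]


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and apply the spelling rules in order."""
    # Decompose accented letters ("é" -> "e" + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", name)
    s = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    # Collapse hyphens, apostrophes and other non-letters into single spaces.
    s = re.sub(r"[^a-z]+", " ", s).strip()
    for old, new in SPELLING_RULES:
        s = s.replace(old, new)
    return s


# Example inputs, chosen only to exercise the rules.
print(normalize_name("Bougouni"))  # buguni
print(normalize_name("Sébougou"))  # sebugu
print(normalize_name("Kagnaba"))   # kanyaba
```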
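One way to test step 2 is a permutation test, sketched below under a few assumptions. The statistic is the mean nearest-neighbour distance among villages whose normalized name contains a given n-gram. The null distribution comes from drawing the same number of villages at random from the whole dataset. This is a swapped-in technique and not necessarily the test the original scripts use. `clustering_p_value` and `mean_nn_distance` are hypothetical names. A small p-value means the villages sharing the n-gram sit closer together than chance would suggest.

```python
import math
import random


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs, spherical Earth."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlam = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def mean_nn_distance(points):
    """Average distance from each point to its nearest other point (O(k^2))."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(haversine_km(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def clustering_p_value(has_ngram, coords, n_perm=999, seed=0):
    """One-sided permutation p-value for spatial clustering of one n-gram.

    has_ngram: list of bools, one per village.
    coords:    list of (lat, lon) tuples, aligned with has_ngram.
    """
    observed_pts = [c for c, h in zip(coords, has_ngram) if h]
    k = len(observed_pts)
    if k < 2:
        return 1.0  # too few villages to measure clustering
    observed = mean_nn_distance(observed_pts)
    rng = random.Random(seed)
    # Count random draws of k villages that are at least as tightly packed.
    as_tight = sum(
        mean_nn_distance(rng.sample(coords, k)) <= observed for _ in range(n_perm)
    )
    # The +1 terms count the observed arrangement as one of the permutations.
    return (as_tight + 1) / (n_perm + 1)
```

Each n-gram needs its own test, and the nearest-neighbour step is quadratic in the number of matching villages. With thousands of n-grams you would also want a multiple-comparison correction before calling any of them "significant".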
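For step 3, the README only says "machine learning". The sketch below assumes an unsupervised approach. Each village (name already normalized as in step 1) becomes a row of scaled latitude and longitude, followed by 0/1 flags for a chosen list of n-grams. Plain k-means then groups the rows. The choice of k-means, the hypothetical `coord_weight` scaling, the farthest-first initialization and all the function names are illustrative assumptions, not the original method.

```python
def char_ngrams(name, n=3):
    """Set of overlapping character n-grams in an (already normalized) name."""
    return {name[i:i + n] for i in range(len(name) - n + 1)}


def build_rows(villages, ngrams, coord_weight=0.1):
    """One row per village: [lat, lon] times coord_weight, then a 0/1 flag per n-gram.

    coord_weight is a hypothetical knob trading off geography against naming.
    """
    rows = []
    for name, lat, lon in villages:
        present = char_ngrams(name)
        flags = [1.0 if g in present else 0.0 for g in ngrams]
        rows.append([lat * coord_weight, lon * coord_weight] + flags)
    return rows


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next center: the row farthest from every center chosen so far.
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


villages = [  # hypothetical names and coordinates
    ("sikabugu", 12.0, -8.0), ("dabugu", 12.1, -8.1),
    ("kanabugu", 11.9, -7.9), ("tobugu", 12.05, -8.05),
    ("yaokro", 7.0, -5.0), ("kondrokro", 7.1, -5.1),
    ("ahoukro", 6.9, -4.9), ("kouassikro", 7.05, -5.05),
]
labels = kmeans(build_rows(villages, ["bug", "ugu", "kro"]), k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Raising `coord_weight` makes clusters follow geography more closely, and lowering it lets shared n-grams dominate. Where to set it is a modelling choice this sketch does not settle.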

EDIT: An example of this work is at mcooper.github.io/vill-names.html

About

Classifying village names
