GitHub - vaneseltine/nominally: A maximum-strength name parser for record linkage.

nominally: a maximum-strength name parser for record linkage

🔗 Names

Nominally simplifies and parses a personal name written in Western name order into six core fields: title, first, middle, last, suffix, and nickname.

Typically, nominally is used to parse entire lists or pd.Series of names en masse. This package includes a command line tool to parse a single name for convenient one-off testing and examples.

Human names can be difficult to work with in data. Varying quality and recording practices across institutions and datasets introduce noise and misrepresent names, which makes linkage and deduplication harder. Common errors and discrepancies include, among others:

Arbitrarily split first and middle names.
Misplaced prefixes of last names such as "van" and "de la."
Multiple last names partitioned into middle name fields.
Titles and suffixes variously recorded in different fields, with or without separators.
Inconsistent capture of accents, the ʻokina, and other non-ASCII characters.
Single name fields arbitrarily concatenating name parts.

Nominally produces fields intended for comparisons between or within datasets. As such, names come out formatted for data without regard to human syntactic preference: de von ausfern, mr johann g rather than Mr. Johann G. de von Ausfern.
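
To illustrate why data-oriented output helps with comparison, here is a minimal sketch that builds a comparison key from the six parsed fields. The helper `linkage_key` and the choice of which fields to compare are assumptions made for this example, not part of nominally. The first record is the parsed output documented in this README; the second is a hypothetical record written by hand, not real nominally output.

```python
from typing import Mapping, Tuple


def linkage_key(fields: Mapping[str, str]) -> Tuple[str, str, str]:
    """Build a comparison key from parsed fields (hypothetical helper).

    Title and nickname are ignored here because they are often recorded
    inconsistently; that choice is an assumption for illustration.
    """
    return (fields["last"], fields["first"], fields["suffix"])


# Parsed fields for "Vimes, jr, Mr. Samuel 'Sam'", as documented in this README.
record_a = {
    "title": "mr", "first": "samuel", "middle": "",
    "last": "vimes", "suffix": "jr", "nickname": "sam",
}

# Hypothetical parsed fields from a second dataset (hand-written for illustration).
record_b = {
    "title": "", "first": "samuel", "middle": "",
    "last": "vimes", "suffix": "jr", "nickname": "",
}

# Lowercased, punctuation-free fields can be compared directly.
print(linkage_key(record_a) == linkage_key(record_b))
```

Real linkage work would likely use fuzzier comparisons than exact tuple equality; the point is only that consistently formatted fields make such comparisons straightforward.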

📜 Documentation

Full nominally documentation is maintained on ReadTheDocs: https://nominally.readthedocs.io/en/latest/

⛏️ Installation

Releases of nominally are distributed on PyPI, so the recommended approach is to install via pip:

$ python -m pip install -U nominally

📓 Getting Started

Call parse_name() to parse out the six core fields:

$ python -q
>>> from nominally import parse_name
>>> parse_name("Vimes, jr, Mr. Samuel 'Sam'")
{
    'title': 'mr',
    'first': 'samuel',
    'middle': '',
    'last': 'vimes',
    'suffix': 'jr',
    'nickname': 'sam'
}
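
Because nominally is typically applied to whole lists of names, a small wrapper around the per-name call can be handy. The following is a sketch under stated assumptions: `parse_all` and `demo` are hypothetical names, not part of nominally, and the parser is passed in as an argument so the helper does not itself depend on the package.

```python
from typing import Callable, Dict, Iterable, List, Mapping


def parse_all(
    names: Iterable[str],
    parser: Callable[[str], Mapping[str, str]],
) -> List[Dict[str, str]]:
    """Apply a name parser to every raw name; return one dict of fields per name.

    Hypothetical helper for illustration, not part of nominally.
    """
    return [dict(parser(name)) for name in names]


def demo() -> None:
    """Parse a short list with nominally (requires the package to be installed)."""
    from nominally import parse_name

    raw_names = ["Vimes, jr, Mr. Samuel 'Sam'"]
    for fields in parse_all(raw_names, parse_name):
        print(fields)
```

With nominally installed, calling `demo()` should print the same six fields shown above. The resulting list of dicts can also be passed to `pandas.DataFrame(...)` to get one column per field.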

Dive into the Name class to parse out a reformatted string...

>>> from nominally import Name
>>> n = Name("Vimes, jr, Mr. Samuel 'Sam'")
>>> n
Name({
  'title': 'mr',
  'first': 'samuel',
  'middle': '',
  'last': 'vimes',
  'suffix': 'jr',
  'nickname': 'sam'
})
>>> str(n)
'vimes, mr samuel (sam) jr'

...or use the dict...

>>> dict(n)
{
  'title': 'mr',
  'first': 'samuel',
  'middle': '',
  'last': 'vimes',
  'suffix': 'jr',
  'nickname': 'sam'
}
>>> list(n.values())
['mr', 'samuel', '', 'vimes', 'jr', 'sam']

...or retrieve a more elaborate set of attributes...

>>> n.report()
{
  'raw': "Vimes, jr, Mr. Samuel 'Sam'",
  'cleaned': {'jr', 'sam', 'vimes, mr samuel'},
  'parsed': 'vimes, mr samuel (sam) jr',
  'list': ['mr', 'samuel', '', 'vimes', 'jr', 'sam'],
  'title': 'mr',
  'first': 'samuel',
  'middle': '',
  'last': 'vimes',
  'suffix': 'jr',
  'nickname': 'sam'
}

...or capture individual attributes.

>>> n.first
'samuel'
>>> n['last']
'vimes'
>>> n.get('suffix')
'jr'
>>> n.raw
"Vimes, jr, Mr. Samuel 'Sam'"

🖥️ Command Line

For a quick report, invoke the nominally command line tool:

$ nominally "Vimes, jr, Mr. Samuel 'Sam'"
       raw: Vimes, jr, Mr. Samuel 'Sam'
   cleaned: {'jr', 'vimes, mr samuel', 'sam'}
    parsed: vimes, mr samuel (sam) jr
      list: ['mr', 'samuel', '', 'vimes', 'jr', 'sam']
     title: mr
     first: samuel
    middle:
      last: vimes
    suffix: jr
  nickname: sam

🔬 Worked Examples

Binder hosts live Jupyter notebooks walking through examples of nominally.

These notebooks and additional examples reside in the Nominally Examples repository.

👩‍💻 Community

Interested in helping to improve nominally? Please see CONTRIBUTING.md.

CONTRIBUTING.md also includes directions to run tests, using a clone of the full repository.

Having problems with nominally? Need help or support? Feel free to open an issue here on GitHub, or contact me via email or Twitter (see my profile for links).

🧙‍ Author

💡 Acknowledgements

Nominally started as a fork of the python-nameparser package and has benefitted considerably from this origin, especially the wealth of examples and tests developed for python-nameparser.
