Parser for Web of Science XML dataset

Python XML parser for Web of Science XML files. See the example XML file from yadudoc/wos_builder. The implementation is based on yadudoc/wos_builder; I wrapped it in functions so that it can be easily integrated with other platforms such as Spark or multiprocessing.

Example

import wos_parser as wp
records = wp.read_xml('sample.xml')
authors = [wp.extract_authors(record) for record in records]  # flatten the result to build a DataFrame
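
The flattening step mentioned in the comment above could look roughly like the sketch below. So that it runs without a Web of Science file, `authors` here is stand-in data shaped like a list of per-record author lists; the keys `wos_id` and `full_name` are hypothetical and may not match the dictionaries that `extract_authors` actually returns.

```python
from itertools import chain

# Stand-in for [wp.extract_authors(record) for record in records].
# The keys below are hypothetical, not the library's documented schema.
authors = [
    [
        {"wos_id": "WOS:0001", "full_name": "Doe, Jane"},
        {"wos_id": "WOS:0001", "full_name": "Roe, Richard"},
    ],
    [
        {"wos_id": "WOS:0002", "full_name": "Poe, Edgar"},
    ],
]

# Flatten the per-record lists into a single list with one dict per author.
author_rows = list(chain.from_iterable(authors))

for row in author_rows:
    print(row["wos_id"], row["full_name"])
```

If pandas is installed, passing `author_rows` to `pandas.DataFrame` should give a table with one row per author.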

Parser Available

Use read_xml to read a Web of Science XML file into a list of element trees. Each element tree can then be passed to any of the following functions, which return either a dictionary or a list of dictionaries.

extract_pub_info
extract_authors
extract_addresses
extract_publisher
extract_funding
extract_conferences
extract_references
extract_identifiers
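
Because some of these functions return a single dictionary and others a list of dictionaries, it can help to normalize both shapes when running several of them over every record. The helper below is a minimal sketch of that idea, not part of wos_parser: `collect_tables`, `fake_pub_info`, `fake_authors`, and the record fields are all hypothetical stand-ins so the example runs on its own. In practice the values of `extractors` would be functions such as `wp.extract_pub_info` and `wp.extract_authors`, and `records` would come from `wp.read_xml`.

```python
def collect_tables(records, extractors):
    """Apply each extractor to each record and gather the rows per extractor.

    `extractors` maps a table name to a function that takes one record and
    returns either a dict (one row) or a list of dicts (several rows).
    """
    tables = {name: [] for name in extractors}
    for record in records:
        for name, extract in extractors.items():
            output = extract(record)
            if isinstance(output, dict):
                tables[name].append(output)   # single-row output
            else:
                tables[name].extend(output)   # multi-row output
    return tables


# Hypothetical stand-ins for the real extractors and parsed records.
def fake_pub_info(record):
    return {"id": record["id"], "title": record["title"]}


def fake_authors(record):
    return [{"id": record["id"], "name": name} for name in record["authors"]]


records = [
    {"id": "r1", "title": "First paper", "authors": ["A. Author", "B. Author"]},
    {"id": "r2", "title": "Second paper", "authors": ["C. Author"]},
]

tables = collect_tables(
    records, {"pub_info": fake_pub_info, "authors": fake_authors}
)
print({name: len(rows) for name, rows in tables.items()})
```

Each entry of `tables` is then a flat list of dictionaries, one list per extractor, which is a convenient shape for writing to CSV or loading into a DataFrame.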

Installation

Clone the repository and install using setup.py

$ git clone https://github.com/titipata/wos_parser
$ cd wos_parser
$ python setup.py install

or via pip

$ pip install git+https://github.com/titipata/wos_parser.git
