GGNoWayBack/cathodedataextractor

CathodeDataExtractor


CathodeDataExtractor is a lightweight document-level information extraction pipeline. It automatically extracts synthesis parameters and cycling and rate performance data for cathode materials from the literature on layered cathode materials for sodium-ion batteries.

Installation


pip install cathodedataextractor

Features


  • Built on the open-source libraries pymatgen, text2chem, and ChemDataExtractor v2, with some modifications.
  • A BatterySciBERT-uncased multi-label text classification model for filtering documents.
  • An automated, comprehensive data extraction pipeline for cathode materials.
  • Paragraph multi-class classification algorithms for documents (HTML/XML) from the RSC and Elsevier.
  • A normalised entity handling process.
  • An effective chemical abbreviation detection module.
  • A heuristic multi-level relation extraction algorithm for electrochemical properties.

The pipeline can also extract from plain text strings.

Quick start


Extract from documents

from glob import iglob
from cathodedataextractor.information_extraction_pipe import Pipeline

pipeline = Pipeline()
# '*ml' matches file names ending in "ml", such as .html and .xml
for document in iglob('*ml'):
    extraction_results = pipeline.extract(document)
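
The glob pattern '*ml' above is deliberately loose, so it also matches any other file whose name ends in "ml", such as .yaml or .toml. If the working directory is mixed, a stricter filter may be safer. The sketch below uses only the standard library; the helper name select_documents and the suffix set are illustrative assumptions, not part of cathodedataextractor.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath

# Assumption: only the HTML/XML formats named under Features are wanted.
SUPPORTED_SUFFIXES = {".html", ".xml"}


def select_documents(names):
    """Keep only file names whose extension is .html or .xml (case-insensitive)."""
    return [n for n in names if PurePath(n).suffix.lower() in SUPPORTED_SUFFIXES]


names = ["paper1.html", "paper2.xml", "config.yaml", "pyproject.toml", "notes.txt"]

# The loose pattern keeps every name that ends in "ml".
loose = [n for n in names if fnmatchcase(n, "*ml")]

# The suffix check keeps only the two supported formats.
strict = select_documents(names)

print(loose)
print(strict)
```

The filtered names can then be passed one by one to the extract call shown above.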

Extract from string

from cathodedataextractor.information_extraction_pipe import Pipeline

extraction_results = Pipeline.from_string(
    'Apart from the conventional cationic redox of transition metals, '
    'both Na-deficit and Na-excess materials have showcased the ability '
    'to exploit oxygen redox activity as O2–/O2n– for a charge '
    'compensation mechanism. To realize cathodes with enhanced energy '
    'density, a technique like the incorporation of alkali metal ions '
    'into transition metal layers has been adopted. Recent work by Boisse '
    '(13) et al. displayed the impact of honeycomb cation ordering of '
    'a highly stabilized intermediate phase for a Na2RuO3 cathode material '
    'in instigating the anionic redox activity and providing a capacity '
    'of 180 mAh g–1 at 0.2C with a capacity retention of 89% for over '
    '50 cycles. More devoted efforts to realize the utmost potential '
    'of anionic redox ought to be carried out in the future.')
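
To give a feel for the kind of values a rule-based extractor looks for in a passage like this one, here is a toy regular-expression sketch run on the performance sentence from the sample text. It is not the algorithm used by cathodedataextractor and does not reproduce its output format; the pattern names and the find_values helper are hypothetical.

```python
import re

sentence = (
    "providing a capacity of 180 mAh g–1 at 0.2C with a capacity "
    "retention of 89% for over 50 cycles."
)

# Hypothetical patterns, one per property; each captures a single number.
PATTERNS = {
    "capacity_mAh_g": r"(\d+(?:\.\d+)?)\s*mAh\s*g[–-]1",
    "rate_C": r"(\d+(?:\.\d+)?)\s*C\b",
    "retention_percent": r"(\d+(?:\.\d+)?)\s*%",
    "cycles": r"(\d+)\s*cycles",
}


def find_values(text):
    """Return the first numeric match for each pattern as a float."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            found[name] = float(match.group(1))
    return found


values = find_values(sentence)
print(values)
```

A sketch like this only finds isolated numbers. Linking each value to the right material (here Na2RuO3) and to the right test conditions is the harder relation extraction step that the pipeline is designed to handle.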

Issues?


You can either report an issue on GitHub or contact me directly at gouyx@mail2.sysu.edu.cn.

Citing


If the source code turns out to be helpful to your research, please cite the following work:

Gou, Y., Zhang, Y., Zhu, J. et al. A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries. Sci Data 11, 372 (2024).