Skip to content

petermr/petermr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 

Repository files navigation

petermr repositories

Many of these repos are widely used in collaborative projects and include:

  • code
  • data
  • projects

This special repo is to coordinate navigation and discussion

discussion lists

The "Discussions" for this repo https://github.com/petermr/petermr/discussions include discussions for the other repos and are of indicated by their name. They may replace our (private) Slack for all public-facing material (private project management will remain on Slack).

active repos

active Python projects:

For context: We have 4 packages (if that's the right word). They are largely standalone but can have useful library routines. They all share a common data structure on disk (simply named directories). This means that state is less important and often held on the filesystem. It also means that data can be further manipulated by Unix tools and other utilities. This is very fluid as we are constantly adding new data substructures. (I developed much of this in Java - https://github.com/petermr/ami3/blob/master/README.md) . The top directory is a CProject and its document children are called CTrees as they are useful split into many subdirectory trees.

Each package has a maintainer. These are all volunteers. Their Python is all self-taught . There are also interns - mixture of compsci/engineers/plant_sci who have a 3-month stay. They test the tools, develop resources, explore text-mining, NLP, image analysis, machine-learning, etc. They are encouraged to use the packages, link them into Python scripts or Notebooks but don't have time for serious development. (They might add readers or exporters).

  • pygetpapers , Ayush Garg. https://github.com/petermr/pygetpapers . Searches and downloads articles from repositories. Standalone, but the results may be used by docanalysis or possibly imageanalysis. Can be called from other tools.

  • docanalysis. Shweata Hegde. https://github.com/petermr/docanalysis . Ingests CProjects and carries out text-analysis of documents, including sectioning, NLP/text-mining, vocabulary generation. Uses NLTK and other Python tools for many operations, and spaCy, scispaCy for annotation of entities. Outputs summary data, correlations, word-dictionaries. Links entities to Wikidata.

  • pyamiimage, Anuv Chakroborty + PMR. https://github.com/petermr/pyamiimage . Ingests Figures/images, applies many image processing techniques (erode-dilate, colour quantization, skeletons, etc.), extracts words (Tesseract) , extracts lines and symbols (uses sknw/NetworkX) and recreates semantic diagrams (not finished)

  • py4ami . PMR. https://github.com/petermr/pyami . Translation of ami3(J) to Python. Processes CProjects to extract and combine primitives into semantic objects. Some functionality overlaps with docanalysis and imageanalysis. Includes libraries (e.g. for Wikimedia) and includes prototype GUI in tkinter, and a complex structure of word-dictionaries covering science and related disciplines. (Note the project is called pyami locally but there is already a PyAMI project so there it is called py4ami)

All packages aim to have a common commandline approach, use config files, generate and process CProjects (e.g. iterating over CTrees and applying filters, transformers, map/reduce, etc.). All 4 packages have been uploaded to PyPI

basicTest

Checks that the Python environment works (independently of the applications) https://github.com/petermr/basicTest/blob/main/README.md

presentations

Some presentations about the software, many from collaborators/interns

pygetpapers

notebook

docanalysis

wikidata

About

discussions on any of the petermr repos and related topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published