Merge branch 'release-0.2' into stable
bskinn committed Oct 27, 2019
2 parents ebf182e + b64a110 commit c25506e
Showing 47 changed files with 2,381 additions and 334 deletions.
15 changes: 9 additions & 6 deletions .travis.yml
@@ -1,16 +1,19 @@
dist: xenial
install:
- pip install -r requirements-travis.txt
# - pip install -e .
# - sh -c 'cd doc; make html; mkdir scratch'
language: python
python:
- 3.4
- 3.5
- 3.6
- 3.7-dev
- 3.7
- 3.8-dev
script:
- python --version
- pip list
- coverage run tests.py -a
- flake8 pent
# - echo $TRAVIS_PYTHON_VERSION | grep -e '^3\.6' && sh -c 'cd doc; make doctest' || echo 'No doctest.'
- echo $TRAVIS_PYTHON_VERSION | grep -e '^3\.6' && codecov || echo "No codecov."
- do_rest=$( echo $TRAVIS_PYTHON_VERSION | grep -e '^3\.6' | wc -l )
# - if [ $do_rest -gt 0 ]; then pip install black; black --check .; else echo "No black."; fi
# - if [ $do_rest -gt 0 ]; then sh -c 'cd doc; make doctest'; else echo "No doctest."; fi
- if [ $do_rest -gt 0 ]; then codecov; else echo "No codecov."; fi

62 changes: 37 additions & 25 deletions CHANGELOG.md
@@ -5,45 +5,57 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

### [Unreleased]

...
### v0.2 [2019-10-26]

#### Fixed

- Optional-line flag behavior now fixed and ~thoroughly tested.
Required a change to the accepted behavior, such that an optional pattern
matches (1) the line specified, (2) a blank line, or (3) no line.
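
The three accepted cases can be illustrated with a plain `re` pattern. This is only a sketch of the behavior described above, not the regex that `pent` generates internally; the `OPTIONAL_LINE` pattern and the sample text are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for an optional line whose content is "-----".
# The outer group is optional (case 3: no line at all); inside it, the
# content itself is optional (case 2: a blank line in its place).
OPTIONAL_LINE = r"(?:(?:-----)?\n)?"

# A header line, then the optional line, then a line holding an integer.
pattern = re.compile(r"header\n" + OPTIONAL_LINE + r"(\d+)\n")

with_line = "header\n-----\n42\n"  # (1) the line as specified
blank_line = "header\n\n42\n"      # (2) a blank line
no_line = "header\n42\n"           # (3) no line

# Capture the integer from each of the three variants.
results = [
    pattern.fullmatch(text).group(1)
    for text in (with_line, blank_line, no_line)
]

# A line with different content in that position is still rejected.
rejected = pattern.fullmatch("header\nother\n42\n") is None
```

All three variants yield the same captured value, while unrelated content in the optional position does not match.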

#### Changed

- The semantics of the 'decimal' and 'float' number tokens have been swapped.
'Decimal' tokens will only match non-scientific-notation decimal values, while
'float' values match either 'decimal' or 'scinot' formatted values.
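
As a rough illustration of the swapped semantics, the relationship between the three number formats can be sketched with ordinary regular expressions. The patterns below are assumptions chosen for clarity and may differ from the ones `pent` actually uses.

```python
import re

# Illustrative patterns only; pent's own regexes may differ.
DECIMAL = r"[+-]?(?:\d+\.\d*|\.\d+)"              # e.g. 3.14, -0.5, .25
SCINOT = r"[+-]?(?:\d+\.?\d*|\.\d+)[eE][+-]?\d+"  # e.g. 1.5e-3, 2E10
FLOAT = rf"(?:{SCINOT}|{DECIMAL})"                # either of the above


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


samples = ["3.14", "-0.5", "1.5e-3", "2E10"]

# 'decimal' accepts only the non-scientific-notation values ...
decimal_hits = [s for s in samples if matches(DECIMAL, s)]
# ... while 'float' accepts both decimal and scinot formats.
float_hits = [s for s in samples if matches(FLOAT, s)]
```

Under these assumed patterns, `decimal_hits` holds only the first two samples and `float_hits` holds all four.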


### v0.2.0rc1 [2018-10-28]

#### Added

* "Misc" token type, '&', matching arbitrary non-whitespace content
* Optional whitespace can now be specified after number, literal, and misc
tokens, in addition to 'required whitespace after' and
'no whitespace after'
* New helper function `column_stack_2d`
* Needs performance improvements for large arrays
* New 'optional line' token type
* Works irregularly, perhaps due to quirks in managing optional
groups/capture groups within the Python regex engine?
* New property flags on `Token` to indicate the new features added
('misc' token type, optional-whitespace-after, etc.)
- "Misc" token type, '&', matching arbitrary non-whitespace content
- Optional whitespace can now be specified after number, literal, and misc
tokens, in addition to 'required whitespace after' and
'no whitespace after'
- New helper function `column_stack_2d`
- Needs performance improvements for large arrays
- New 'optional line' token type
- Works irregularly, perhaps due to quirks in managing optional
groups/capture groups within the Python regex engine?
- New property flags on `Token` to indicate the new features added
('misc' token type, optional-whitespace-after, etc.)

#### Changed

* Switched certain lists within the `Parser.capture_struct` return
dict structure to a type that automatically passes through a dict key to the
single element of those lists, if they are length-one. This
simplifies the syntax of a number of use cases by eliminating explicit `[0]`
indexing.
* `Parser` instances now syntax-check their `head`/`body`/`tail` patterns
- Switched certain lists within the `Parser.capture_struct` return
dict structure to a type that automatically passes through a dict key to the
single element of those lists, if they are length-one. This
simplifies the syntax of a number of use cases by eliminating explicit `[0]`
indexing.
- `Parser` instances now syntax-check their `head`/`body`/`tail` patterns
at instantiation, instead of at the first capture attempt.
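
The pass-through list described in the first item above can be sketched as a small `list` subclass. `PassThroughList` is a hypothetical name and a simplified illustration of the idea, not the class shipped in `pent`; the sample data is invented.

```python
class PassThroughList(list):
    """List that forwards non-integer keys to its sole element.

    Hypothetical illustration of the pass-through idea; not pent's
    actual implementation.
    """

    def __getitem__(self, key):
        # Integer and slice indexing behave exactly as for a normal list.
        if isinstance(key, (int, slice)):
            return super().__getitem__(key)
        # Any other key is forwarded, but only when there is exactly
        # one element to forward it to.
        if len(self) == 1:
            return super().__getitem__(0)[key]
        raise IndexError("key pass-through requires a length-one list")


# Invented sample resembling a nested capture result.
captured = PassThroughList([{"head": ["1", "2"], "body": [["3.0"]]}])

explicit = captured[0]["body"]  # explicit [0] indexing still works
implicit = captured["body"]     # the key passes through to element 0
```

Restricting the pass-through to length-one lists keeps the shortcut unambiguous: with several elements there is no single obvious target for the key.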


### v0.1.0 [2018-09-23]

#### Features

* Three token types implemented to date: numeric, string-literal, "any"
* Parsing of multiple levels of recursive nested data; tested only
to two levels of nesting to date.
* Each nested level of structure can have head/body/tail
* Captured tokens can be easily retrieved from head/tail at the top level
parser; no good head or tail capture yet from within nested structures
- Three token types implemented to date: numeric, string-literal, "any"
- Parsing of multiple levels of recursive nested data; tested only
to two levels of nesting to date.
- Each nested level of structure can have head/body/tail
- Captured tokens can be easily retrieved from head/tail at the top level
parser; no good head or tail capture yet from within nested structures

2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2018 Brian Skinn
Copyright (c) 2018-2019 Brian Skinn

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
45 changes: 24 additions & 21 deletions README.rst
@@ -1,33 +1,34 @@
pent Extracts Numerical Text
============================

*Mini-language driven parser for structured numerical data*
*Mini-language driven parser for structured numerical (or other) data
in free text*

**Current Development Version:**

.. image:: https://travis-ci.org/bskinn/pent.svg?branch=dev
.. image:: https://img.shields.io/travis/bskinn/pent?label=travis-ci&logo=travis
:target: https://travis-ci.org/bskinn/pent

.. image:: https://codecov.io/gh/bskinn/pent/branch/dev/graph/badge.svg
.. image:: https://codecov.io/gh/bskinn/pent/branch/master/graph/badge.svg
:target: https://codecov.io/gh/bskinn/pent

**Most Recent Stable Release:**

.. image:: https://img.shields.io/pypi/v/pent.svg
.. image:: https://img.shields.io/pypi/v/pent.svg?logo=pypi
:target: https://pypi.org/project/pent

.. image:: https://img.shields.io/pypi/pyversions/pent.svg
.. image:: https://img.shields.io/pypi/pyversions/pent.svg?logo=python

**Info:**

.. image:: https://img.shields.io/readthedocs/pent/latest.svg
.. image:: https://img.shields.io/readthedocs/pent/latest
:target: http://pent.readthedocs.io/en/latest/

.. image:: https://img.shields.io/github/license/mashape/apistatus.svg
:target: https://github.com/bskinn/pent/blob/master/LICENSE.txt
:target: https://github.com/bskinn/pent/blob/stable/LICENSE.txt

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/ambv/black
:target: https://github.com/psf/black

----

@@ -73,8 +74,8 @@ but that's just exhausting drudgery if there are dozens of files involved.

Automating the parsing via a line-by-line string search would work fine
(this is how |cclib|_ implements its data imports), but a new line-by-line
method must be implemented any time one encounters a new kind of dataset,
and any time the formatting of a given dataset changes between software versions.
method is needed for every new kind of dataset,
and any time the formatting of a given dataset changes.

It's not *too* hard to
`write regex <https://github.com/bskinn/opan/blob/12c8e98de2a81bbd570c821644063d975e2ab03e/opan/hess.py#L688-L701>`__
@@ -93,8 +94,7 @@ of lines, without writing **any** regex at all:

.. code:: python
>>> with (pathlib.Path() / "pent" / "test" / "C2F4_01.hess").open() as f:
... data = f.read()
>>> data = pathlib.Path("pent", "test", "C2F4_01.hess").read_text()
>>> prs = pent.Parser(
... head=("@.$vibrational_frequencies", "#.+i"),
... body=("#.+i #!..f")
@@ -127,7 +127,7 @@ column vector, because the data runs down the column in the file.
``pent`` can handle larger, more deeply nested data as well.
Take `this 18x18 matrix <https://github.com/bskinn/pent/blob/cbb3c9b24c773b51b4988485b838537043ec8299/pent/test/C2F4_01.hess#L13-L71>`__
within ``C2F4_01.hess``, for example.
Here, it's necessary to pass a ``Parser`` as the `body` of another ``Parser``:
Here, it's necessary to pass a ``Parser`` as the *body* of another ``Parser``:

.. code:: python
@@ -139,16 +139,18 @@ Here, it's necessary to pass a ``Parser`` as the `body` of another ``Parser``:
... )
... )
>>> result = prs_hess.capture_body(data)
>>> arr = np.column_stack(np.array(_, dtype=float) for _ in result[0])
>>> arr = np.column_stack([np.array(_, dtype=float) for _ in result[0]])
>>> print(arr[:3, :7])
[[ 0.468819 -0.006771 0.020586 -0.38269 0.017874 -0.05449 -0.044552]
[-0.006719 0.022602 -0.016183 0.010997 -0.033397 0.014422 -0.01501 ]
[ 0.020559 -0.016184 0.066859 -0.033601 0.014417 -0.072836 0.045825]]
The need for the ``for``/``in`` iteration expression, the ``[0]`` index into ``result``,
The need for the list comprehension, the ``[0]`` index into ``result``,
and the composition via ``np.column_stack`` arises
due to the manner in which ``pent`` returns data from a nested match like this.
See the `documentation <https://pent.readthedocs.io/en/latest>`__ for more information.
See the `documentation <https://pent.readthedocs.io/en/latest>`__,
in particular `this example <https://pent.readthedocs.io/en/latest/tutorial/examples/nested_parsers.html>`__,
for more information.

The grammar of the ``pent`` mini-language is designed to be flexible enough that
it should handle essentially all well-formed structured data, and even some data
@@ -159,21 +161,22 @@ parsing `this data block <https://github.com/bskinn/pent/blob/eaa79a09af88d3836d

-----

Alpha release(s) available on `PyPI <https://pypi.org/project/pent>`__: ``pip install pent``
Beta releases available on `PyPI <https://pypi.org/project/pent>`__: ``pip install pent``

Full documentation (pending) is hosted at
Full documentation is hosted at
`Read The Docs <http://pent.readthedocs.io/en/latest/>`__.

Source on `GitHub <https://github.com/bskinn/pent>`__. Bug reports,
feature requests, and ``Parser`` pattern composition help requests
feature requests, and ``Parser`` construction help requests
are welcomed at the
`Issues <https://github.com/bskinn/pent/issues>`__ page there.

Copyright (c) Brian Skinn 2018
Copyright (c) Brian Skinn 2018-2019

License: The MIT License. See `LICENSE.txt <https://github.com/bskinn/pent/blob/master/LICENSE.txt>`__
for full license terms.


.. |cclib| replace:: ``cclib``

.. _cclib: https://github.com/cclib/cclib
.. _cclib: https://github.com/cclib/cclib
6 changes: 5 additions & 1 deletion doc/Makefile
@@ -14,7 +14,11 @@ help:

.PHONY: help Makefile

# sphinx-autobuild target
livehtml:
sphinx-autobuild "$(SOURCEDIR)" "$(BUILDDIR)/html" $(SPHINXOPTS) $(O)

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
6 changes: 5 additions & 1 deletion doc/make.bat
@@ -26,7 +26,11 @@ if errorlevel 9009 (
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %2
if "%1" == "livehtml" (
sphinx-autobuild %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %2
) else (
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %2
)
goto end

:help
Empty file added doc/source/_static/.pin
Empty file.
33 changes: 33 additions & 0 deletions doc/source/api.rst
@@ -0,0 +1,33 @@
.. Dump of an API page, until it gets cleaned up
API (draft page)
================

Unstructured API dump, to provide cross-reference targets
for other portions of the docs.

Any of the objects/attributes/methods documented here may
become private implementation details in future
versions of ``pent``.


.. automodule:: pent.parser
:members:

.. automodule:: pent.token
:members:

.. automodule:: pent.patterns
:members:

.. automodule:: pent.enums
:members:

.. automodule:: pent.errors
:members:

.. automodule:: pent.thrulist
:members:

.. automodule:: pent.utils
:members:
