Merge branch 'release-0.2' into stable

bskinn · Oct 27, 2019 · c25506e
2 parents ebf182e + b64a110
commit c25506e
Showing 47 changed files with 2,381 additions and 334 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -1,16 +1,19 @@
+dist: xenial
 install:
  - pip install -r requirements-travis.txt
-# - pip install -e .
-# - sh -c 'cd doc; make html; mkdir scratch'
 language: python
 python:
- - 3.4
  - 3.5
  - 3.6
- - 3.7-dev
+ - 3.7
+ - 3.8-dev
 script:
+ - python --version
+ - pip list
  - coverage run tests.py -a
  - flake8 pent
-# - echo $TRAVIS_PYTHON_VERSION | grep -e '^3\.6' && sh -c 'cd doc; make doctest' || echo 'No doctest.'
- - echo $TRAVIS_PYTHON_VERSION | grep -e '^3\.6' && codecov || echo "No codecov."
+ - do_rest=$( echo $TRAVIS_PYTHON_VERSION | grep -e '^3\.6' | wc -l )
+# - if [ $do_rest -gt 0 ]; then pip install black; black --check .; else echo "No black."; fi
+# - if [ $do_rest -gt 0 ]; then sh -c 'cd doc; make doctest'; else echo "No doctest."; fi
+ - if [ $do_rest -gt 0 ]; then codecov; else echo "No codecov."; fi
 
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,45 +5,57 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
-### [Unreleased]
 
-...
+### v0.2 [2019-10-26]
+
+#### Fixed
+
+- Optional-line flag behavior now fixed and ~thoroughly tested.
+  Required a change to the accepted behavior, such that an optional pattern
+  matches (1) the line specified, (2) a blank line, or (3) no line.
+
+#### Changed
+
+- The semantics of the 'decimal' and 'float' number tokens have been swapped.
+  'Decimal' tokens will only match non-scientific-notation decimal values, while
+  'float' values match either 'decimal' or 'scinot' formatted values.
 
 
 ### v0.2.0rc1 [2018-10-28]
 
 #### Added
 
- * "Misc" token type, '&', matching arbitrary non-whitespace content
- * Optional whitespace can now be specified after number, literal, and misc
-   tokens, in addition to 'required whitespace after' and
-   'no whitespace after'
- * New helper function `column_stack_2d`
-   * Needs performance improvements for large arrays
- * New 'optional line' token type
-   * Works irregularly, perhaps due to quirks in managing optional
-     groups/capture groups within the Python regex engine?
- * New property flags on `Token` to indicate the new features added
-   ('misc' token type, optional-whitespace-after, etc.)
+- "Misc" token type, '&', matching arbitrary non-whitespace content
+- Optional whitespace can now be specified after number, literal, and misc
+  tokens, in addition to 'required whitespace after' and
+  'no whitespace after'
+- New helper function `column_stack_2d`
+  - Needs performance improvements for large arrays
+- New 'optional line' token type
+  - Works irregularly, perhaps due to quirks in managing optional
+    groups/capture groups within the Python regex engine?
+- New property flags on `Token` to indicate the new features added
+  ('misc' token type, optional-whitespace-after, etc.)
 
 #### Changed
 
- * Switched certain lists within the `Parser.capture_struct` return
-   dict structure to a type that automatically passes through a dict key to the
-   single element of those lists, if they are length-one. This
-   simplifies the syntax of a number of use cases by eliminating explicit `[0]`
-   indexing.
- * `Parser` instances now syntax-check their `head`/`body`/`tail` patterns
+- Switched certain lists within the `Parser.capture_struct` return
+  dict structure to a type that automatically passes through a dict key to the
+  single element of those lists, if they are length-one. This
+  simplifies the syntax of a number of use cases by eliminating explicit `[0]`
+  indexing.
+- `Parser` instances now syntax-check their `head`/`body`/`tail` patterns
    at instantiation, instead of at the first capture attempt.
 
 
 ### v0.1.0 [2018-09-23]
 
 #### Features
 
- * Three token types implemented to date: numeric, string-literal, "any"
- * Parsing of multiple levels of recursive nested data; tested only
-   to two leves of nesting to date.
- * Each nested level of structure can have head/body/tail
- * Captured tokens can be easily retrieved from head/tail at the top level
-   parser; no good head or tail capture yet from within nested structures
+- Three token types implemented to date: numeric, string-literal, "any"
+- Parsing of multiple levels of recursive nested data; tested only
+  to two leves of nesting to date.
+- Each nested level of structure can have head/body/tail
+- Captured tokens can be easily retrieved from head/tail at the top level
+  parser; no good head or tail capture yet from within nested structures
+
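
Note on the v0.2 "Changed" entry above: the swapped 'decimal'/'float' semantics can be illustrated with a minimal sketch. The regular expressions below are hypothetical stand-ins written for this note; they are not the patterns pent uses internally, and the helper name `matches` is likewise an assumption. The sketch only shows the relationship the changelog describes: 'decimal' rejects scientific notation, while 'float' accepts either form.

```python
import re

# Hypothetical patterns for illustration only; NOT pent's internal regexes.
# 'decimal': a plain decimal value with no exponent part.
DECIMAL = r"[+-]?(?:\d+\.\d*|\.\d+)"
# 'scinot': a mantissa followed by a mandatory exponent.
SCINOT = r"[+-]?(?:\d+\.?\d*|\.\d+)[eE][+-]?\d+"
# 'float' (v0.2 semantics per the changelog): either of the two forms above.
FLOAT = rf"(?:{SCINOT}|{DECIMAL})"


def matches(pattern, text):
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


decimal_plain = matches(DECIMAL, "0.468819")   # plain decimal value
decimal_scinot = matches(DECIMAL, "1.5e-3")    # scientific notation
float_plain = matches(FLOAT, "-0.0067")
float_scinot = matches(FLOAT, "1.5e-3")
```

Under these assumed patterns, a value such as `1.5e-3` is rejected by the 'decimal' pattern but accepted by the 'float' pattern, which is the distinction the changelog entry draws.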
diff --git a/LICENSE.txt b/LICENSE.txt
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2018 Brian Skinn
+Copyright (c) 2018-2019 Brian Skinn
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

diff --git a/README.rst b/README.rst
@@ -1,33 +1,34 @@
 pent Extracts Numerical Text
 ============================
 
-*Mini-language driven parser for structured numerical data*
+*Mini-language driven parser for structured numerical (or other) data
+in free text*
 
 **Current Development Version:**
 
-.. image:: https://travis-ci.org/bskinn/pent.svg?branch=dev
+.. image::  https://img.shields.io/travis/bskinn/pent?label=travis-ci&logo=travis
     :target: https://travis-ci.org/bskinn/pent
 
-.. image:: https://codecov.io/gh/bskinn/pent/branch/dev/graph/badge.svg
+.. image:: https://codecov.io/gh/bskinn/pent/branch/master/graph/badge.svg
     :target: https://codecov.io/gh/bskinn/pent
 
 **Most Recent Stable Release:**
 
-.. image:: https://img.shields.io/pypi/v/pent.svg
+.. image:: https://img.shields.io/pypi/v/pent.svg?logo=pypi
     :target: https://pypi.org/project/pent
 
-.. image:: https://img.shields.io/pypi/pyversions/pent.svg
+.. image:: https://img.shields.io/pypi/pyversions/pent.svg?logo=python
 
 **Info:**
 
-.. image:: https://img.shields.io/readthedocs/pent/latest.svg
+.. image:: https://img.shields.io/readthedocs/pent/latest
     :target: http://pent.readthedocs.io/en/latest/
 
 .. image:: https://img.shields.io/github/license/mashape/apistatus.svg
-    :target: https://github.com/bskinn/pent/blob/master/LICENSE.txt
+    :target: https://github.com/bskinn/pent/blob/stable/LICENSE.txt
 
 .. image:: https://img.shields.io/badge/code%20style-black-000000.svg
-    :target: https://github.com/ambv/black
+    :target: https://github.com/psf/black
 
 ----
 
@@ -73,8 +74,8 @@ but that's just exhausting drudgery if there are dozens of files involved.
 
 Automating the parsing via a line-by-line string search would work fine
 (this is how |cclib|_ implements its data imports), but a new line-by-line
-method must be implemented any time one encounters a new kind of dataset,
-and any time the formatting of a given dataset changes between software versions.
+method is needed for every new kind of dataset,
+and any time the formatting of a given dataset changes.
 
 It's not *too* hard to
 `write regex <https://github.com/bskinn/opan/blob/12c8e98de2a81bbd570c821644063d975e2ab03e/opan/hess.py#L688-L701>`__
@@ -93,8 +94,7 @@ of lines, without writing **any** regex at all:
 
 .. code:: python
 
-    >>> with (pathlib.Path() / "pent" / "test" / "C2F4_01.hess").open() as f:
-    ...     data = f.read()
+    >>> data = pathlib.Path("pent", "test", "C2F4_01.hess").read_text()
     >>> prs = pent.Parser(
     ...     head=("@.$vibrational_frequencies", "#.+i"),
     ...     body=("#.+i #!..f")
@@ -127,7 +127,7 @@ column vector, because the data runs down the column in the file.
 ``pent`` can handle larger, more deeply nested data as well.
 Take `this 18x18 matrix <https://github.com/bskinn/pent/blob/cbb3c9b24c773b51b4988485b838537043ec8299/pent/test/C2F4_01.hess#L13-L71>`__
 within ``C2F4_01.hess``, for example.
-Here, it's necessary to pass a ``Parser`` as the `body` of another ``Parser``:
+Here, it's necessary to pass a ``Parser`` as the *body* of another ``Parser``:
 
 .. code:: python
 
@@ -139,16 +139,18 @@ Here, it's necessary to pass a ``Parser`` as the `body` of another ``Parser``:
     ...     )
     ... )
     >>> result = prs_hess.capture_body(data)
-    >>> arr = np.column_stack(np.array(_, dtype=float) for _ in result[0])
+    >>> arr = np.column_stack([np.array(_, dtype=float) for _ in result[0]])
     >>> print(arr[:3, :7])
     [[ 0.468819 -0.006771  0.020586 -0.38269   0.017874 -0.05449  -0.044552]
      [-0.006719  0.022602 -0.016183  0.010997 -0.033397  0.014422 -0.01501 ]
      [ 0.020559 -0.016184  0.066859 -0.033601  0.014417 -0.072836  0.045825]]
 
-The need for the ``for``/``in`` iteration expression, the ``[0]`` index into ``result``,
+The need for the generator expression, the ``[0]`` index into ``result``,
 and the composition via ``np.column_stack`` arises
 due to the manner in which ``pent`` returns data from a nested match like this.
-See the `documentation <https://pent.readthedocs.io/en/latest>`__ for more information.
+See the `documentation <https://pent.readthedocs.io/en/latest>`__,
+in particular `this example <https://pent.readthedocs.io/en/latest/tutorial/examples/nested_parsers.html>`__,
+for more information.
 
 The grammar of the ``pent`` mini-language is designed to be flexible enough that
 it should handle essentially all well-formed structured data, and even some data
@@ -159,21 +161,22 @@ parsing `this data block <https://github.com/bskinn/pent/blob/eaa79a09af88d3836d
 
 -----
 
-Alpha release(s) available on `PyPI <https://pypi.org/project/pent>`__: ``pip install pent``
+Beta releases available on `PyPI <https://pypi.org/project/pent>`__: ``pip install pent``
 
-Full documentation (pending) is hosted at
+Full documentation is hosted at
 `Read The Docs <http://pent.readthedocs.io/en/latest/>`__.
 
 Source on `GitHub <https://github.com/bskinn/pent>`__.  Bug reports,
-feature requests, and ``Parser`` pattern composition help requests
+feature requests, and ``Parser`` construction help requests
 are welcomed at the
 `Issues <https://github.com/bskinn/pent/issues>`__ page there.
 
-Copyright (c) Brian Skinn 2018
+Copyright (c) Brian Skinn 2018-2019
 
 License: The MIT License. See `LICENSE.txt <https://github.com/bskinn/pent/blob/master/LICENSE.txt>`__
 for full license terms.
 
+
 .. |cclib| replace:: ``cclib``
 
-.. _cclib: https://github.com/cclib/cclib
+.. _cclib: https://github.com/cclib/cclib
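
Note on the README hunk above that replaces the `with ... open()` block with `Path.read_text()`: the two forms should yield the same string for a text file. The sketch below checks this using only the standard library. The temporary file and its contents are hypothetical stand-ins for `C2F4_01.hess`, which is not reproduced here.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for pent/test/C2F4_01.hess
    path = pathlib.Path(tmp, "sample.hess")
    path.write_text("$vibrational_frequencies\n2\n", encoding="utf-8")

    # Old README form: open the file explicitly and read it
    with path.open() as f:
        data_old = f.read()

    # New README form: read the whole file in one call
    data_new = path.read_text()

# Both approaches return the full file contents as a str
same = data_old == data_new
```

`Path.read_text()` opens the file in text mode, reads it and closes it, so the shorter form drops the explicit context manager without changing what `data` holds.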
diff --git a/doc/Makefile b/doc/Makefile
@@ -14,7 +14,11 @@ help:
 
 .PHONY: help Makefile
 
+# sphinx-autobuild target
+livehtml:
+	sphinx-autobuild  "$(SOURCEDIR)" "$(BUILDDIR)/html" $(SPHINXOPTS) $(O)
+
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/doc/make.bat b/doc/make.bat
@@ -26,7 +26,11 @@ if errorlevel 9009 (
 	exit /b 1
 )
 
-%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %2
+if "%1" == "livehtml" (
+	sphinx-autobuild %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %2
+) else (
+	%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %2
+)
 goto end
 
 :help

diff --git a/doc/source/_static/.pin b/doc/source/_static/.pin
diff --git a/doc/source/api.rst b/doc/source/api.rst
@@ -0,0 +1,33 @@
+.. Dump of an API page, until it gets cleaned up
+
+API (draft page)
+================
+
+Unstructured API dump, to provide cross-reference targets
+for other portions of the docs.
+
+Any of the objects/attributes/methods documented here may
+become private implementation details in future
+versions of ``pent``.
+
+
+.. automodule:: pent.parser
+    :members:
+
+.. automodule:: pent.token
+    :members:
+
+.. automodule:: pent.patterns
+    :members:
+
+.. automodule:: pent.enums
+    :members:
+
+.. automodule:: pent.errors
+    :members:
+
+.. automodule:: pent.thrulist
+    :members:
+
+.. automodule:: pent.utils
+    :members: