Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for yaml databags #912

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

e8johan
Copy link

@e8johan e8johan commented Jun 5, 2021

Description of Changes

  • Wrote at least one-line docstrings (for any new functions)
  • Added unit test(s) covering the changes (if testable)
  • Included a screenshot or animation (if affecting the UI, see Licecap)
  • Link to corresponding documentation pull request for getlektor.com

Added support for yaml databags, and added a test case for databags in general (testing ini, json, and yaml).

Documentation update: lektor/lektor-website#321

The databag test case checks so that a single text value can be read
from databags of the supported formats (ini, json, and yaml).
lektor/databags.py Outdated Show resolved Hide resolved
@@ -2,6 +2,7 @@
import json
import os
from collections import OrderedDict
import yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add PyYAML to install_requires.

(Also, should install/run pre-commit, which, via reorder-python-imports will, err, reorder these imports.)

def load_databag(filename):
try:
if filename.endswith(".json"):
with open(filename, "r") as f:
return json.load(f, object_pairs_hook=OrderedDict)
elif filename.endswith(".ini"):
return decode_flat_data(IniFile(filename).items(), dict_cls=OrderedDict)
elif filename.endswith(".yaml"):
with open(filename, "r") as f:
return ordered_load(f, yaml.SafeLoader)
else:
return None
except (OSError, IOError) as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this isn't something touched by your PR, but might as well clean it up here...

This could be more cleanly written:

except FileNotFoundError:
    return None

# By: https://stackoverflow.com/users/650222/coldfix
# License: CC-BY-SA 4.0 (https://stackoverflow.com/help/licensing)
# URL: https://stackoverflow.com/questions/5121931/in-python-how-can-you-load-yaml-mappings-as-ordereddicts/
def ordered_load(stream, Loader=yaml.SafeLoader, object_pairs_hook=OrderedDict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest doing away with ordered_load and constructing the OrderedLoader class just once. E.g, at top level:

class _OrderedLoader(yaml.SafeLoader):
    def construct_ordered_dict(self, node):
        self.flatten_mapping(node)
        return OrderedDict(self.construct_pairs(node))

_OrderedLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    _OrderedLoader.construct_ordered_dict,
)

Then, in load_databag (below), call yaml.load(stream, _OrderedLoader) directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this ordered_load hack is necessary at all (the same is written in the linked stackoverflow question). All dicts are ordered since CPython 3.6 (implementation-wise) or Python 3.7 (language spec).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All dicts are ordered since CPython 3.6 (implementation-wise) or Python 3.7 (language spec).

True, but there are still subtle differences, e.g. equality comparison for OrderedDict is order sensitive, while for dict it is not, reversed() iteration does not work for dict before python3.8 (but works for OrderedDict). In general, using OrderedDict seems more explicit when one really does care about order.

Currently, our load_databag is careful to parse to OrderedDicts for JSON and INI databags.

I guess I'd be happy either way, so long as we have tests that verify that order is preserved.

def load_databag(filename):
try:
if filename.endswith(".json"):
with open(filename, "r") as f:
return json.load(f, object_pairs_hook=OrderedDict)
elif filename.endswith(".ini"):
return decode_flat_data(IniFile(filename).items(), dict_cls=OrderedDict)
elif filename.endswith(".yaml"):
with open(filename, "r") as f:
return ordered_load(f, yaml.SafeLoader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per suggestion in previous comment, this becomes

return yaml.load(stream, _OrderedLoader)

from lektor.environment import Environment
from lektor.project import Project


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the tests, but I would suggest these tests be converted to unit tests.

Test the data loading functionality with tests that call lektor.databags.load_databag directly.

If feeling ambitious, test the methods of the Databags class by constructing a Databags instance and calling the methods directly.

This makes it easier to check for specific issues (e.g., is the parsing to OrderedDicts working correctly?) It allows for more tests (to check various aspects of loading) while keeping total test run time low.

If you want to include an integration test to ensure that the bag() works in a template, that's okay. My personal feeling is that is the lowest priority of the tests — if that's broken, someone is likely to notice, even without the test in place.

from lektor.project import Project


def get_databag_builder(tmpdir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a pytest style note, this could be structured as a fixture.

@pytest.fixture
def databag_pad():
    project = Project.from_path(Path(__file__).parent / "databag-project")
    return project.make_env(load_plugins=False).new_pad()

@pytest.fixture
def databag_builder(databag_pad, tmp_path):
    return Builder(databag_pad, tmp_path / "output")

def test_databag_project_folder(databag_pad, databag_builder):
    ...

Or, perhaps, just override the project fixture (defined in conftest.py) which would allow for the direct use of the pad and builder fixtures from conftest.py:

@pytest.fixture
def project():
    return Project.from_path(Path(__file__).parent / "databag-project")

def test_databag_project_folder(pad, builder):
    ...

@nixjdm
Copy link
Member

nixjdm commented Mar 3, 2022

If it's too much work to see this PR through the finish line, I think you can disregard this for now:

Having dealt with some very large amounts of yaml before, I've come to really prefer people not use the pyyaml lib, in favor of ruamel.yaml. The primary reason being that pyyaml is not yaml 1.2 compliant, which was published all the way back in 2009, and I've tripped on that and seen others trip on it many times. They aren't wildly different, but it's enough for me to prefer ruamel. Databags generally won't be very complicated, but if it's all the same, I'd rather help people avoid the pitfalls of yaml 1.1.

Secondarily, ruamel supports round-tripping yaml in a way that preserves comments, though I'm not sure that would matter here. At least it doesn't right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants