Merge pull request #55 from octue/release/0.1.7
Release/0.1.7
thclark committed Jan 5, 2021
2 parents 0dd42ec + c2f6ff7 commit 5bcf2b7
Showing 33 changed files with 1,483 additions and 347 deletions.
9 changes: 0 additions & 9 deletions .github/workflows/check-version-consistency.yml

This file was deleted.

8 changes: 8 additions & 0 deletions .github/workflows/python-ci.yml
@@ -9,6 +9,14 @@ name: python-ci
on: [push]

jobs:

  check-version-consistency:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
    - run: python .github/workflows/scripts/check-version-consistency.py

  tests:
    runs-on: ubuntu-latest
    env:
44 changes: 44 additions & 0 deletions docs/source/analysis_objects.rst
@@ -0,0 +1,44 @@
.. _analysis_objects:

================
Analysis objects
================

An ``Analysis`` object is the sole argument to the ``app`` function in your ``app.py`` module. It has an attribute for
every strand that can be added to a ``Twine``, but only the strands specified in your ``twine.py`` file are populated;
the rest are ``None``. The attributes are:

- ``input_values``
- ``input_manifest``
- ``configuration_values``
- ``configuration_manifest``
- ``output_values``
- ``output_manifest``
- ``credentials``
- ``children``
- ``monitors``

Additionally, all input and configuration attributes are hashed using a
`BLAKE3 hash <https://github.com/BLAKE3-team/BLAKE3>`_ so the inputs and configuration that produced a given output in
your app can always be verified. These hashes exist on the following attributes:

- ``input_values_hash``
- ``input_manifest_hash``
- ``configuration_values_hash``
- ``configuration_manifest_hash``

If an input or configuration attribute is ``None``, its hash attribute is also ``None``. For ``Manifests``, some
metadata about the ``Datafiles`` and ``Datasets`` within them, and about the ``Manifest`` itself, is included when
calculating the hash:

- For a ``Datafile``, the content of its on-disk file is hashed, along with the following metadata:

    - ``name``
    - ``cluster``
    - ``sequence``
    - ``posix_timestamp``
    - ``tags``

- For a ``Dataset``, the hashes of its ``Datafiles`` are included, along with its ``tags``.

- For a ``Manifest``, the hashes of its ``Datasets`` are included, along with its ``keys``.
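
The layered hashing described above can be sketched roughly as follows. This is an illustration only, not the SDK's implementation: ``hashlib.blake2b`` from the standard library stands in for BLAKE3, the helper names ``hash_datafile`` and ``hash_collection`` are hypothetical, and serialising metadata as sorted JSON and sorting member hashes are assumptions made so that the result is deterministic.

```python
import hashlib
import json


def hash_datafile(content: bytes, metadata: dict) -> str:
    """Hash a file's content together with its metadata (illustrative sketch)."""
    hasher = hashlib.blake2b()  # stand-in only: the SDK uses BLAKE3, not BLAKE2b
    hasher.update(content)
    # Assumption: metadata is serialised with sorted keys so key order cannot change the hash.
    hasher.update(json.dumps(metadata, sort_keys=True).encode())
    return hasher.hexdigest()


def hash_collection(member_hashes, extra_metadata: dict) -> str:
    """Combine the hashes of a collection's members with the collection's own metadata."""
    hasher = hashlib.blake2b()
    # Assumption: member hashes are sorted so the order of members is irrelevant.
    for member_hash in sorted(member_hashes):
        hasher.update(member_hash.encode())
    hasher.update(json.dumps(extra_metadata, sort_keys=True).encode())
    return hasher.hexdigest()


metadata = {"name": "my_file.csv", "cluster": 0, "sequence": None, "posix_timestamp": None, "tags": "a:2 all"}

# Datafile hash -> Dataset hash -> Manifest hash, mirroring the three levels listed above.
file_hash = hash_datafile(b"col_a,col_b\n1,2\n", metadata)
dataset_hash = hash_collection([file_hash], {"tags": "all"})
manifest_hash = hash_collection([dataset_hash], {"keys": {"my_dataset": 0}})
```

Because each level folds in the hashes of the level below, a change to any file's content or metadata changes the hash of the ``Dataset`` and ``Manifest`` that contain it.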
14 changes: 14 additions & 0 deletions docs/source/datafile.rst
@@ -0,0 +1,14 @@
.. _datafile:

========
Datafile
========

A ``Datafile`` is an Octue type that corresponds to a file, which may exist on your computer or in a cloud store. It has
the following main attributes:

- ``path`` - the path of this file within the dataset, which may include folders or subfolders
- ``cluster`` - the integer cluster of files, within a dataset, to which this file belongs (default 0)
- ``sequence`` - the sequence number of this file within its cluster (if sequences are appropriate)
- ``tags`` - a space-separated string or iterable of tags relevant to this file
- ``posix_timestamp`` - a POSIX timestamp associated with the file, in seconds since the epoch; typically the time it was created, but it could be any time point relevant to the data
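
To make the shape of these attributes concrete, here is a plain dataclass that mirrors them. ``DatafileSketch`` is a hypothetical stand-in, not the SDK's ``Datafile`` class; in particular, deriving ``name`` from the final component of ``path`` and the ``None`` defaults are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatafileSketch:
    """Hypothetical stand-in mirroring the attributes listed above (not the SDK class)."""

    path: str
    cluster: int = 0  # default cluster, as documented above
    sequence: Optional[int] = None  # assumption: None when sequences are not appropriate
    tags: str = ""
    posix_timestamp: Optional[float] = None  # seconds since the epoch

    @property
    def name(self) -> str:
        # Assumption: the name is the final component of the path.
        return os.path.basename(self.path)


datafile = DatafileSketch(
    path="path-within-dataset/my_file.csv",
    tags="one a:2 b:3 all",
    posix_timestamp=1609804800.0,  # an arbitrary example value
)
```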
44 changes: 44 additions & 0 deletions docs/source/dataset.rst
@@ -0,0 +1,44 @@
.. _dataset:

=======
Dataset
=======

A ``Dataset`` contains any number of ``Datafiles`` along with the following metadata:

- ``name``
- ``tags``

The files are stored in a ``FilterSet``, meaning they can be easily filtered according to any attribute of the
`Datafile <datafile.rst>`_ instances it contains.


--------------------------------
Filtering files in a ``Dataset``
--------------------------------

You can filter a ``Dataset``'s files as follows:

.. code-block:: python

    dataset = Dataset(
        files=[
            Datafile(path="path-within-dataset/my_file.csv", tags="one a:2 b:3 all"),
            Datafile(path="path-within-dataset/your_file.txt", tags="two a:2 b:3 all"),
            Datafile(path="path-within-dataset/another_file.csv", tags="three all"),
        ]
    )

    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

    dataset.files.filter("tags__contains", filter_value="a:2")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>

You can also chain filters indefinitely:

.. code-block:: python

    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv").filter("tags__contains", filter_value="a:2")
    >>> <FilterSet({<Datafile('my_file.csv')>})>

Find out more about ``FilterSets`` `here <filterset.rst>`_, including all the filters available for each type of
attribute that a ``FilterSet`` member can have, and how to convert ``FilterSets`` to primitive types such as ``set`` or
``list``.
127 changes: 127 additions & 0 deletions docs/source/filter_containers.rst
@@ -0,0 +1,127 @@
.. _filter_containers:

=================
Filter containers
=================

A filter container is a regular Python container with some extra methods for filtering or ordering its elements. It has
the same interface (i.e. attributes and methods) as the primitive Python type it inherits from, plus these extra
methods:

- ``filter``
- ``order_by``

There are two types of filter containers currently implemented:

- ``FilterSet``
- ``FilterList``

``FilterSets`` are currently used in:

- ``Dataset.files`` to store ``Datafiles``
- ``TagSet.tags`` to store ``Tags``

You can see filtering in action on the files of a ``Dataset`` `here <dataset.rst>`_.


---------
Filtering
---------

Filters are named as ``"<name_of_attribute_to_check>__<filter_action>"``, and any attribute of a member of the
``FilterSet`` whose type or interface is supported can be filtered.

.. code-block:: python

    filter_set = FilterSet(
        {Datafile(path="my_file.csv"), Datafile(path="your_file.txt"), Datafile(path="another_file.csv")}
    )

    filter_set.filter(filter_name="name__ends_with", filter_value=".csv")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

The following filters are implemented for the following types:

- ``bool``:

    * ``is``
    * ``is_not``

- ``str``:

    * ``is``
    * ``is_not``
    * ``equals``
    * ``not_equals``
    * ``iequals``
    * ``not_iequals``
    * ``lt`` (less than)
    * ``lte`` (less than or equal)
    * ``gt`` (greater than)
    * ``gte`` (greater than or equal)
    * ``contains``
    * ``not_contains``
    * ``icontains`` (case-insensitive contains)
    * ``not_icontains``
    * ``starts_with``
    * ``not_starts_with``
    * ``ends_with``
    * ``not_ends_with``

- ``NoneType``:

    * ``is``
    * ``is_not``

- ``TagSet``:

    * ``is``
    * ``is_not``
    * ``equals``
    * ``not_equals``
    * ``any_tag_contains``
    * ``not_any_tag_contains``
    * ``any_tag_starts_with``
    * ``not_any_tag_starts_with``
    * ``any_tag_ends_with``
    * ``not_any_tag_ends_with``



Additionally, these filters are defined for the following *interfaces* (duck types):

- Numbers:

    * ``is``
    * ``is_not``
    * ``equals``
    * ``not_equals``
    * ``lt``
    * ``lte``
    * ``gt``
    * ``gte``

- Iterables:

    * ``is``
    * ``is_not``
    * ``equals``
    * ``not_equals``
    * ``contains``
    * ``not_contains``
    * ``icontains``
    * ``not_icontains``

The interface filters are only used if the type of the attribute of the element being filtered is not found in the first
list of filters.
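
The naming scheme and the type-then-interface lookup described above could work along these lines. This is a minimal sketch, not the SDK's code: ``TYPE_FILTERS``, ``INTERFACE_FILTERS``, ``get_filter``, ``apply_filter`` and the ``File`` tuple are all hypothetical names, and only a handful of the documented filter actions are included.

```python
import numbers
from collections import namedtuple
from collections.abc import Iterable

# Filters keyed by exact type (a small, illustrative subset of the documented actions).
TYPE_FILTERS = {
    str: {
        "equals": lambda item, value: item == value,
        "ends_with": lambda item, value: item.endswith(value),
        "icontains": lambda item, value: value.lower() in item.lower(),
    },
    bool: {
        "is": lambda item, value: item is value,
        "is_not": lambda item, value: item is not value,
    },
}

# Fallback filters keyed by interface, consulted only when the exact type has no entry.
INTERFACE_FILTERS = {
    numbers.Number: {
        "gt": lambda item, value: item > value,
        "lte": lambda item, value: item <= value,
    },
    Iterable: {
        "contains": lambda item, value: value in item,
    },
}


def get_filter(attribute, action):
    """Look up a filter by exact type first, then fall back to the interface filters."""
    filters = TYPE_FILTERS.get(type(attribute))
    if filters is None:
        filters = next(
            (candidates for interface, candidates in INTERFACE_FILTERS.items() if isinstance(attribute, interface)),
            {},
        )
    if action not in filters:
        raise ValueError(f"No filter {action!r} for attributes of type {type(attribute).__name__}")
    return filters[action]


def apply_filter(items, filter_name, filter_value):
    """Split "<attribute>__<action>" and keep the items whose attribute passes the filter."""
    attribute_name, action = filter_name.rsplit("__", 1)
    return {
        item
        for item in items
        if get_filter(getattr(item, attribute_name), action)(getattr(item, attribute_name), filter_value)
    }


# Hypothetical members with a string attribute and a numeric attribute.
File = namedtuple("File", ["name", "size"])
files = {File("my_file.csv", 10), File("your_file.txt", 200), File("another_file.csv", 3000)}

csv_files = apply_filter(files, "name__ends_with", ".csv")  # str is found in the type table
large_files = apply_filter(files, "size__gt", 100)  # int falls back to the Number interface
```

Checking the exact type first matters because ``str`` is itself iterable and ``bool`` is itself a number; without that precedence they would pick up the generic interface filters instead of their own.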

--------
Ordering
--------

As sets are inherently unordered, ordering a ``FilterSet`` results in a new ``FilterList``, which has the same extra
methods and behaviour as a ``FilterSet`` but is based on the ``list`` type instead, meaning it can be ordered and
indexed. A ``FilterSet`` or ``FilterList`` can be ordered by any of the attributes of its members:

.. code-block:: python

    filter_set.order_by("name")
    >>> <FilterList([<Datafile('another_file.csv')>, <Datafile('my_file.csv')>, <Datafile('your_file.txt')>])>

The ordering can also be carried out in reverse (i.e. descending order) by passing ``reverse=True`` as a second argument
to the ``order_by`` method.
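
One way such ordering could be built on the primitive types is sketched below. ``FilterSetSketch`` and ``FilterListSketch`` are hypothetical stand-ins rather than the SDK's classes, and the ``File`` tuple is just a placeholder member type with a ``name`` attribute.

```python
from collections import namedtuple


class FilterListSketch(list):
    """Hypothetical list-based container that can be re-ordered by a member attribute."""

    def order_by(self, attribute_name, reverse=False):
        return FilterListSketch(sorted(self, key=lambda item: getattr(item, attribute_name), reverse=reverse))


class FilterSetSketch(set):
    """Hypothetical set-based container; ordering it yields a list-based container."""

    def order_by(self, attribute_name, reverse=False):
        return FilterListSketch(sorted(self, key=lambda item: getattr(item, attribute_name), reverse=reverse))


File = namedtuple("File", ["name"])
files = FilterSetSketch({File("my_file.csv"), File("your_file.txt"), File("another_file.csv")})

ascending = files.order_by("name")
descending = files.order_by("name", reverse=True)
first = ascending[0]  # indexing works because the result is list-based
```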
4 changes: 4 additions & 0 deletions docs/source/index.rst
@@ -13,6 +13,10 @@ Not all of Octue's API functionality is implemented in the SDK yet, we're active
   :hidden:

   installation
   datafile
   dataset
   filter_containers
   analysis_objects
   license
   version_history
   bibliography
4 changes: 3 additions & 1 deletion octue/mixins/__init__.py
@@ -1,9 +1,11 @@
from .base import MixinBase
from .filterable import Filterable
from .hashable import Hashable
from .identifiable import Identifiable
from .loggable import Loggable
from .pathable import Pathable
from .serialisable import Serialisable
from .taggable import Taggable


__all__ = "Identifiable", "Loggable", "MixinBase", "Pathable", "Serialisable", "Taggable"
__all__ = ("Filterable", "Hashable", "Identifiable", "Loggable", "MixinBase", "Pathable", "Serialisable", "Taggable")
