Merge pull request #55 from octue/release/0.1.7
Release/0.1.7
Showing 33 changed files with 1,483 additions and 347 deletions.
@@ -0,0 +1,44 @@
.. _analysis_objects:

================
Analysis objects
================

An ``Analysis`` object is the sole argument to the ``app`` function in your ``app.py`` module. Its attributes include
every strand that can possibly be added to a ``Twine``, although only the strands specified in your ``twine.py`` file
will be set; all the others are ``None``. The attributes are:

- ``input_values``
- ``input_manifest``
- ``configuration_values``
- ``configuration_manifest``
- ``output_values``
- ``output_manifest``
- ``credentials``
- ``children``
- ``monitors``
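
Because unspecified strands are ``None``, an app should check a strand before using it. The following is a minimal sketch of that pattern, not octue's own code: ``SimpleNamespace`` stands in for a real ``Analysis`` object, and ``available_strands``, the ``"n"`` input and the ``"doubled"`` output are hypothetical names chosen for illustration.

```python
from types import SimpleNamespace

# Attribute names taken from the list above.
STRANDS = (
    "input_values",
    "input_manifest",
    "configuration_values",
    "configuration_manifest",
    "output_values",
    "output_manifest",
    "credentials",
    "children",
    "monitors",
)


def available_strands(analysis):
    """Return the names of the strands that are set (i.e. not None), in the order listed above."""
    return [name for name in STRANDS if getattr(analysis, name, None) is not None]


def app(analysis):
    """Hypothetical app: double the input value, but only if the input_values strand is set."""
    if analysis.input_values is not None:
        analysis.output_values = {"doubled": 2 * analysis.input_values["n"]}


# Stand-in for an Analysis whose twine specifies only input values (an assumption).
analysis = SimpleNamespace(**{name: None for name in STRANDS})
analysis.input_values = {"n": 21}
app(analysis)
```

After ``app`` has run on this stand-in, only ``input_values`` and ``output_values`` are set; every other strand is still ``None``.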

Additionally, all input and configuration attributes are hashed using a
`BLAKE3 hash <https://github.com/BLAKE3-team/BLAKE3>`_ so the inputs and configuration that produced a given output in
your app can always be verified. These hashes exist on the following attributes:

- ``input_values_hash``
- ``input_manifest_hash``
- ``configuration_values_hash``
- ``configuration_manifest_hash``

If an input or configuration attribute is ``None``, its hash attribute is also ``None``. For ``Manifests``, some metadata
about the ``Datafiles`` and ``Datasets`` within them, and about the ``Manifest`` itself, is included when calculating
the hash:

- For a ``Datafile``, the content of its on-disk file is hashed, along with the following metadata:

  - ``name``
  - ``cluster``
  - ``sequence``
  - ``posix_timestamp``
  - ``tags``

- For a ``Dataset``, the hashes of its ``Datafiles`` are included, along with its ``tags``.

- For a ``Manifest``, the hashes of its ``Datasets`` are included, along with its ``keys``.
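
The nesting described above (file hashes feed the dataset hash, which feeds the manifest hash) can be sketched as follows. This is an illustration only and not octue's implementation: it uses ``hashlib.blake2b`` from the standard library as a stand-in for BLAKE3, the function names are hypothetical, and treating ``keys`` as a simple iterable of strings is an assumption.

```python
import hashlib


def _digest(*parts):
    """Hash the given strings together. BLAKE2b is a stand-in for BLAKE3 here."""
    hasher = hashlib.blake2b(digest_size=16)
    for part in parts:
        hasher.update(part.encode())
        hasher.update(b"\x00")  # Separator, so ("ab", "c") and ("a", "bc") hash differently.
    return hasher.hexdigest()


def hash_datafile(content, name, cluster, sequence, posix_timestamp, tags):
    """Hash a file's content together with the metadata listed above."""
    return _digest(content, name, str(cluster), str(sequence), str(posix_timestamp), *sorted(tags))


def hash_dataset(datafile_hashes, tags):
    """Combine the hashes of a dataset's files with its tags (sorted, so order is irrelevant)."""
    return _digest(*sorted(datafile_hashes), *sorted(tags))


def hash_manifest(dataset_hashes, keys):
    """Combine the hashes of a manifest's datasets with its keys."""
    return _digest(*sorted(dataset_hashes), *sorted(keys))


# Two files that differ only in content produce different hashes all the way up.
file_a = hash_datafile("1,2,3", "a.csv", 0, None, 0, {"raw"})
file_b = hash_datafile("1,2,4", "a.csv", 0, None, 0, {"raw"})
manifest_a = hash_manifest([hash_dataset([file_a], {"met-mast"})], ["met_data"])
manifest_b = hash_manifest([hash_dataset([file_b], {"met-mast"})], ["met_data"])
```

Because each level includes the hashes of the level below, a change to one file's content or metadata changes the manifest hash, which is what makes the inputs behind a given output verifiable.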
@@ -0,0 +1,14 @@
.. _datafile:

========
Datafile
========

A ``Datafile`` is an Octue type that corresponds to a file, which may exist on your computer or in a cloud store. It has
the following main attributes:

- ``path`` - the path of this file within the dataset, which may include folders or subfolders
- ``cluster`` - the integer cluster of files, within a dataset, to which this file belongs (default 0)
- ``sequence`` - the sequence number of this file within its cluster (if sequences are appropriate)
- ``tags`` - a space-separated string or iterable of tags relevant to this file
- ``posix_timestamp`` - a POSIX timestamp associated with the file, in seconds since the epoch; typically the time it was created, but it could be any time point relevant to the data
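
To make the shape of these attributes concrete, here is a small stand-in written as a dataclass. It is a hypothetical sketch, not octue's ``Datafile``: the class name ``DatafileSketch``, the choice of ``frozenset`` for normalised tags, and deriving ``name`` from the last component of ``path`` are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass
class DatafileSketch:
    """Hypothetical stand-in mirroring the attributes listed above."""

    path: str
    cluster: int = 0  # Default cluster, as stated in the list above.
    sequence: Optional[int] = None
    tags: Union[str, Iterable[str]] = ""
    posix_timestamp: Optional[float] = None

    def __post_init__(self):
        # Accept either a space-separated string or any iterable of tags.
        if isinstance(self.tags, str):
            self.tags = self.tags.split()
        self.tags = frozenset(self.tags)

    @property
    def name(self):
        """The final component of the path (assumed to be what name filters act on)."""
        return os.path.basename(self.path)


datafile = DatafileSketch(path="path-within-dataset/my_file.csv", tags="one a:2 b:3 all")
same_tags = DatafileSketch(path="path-within-dataset/my_file.csv", tags=["all", "one", "a:2", "b:3"])
```

Both ways of passing tags give the same normalised set, so ``datafile.tags == same_tags.tags`` holds in this sketch.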
@@ -0,0 +1,44 @@
.. _dataset:

=======
Dataset
=======

A ``Dataset`` contains any number of ``Datafiles`` along with the following metadata:

- ``name``
- ``tags``

The files are stored in a ``FilterSet``, meaning they can be easily filtered according to any attribute of the
`Datafile <datafile.rst>`_ instances it contains.


--------------------------------
Filtering files in a ``Dataset``
--------------------------------

You can filter a ``Dataset``'s files as follows:

.. code-block:: python

    dataset = Dataset(
        files=[
            Datafile(path="path-within-dataset/my_file.csv", tags="one a:2 b:3 all"),
            Datafile(path="path-within-dataset/your_file.txt", tags="two a:2 b:3 all"),
            Datafile(path="path-within-dataset/another_file.csv", tags="three all"),
        ]
    )

    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

    dataset.files.filter("tags__contains", filter_value="a:2")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>

You can also chain filters indefinitely:

.. code-block:: python

    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv").filter("tags__contains", filter_value="a:2")
    >>> <FilterSet({<Datafile('my_file.csv')>})>

Find out more about ``FilterSets`` `here <filterset.rst>`_, including the filters available for each type of object
stored on an attribute of a ``FilterSet`` member, and how to convert ``FilterSets`` to primitive types such as ``set``
or ``list``.
@@ -0,0 +1,127 @@
.. _filter_containers:

=================
Filter containers
=================

A filter container is a regular Python container that has some extra methods for filtering or ordering its
elements. It has the same interface (i.e. attributes and methods) as the primitive Python type it inherits from, with
these extra methods:

- ``filter``
- ``order_by``

There are two types of filter containers currently implemented:

- ``FilterSet``
- ``FilterList``

``FilterSets`` are currently used in:

- ``Dataset.files`` to store ``Datafiles``
- ``TagSet.tags`` to store ``Tags``

You can see filtering in action on the files of a ``Dataset`` `here <dataset.rst>`_.


---------
Filtering
---------

Filters are named as ``"<name_of_attribute_to_check>__<filter_action>"``, and any attribute of a member of the
``FilterSet`` whose type or interface is supported can be filtered.

.. code-block:: python

    filter_set = FilterSet(
        {Datafile(path="my_file.csv"), Datafile(path="your_file.txt"), Datafile(path="another_file.csv")}
    )

    filter_set.filter(filter_name="name__ends_with", filter_value=".csv")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>
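
One way such a filter name could be interpreted is to split it on the last double underscore into an attribute name and an action. The sketch below assumes that approach and is not octue's ``FilterSet``: ``FilterSetSketch``, ``STR_FILTERS`` and the ``File`` tuple are hypothetical, and only a few string actions are included.

```python
from collections import namedtuple

# A small, illustrative subset of the string filter actions.
STR_FILTERS = {
    "equals": lambda value, target: value == target,
    "contains": lambda value, target: target in value,
    "starts_with": lambda value, target: value.startswith(target),
    "ends_with": lambda value, target: value.endswith(target),
}


class FilterSetSketch(set):
    """Hypothetical set subclass with a ``filter`` method."""

    def filter(self, filter_name, filter_value):
        # Split "<attribute>__<action>" on the last double underscore.
        attribute_name, action = filter_name.rsplit("__", 1)
        check = STR_FILTERS[action]
        # Returning the same class is what allows filter calls to be chained.
        return FilterSetSketch(item for item in self if check(getattr(item, attribute_name), filter_value))


# A hashable stand-in for a Datafile with just the attributes being filtered on.
File = namedtuple("File", ["name", "tags"])

files = FilterSetSketch(
    {File("my_file.csv", "one a:2"), File("your_file.txt", "two a:2"), File("another_file.csv", "three")}
)
csv_files = files.filter("name__ends_with", ".csv")
chained = csv_files.filter("tags__contains", "a:2")
```

Here ``chained`` holds only the file named ``my_file.csv``, because each call returns another filterable set rather than a plain ``set``.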

The following filters are implemented for the following types:

- ``bool``:

  * ``is``
  * ``is_not``

- ``str``:

  * ``is``
  * ``is_not``
  * ``equals``
  * ``not_equals``
  * ``iequals``
  * ``not_iequals``
  * ``lt`` (less than)
  * ``lte`` (less than or equal)
  * ``gt`` (greater than)
  * ``gte`` (greater than or equal)
  * ``contains``
  * ``not_contains``
  * ``icontains`` (case-insensitive contains)
  * ``not_icontains``
  * ``starts_with``
  * ``not_starts_with``
  * ``ends_with``
  * ``not_ends_with``

- ``NoneType``:

  * ``is``
  * ``is_not``

- ``TagSet``:

  * ``is``
  * ``is_not``
  * ``equals``
  * ``not_equals``
  * ``any_tag_contains``
  * ``not_any_tag_contains``
  * ``any_tag_starts_with``
  * ``not_any_tag_starts_with``
  * ``any_tag_ends_with``
  * ``not_any_tag_ends_with``

Additionally, these filters are defined for the following *interfaces* (duck types):

- Numbers:

  * ``is``
  * ``is_not``
  * ``equals``
  * ``not_equals``
  * ``lt``
  * ``lte``
  * ``gt``
  * ``gte``

- Iterables:

  * ``is``
  * ``is_not``
  * ``equals``
  * ``not_equals``
  * ``contains``
  * ``not_contains``
  * ``icontains``
  * ``not_icontains``

The interface filters are only used if the type of the attribute of the element being filtered is not found in the first
list of filters.
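
The lookup order described above (exact type first, interface second) might be sketched like this. The structure is an assumption for illustration and is not taken from octue's source: ``TYPE_FILTERS``, ``INTERFACE_FILTERS`` and ``available_filters`` are hypothetical names, and the ``str`` entry is abbreviated.

```python
from collections.abc import Iterable
from numbers import Number

# Filters keyed by exact type (the first list above; ``str`` is abbreviated here).
TYPE_FILTERS = {
    bool: {"is", "is_not"},
    str: {"is", "is_not", "equals", "contains", "starts_with", "ends_with"},
    type(None): {"is", "is_not"},
}

# Filters keyed by interface, checked in order with ``isinstance``.
INTERFACE_FILTERS = [
    (Number, {"is", "is_not", "equals", "not_equals", "lt", "lte", "gt", "gte"}),
    (Iterable, {"is", "is_not", "equals", "not_equals", "contains", "not_contains", "icontains", "not_icontains"}),
]


def available_filters(value):
    """Return the filter actions for a value: exact type first, then the first matching interface."""
    try:
        return TYPE_FILTERS[type(value)]
    except KeyError:
        pass

    for interface, filters in INTERFACE_FILTERS:
        if isinstance(value, interface):
            return filters

    raise TypeError(f"No filters are defined for values of type {type(value).__name__!r}.")


bool_filters = available_filters(True)     # bool is also a Number, but the exact type wins.
float_filters = available_filters(3.5)     # No exact entry, so the Number interface is used.
list_filters = available_filters([1, 2])   # Falls through to the Iterable interface.
```

Checking the exact type first matters because ``bool`` values are also numbers and ``str`` values are also iterables; without that precedence they would pick up the broader interface filters instead of their own.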

--------
Ordering
--------

As sets are inherently unordered, ordering a ``FilterSet`` results in a new ``FilterList``, which has the same extra
methods and behaviour as a ``FilterSet`` but is based on the ``list`` type instead, meaning it can be ordered and
indexed. A ``FilterSet`` or ``FilterList`` can be ordered by any of the attributes of its members:

.. code-block:: python

    filter_set.order_by("name")
    >>> <FilterList([<Datafile('another_file.csv')>, <Datafile('my_file.csv')>, <Datafile('your_file.txt')>])>

The ordering can also be carried out in reverse (i.e. descending order) by passing ``reverse=True`` as a second argument
to the ``order_by`` method.
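
A minimal sketch of this behaviour, assuming ``order_by`` simply sorts on the named attribute, is shown below. The classes ``FilterSetSketch`` and ``FilterListSketch`` and the ``File`` tuple are hypothetical stand-ins, not octue's containers.

```python
from collections import namedtuple
from operator import attrgetter


class FilterListSketch(list):
    """Hypothetical list-based container: ordered and indexable."""

    def order_by(self, attribute_name, reverse=False):
        return FilterListSketch(sorted(self, key=attrgetter(attribute_name), reverse=reverse))


class FilterSetSketch(set):
    """Hypothetical set-based container."""

    def order_by(self, attribute_name, reverse=False):
        # A set has no order, so the sorted result is returned in a list-based container.
        return FilterListSketch(sorted(self, key=attrgetter(attribute_name), reverse=reverse))


# A hashable stand-in for a Datafile with only a name.
File = namedtuple("File", ["name"])

files = FilterSetSketch({File("my_file.csv"), File("your_file.txt"), File("another_file.csv")})
ascending = files.order_by("name")
descending = files.order_by("name", reverse=True)
```

In this sketch ``ascending[0]`` is the file named ``another_file.csv``, and ``descending`` holds the same files in the opposite order.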
@@ -1,9 +1,11 @@
 from .base import MixinBase
+from .filterable import Filterable
+from .hashable import Hashable
 from .identifiable import Identifiable
 from .loggable import Loggable
 from .pathable import Pathable
 from .serialisable import Serialisable
 from .taggable import Taggable


-__all__ = "Identifiable", "Loggable", "MixinBase", "Pathable", "Serialisable", "Taggable"
+__all__ = ("Filterable", "Hashable", "Identifiable", "Loggable", "MixinBase", "Pathable", "Serialisable", "Taggable")