0.12.0

@mrchtr mrchtr released this 17 Apr 13:15
c41080a

⚡️ Introducing the dataset-first interface

We have removed the Pipeline interface and redesigned the Dataset class. Datasets are still built from load components as before, but you now create them through the Dataset class instead of a Pipeline.

from fondant.dataset import Dataset

dataset = Dataset.create(
    "load_from_parquet",
    arguments={
        ...
    },
)

dataset = dataset.apply(...)

Additionally, datasets can now be initialized from the manifest of a previous workflow run. This makes Fondant datasets shareable: to share a dataset, share its manifest file.

from fondant.dataset import Dataset

dataset = Dataset.read("gs://.../manifest.json")
dataset = dataset.apply(...)

🛠️ Working directory

Since the pipeline no longer exists, we added a new CLI option to define a working directory. All workflow-related artifacts are stored in the working directory.

fondant run local dataset --working-directory ./data

⚠️ Attention:
Pipelines created with previous Fondant versions are not compatible with Fondant >=0.12.0. To migrate an existing pipeline, initialize your dataset using Dataset.create(...) instead of Pipeline.read(...), and use the former base_path as the working directory when you materialize your dataset.
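
The migration described above can be sketched as follows. This is a minimal sketch, not an official migration script. The commented-out "before" code reflects the pre-0.12.0 interface as recalled from older releases and should be treated as an assumption. `build_dataset`, `load_arguments`, and the values "my-pipeline" and "./data" are hypothetical placeholders introduced for illustration.

```python
from typing import Any, Dict

# Before (Fondant < 0.12.0). Assumed shape of the old interface, shown
# for comparison only:
#
#   from fondant.pipeline import Pipeline
#
#   pipeline = Pipeline(name="my-pipeline", base_path="./data")
#   dataset = pipeline.read("load_from_parquet", arguments=load_arguments)


def build_dataset(load_arguments: Dict[str, Any]):
    """Create the dataset via the dataset-first interface (Fondant >= 0.12.0).

    `build_dataset` and `load_arguments` are hypothetical names; pass the
    same arguments you previously gave to the load component.
    """
    # Imported lazily so this module can be imported without Fondant installed.
    from fondant.dataset import Dataset

    # The base_path is no longer passed in code. It is supplied as the
    # working directory when the dataset is materialized.
    return Dataset.create("load_from_parquet", arguments=load_arguments)
```

The former base_path then moves to the command line as the value of --working-directory. Any further keyword arguments your load component needs are omitted here, as they are in the examples earlier in these notes.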

Full Changelog: 0.11.2...0.12.0