Releases: tensorflow/datasets

v4.9.4

18 Dec 13:28

Added

  • A new CroissantBuilder that initializes a DatasetBuilder from a Croissant
    metadata file.
  • New conversion options between different bounding box formats.
  • Better support for HuggingfaceDatasetBuilder.
  • A script to convert a dataset from one format to another.
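To make bounding box conversion concrete, here is a minimal pure-Python sketch that converts between two common layouts. The first is normalized (ymin, xmin, ymax, xmax), the layout tfds.features.BBoxFeature uses. The second is absolute-pixel (x, y, width, height). The type and function names are hypothetical. They are not the TFDS conversion API added in this release.

```python
from typing import NamedTuple


class YXYXRel(NamedTuple):
    """Hypothetical type: normalized corners in [0, 1]."""
    ymin: float
    xmin: float
    ymax: float
    xmax: float


class XYWHAbs(NamedTuple):
    """Hypothetical type: top-left corner plus size, in pixels."""
    x: float
    y: float
    width: float
    height: float


def rel_yxyx_to_abs_xywh(box: YXYXRel, img_height: int, img_width: int) -> XYWHAbs:
    # Scale normalized coordinates to pixels, then derive width and height.
    x = box.xmin * img_width
    y = box.ymin * img_height
    return XYWHAbs(x, y, box.xmax * img_width - x, box.ymax * img_height - y)


def abs_xywh_to_rel_yxyx(box: XYWHAbs, img_height: int, img_width: int) -> YXYXRel:
    # Recover the bottom-right corner, then normalize by the image size.
    return YXYXRel(
        ymin=box.y / img_height,
        xmin=box.x / img_width,
        ymax=(box.y + box.height) / img_height,
        xmax=(box.x + box.width) / img_width,
    )


box = rel_yxyx_to_abs_xywh(YXYXRel(0.1, 0.2, 0.5, 0.6), img_height=100, img_width=200)
print(box)
```

Keeping the image size as an explicit argument makes it clear that the normalized and absolute layouts can only be converted when the image dimensions are known.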

Changed

Deprecated

  • Python 3.9 support. TFDS now uses Python 3.10.

Removed

Fixed

Security

v4.9.3

08 Sep 09:07

Added

Changed

  • Hugging Face datasets accept None values for any feature. TFDS has no
    tfds.features.Optional, so None values are converted to default values.
    Those default values used to be 0 and 0.0 for int and float. Now, they are
    the minimum values of the dtype as defined by NumPy (e.g.,
    np.iinfo(np.int32).min or np.finfo(np.float32).min), which are finite
    rather than -inf. This avoids ambiguous values when 0 and 0.0 also occur
    in the dataset. The roadmap is to implement tfds.features.Optional.
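The replacement rule can be sketched as follows, assuming NumPy is installed. default_for and fill_none are hypothetical helpers, not TFDS functions. The sketch only covers integer and floating-point dtypes.

```python
import numpy as np


def default_for(dtype):
    """Hypothetical helper: the placeholder that stands in for a None value."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return np.iinfo(dtype).min  # e.g. -2147483648 for int32
    if np.issubdtype(dtype, np.floating):
        return np.finfo(dtype).min  # most negative finite value, not -inf
    raise TypeError(f"no default defined for {dtype} in this sketch")


def fill_none(values, dtype):
    """Hypothetical helper: replace None entries, then build a typed array."""
    default = default_for(dtype)
    return np.array([default if v is None else v for v in values], dtype=dtype)


print(fill_none([1, None, 0], np.int32))
print(fill_none([0.0, None], np.float32))
```

Because the placeholder sits at the extreme end of the dtype's range, a genuine 0 or 0.0 in the data stays distinguishable from a converted None.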

Deprecated

  • Python 3.8 support. As per NEP 29, TFDS now uses Python>=3.9.

Removed

Fixed

Security

v4.9.2

13 Apr 11:21

Added

  • [Experimental] A list of freeform text tags can now be attached to a
    BuilderConfig. For example:
    BUILDER_CONFIGS = [
        tfds.core.BuilderConfig(name="foo", tags=["foo", "live"]),
        tfds.core.BuilderConfig(name="bar", tags=["bar", "old"]),
    ]
    The tags are recorded with the dataset metadata and can later be retrieved
    using the info object:
    builder.info.config_tags  # ["foo", "live"]
    This feature is experimental, and there are no guidelines on the tag format.

Changed

Deprecated

Removed

Fixed

  • Fixed generated proto files (see issue 4858).

Security

v4.9.1

11 Apr 13:16

Added

Changed

Deprecated

Removed

Fixed

  • The installation on macOS now works (see issues 4805 and 4852). The
    ArrayRecord dependency is lazily loaded, so the TensorFlow-less path is
    not possible at the moment on macOS. A fix for this will follow soon.

Security

v4.9.0

05 Apr 07:30

Added

Changed

  • Support for TensorFlow 2.12.

Deprecated

Removed

Fixed

Security

v4.8.3

27 Feb 11:46

Added

Changed

Deprecated

  • Python 3.7 support: this version and future versions use Python 3.8.

Removed

Fixed

  • The ignore_verifications flag of Hugging Face's datasets.load_dataset is
    deprecated and used to cause errors in tfds.load('huggingface:foo').

Security

v4.8.2

17 Jan 20:41

Deprecated

  • Python 3.7 support: this is the last version of TFDS supporting Python 3.7.
    Future versions will use Python 3.8.

Fixed

  • tfds new and tfds build now better support the new recommended dataset
    organization, in which each dataset has its own package under datasets/,
    and the builder class is called Builder and is defined in the module
    ${dsname}_dataset_builder.py.

Security

v4.8.1

02 Jan 18:30

Changed

  • Added the file valid_tags.txt so that builds do not break.
  • TFDS no longer relies on TensorFlow DTypes. We chose NumPy DTypes to keep the
    typing expressiveness while dropping the heavy dependency on TensorFlow. We
    migrated all our internal datasets. Please migrate accordingly:
    • tf.bool: np.bool_
    • tf.string: np.str_
    • tf.int64, tf.int32, etc.: np.int64, np.int32, etc.
    • tf.float64, tf.float32, etc.: np.float64, np.float32, etc.
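For larger codebases, a rough regex-based rewrite can handle the mechanical part of this migration. migrate_dtypes is a hypothetical helper, not part of TFDS. It rewrites only the dtype names listed above, plus unsigned integers and float16, which follow the same tf.X to np.X pattern. The result should still be reviewed by hand, and the NumPy import added where needed.

```python
import re

# Dtypes whose NumPy name differs from the TensorFlow name.
_SPECIAL = {"bool": "np.bool_", "string": "np.str_"}

# Matches tf.bool, tf.string, tf.(u)intN and tf.floatN as whole tokens only,
# so unrelated attributes such as tf.data are left untouched.
_TF_DTYPE = re.compile(r"\btf\.(bool|string|u?int(?:8|16|32|64)|float(?:16|32|64))\b")


def migrate_dtypes(source: str) -> str:
    """Hypothetical helper: rewrite TensorFlow dtype references to NumPy ones."""
    def _replace(match: re.Match) -> str:
        name = match.group(1)
        return _SPECIAL.get(name, f"np.{name}")

    return _TF_DTYPE.sub(_replace, source)


print(migrate_dtypes("tfds.features.Tensor(shape=(), dtype=tf.int64)"))
# prints: tfds.features.Tensor(shape=(), dtype=np.int64)
```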

v4.8.0

21 Dec 11:09

Added

  • [API] DatasetBuilder's description and citations can be specified in
    dedicated README.md and CITATIONS.bib files, within the dataset package
    (see https://www.tensorflow.org/datasets/add_dataset).
  • Tags can be associated with datasets in the TAGS.txt file. For now, they
    are only used in the generated documentation.
  • [API][Experimental] New ViewBuilder to define datasets as transformations
    of existing datasets. Also adds tfds.transform with functionality to apply
    transformations.
  • Loggers are now also called on tfds.as_numpy(...), and the base Logger
    class has a new corresponding method.
  • tfds.core.DatasetBuilder can have a default limit for the number of
    simultaneous downloads. tfds.download.DownloadConfig can override it.
  • tfds.features.Audio supports storing raw audio data for lazy decoding.
  • The number of shards can be overridden when preparing a dataset:
    builder.download_and_prepare(download_config=tfds.download.DownloadConfig(num_shards=42)).
    Alternatively, you can configure the min and max shard size if you want TFDS
    to compute the number of shards for you, but want to have control over the
    shard sizes.
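The min/max shard size option can be illustrated with a small calculation. The sketch below is hypothetical and is not TFDS's actual sharding heuristic. It takes a desired shard count and clamps it so that every shard stays between the given minimum and maximum size. The target_num_shards default of 1024 is made up for illustration.

```python
def choose_num_shards(
    total_bytes: int,
    min_shard_bytes: int,
    max_shard_bytes: int,
    target_num_shards: int = 1024,  # hypothetical default, not TFDS's value
) -> int:
    """Hypothetical heuristic: clamp a desired shard count to the size bounds."""
    # Fewest shards that keep every shard at or below max_shard_bytes
    # (integer ceiling division, and always at least one shard).
    lower = max(1, (total_bytes + max_shard_bytes - 1) // max_shard_bytes)
    # Most shards that keep every shard at or above min_shard_bytes.
    # If the two bounds conflict, the maximum size takes precedence.
    upper = max(lower, total_bytes // min_shard_bytes)
    return min(max(target_num_shards, lower), upper)


MiB, GiB = 2**20, 2**30
print(choose_num_shards(10 * GiB, 64 * MiB, 1 * GiB))  # prints 160
```

In this sketch, the minimum size limits how finely a small dataset is split, and the maximum size forces a large dataset into enough shards.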

Changed

Deprecated

Removed

Fixed

Security

v4.7.0

05 Oct 10:23

Added

  • [API] Added TfDataBuilder that is handy for storing experimental ad hoc TFDS datasets in notebook-like environments such that they can be versioned, described, and easily shared with teammates.
  • [API] Added options to create format-specific dataset builders. The new API now includes a number of NLP-specific builders, such as:
  • [API] Added tfds.beam.inc_counter to reduce beam.metrics.Metrics.counter boilerplate
  • [API] Added options to group together existing TFDS datasets into dataset collections and to perform simple operations over them.
  • [Documentation] Updates, specifically:
    • New guide on format-specific dataset builders;
    • New guide on adding new dataset collections to TFDS;
    • Updated TFDS CLI documentation.
  • [TFDS CLI] Supports custom config through JSON (e.g. tfds build my_dataset --config='{"name": "my_custom_config", "description": "Abc"}')
  • New datasets:
  • Updated datasets:
    • C4 was updated to version 3.1.
    • common_voice was updated to a more recent snapshot.
    • wikipedia was updated with the 20220620 snapshot.
  • New dataset collections, such as xtreme and LongT5
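Writing JSON by hand inside shell quotes is error-prone. A small Python sketch can build the --config argument with json.dumps and quote it with shlex. tfds_build_command is a hypothetical helper. The flag itself is used exactly as in the CLI example in the list above.

```python
import json
import shlex
from typing import Any, Dict, List


def tfds_build_command(dataset: str, config: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: argv for `tfds build` with an inline JSON config."""
    # json.dumps always emits valid JSON (double quotes, escaped characters).
    return ["tfds", "build", dataset, f"--config={json.dumps(config)}"]


cmd = tfds_build_command(
    "my_dataset", {"name": "my_custom_config", "description": "Abc"}
)
# Either pass `cmd` directly to subprocess.run(cmd, check=True), which avoids
# the shell entirely, or print a copy-pasteable shell line:
print(shlex.join(cmd))
# prints: tfds build my_dataset '--config={"name": "my_custom_config", "description": "Abc"}'
```

Passing the argument list directly to a subprocess call sidesteps shell quoting altogether. shlex.join (Python 3.8+) is only needed when a human-readable command line is wanted.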

Changed

  • The base Logger class expects more information to be passed to the as_dataset method. This should only be relevant to people who have implemented and registered custom Logger class(es).
  • You can set DEFAULT_BUILDER_CONFIG_NAME in a DatasetBuilder to change the default config if it shouldn't be the first builder config defined in BUILDER_CONFIGS.

Deprecated

Removed

Fixed

  • Various datasets
  • On Linux, when loading a dataset from a directory other than your home (~) directory, a spurious ~ directory is no longer created in the current directory (fixes #4117).

Security