2.19.0

@albertvillanova released this 19 Apr 08:46
· 7 commits to main since this release
0d3c746

Dataset Features

  • Add Polars compatibility by @psmyth94 in #6531
    • Convert to a Polars DataFrame using .to_polars():
      import polars as pl
      from datasets import load_dataset
      ds = load_dataset("DIBT/10k_prompts_ranked", split="train")
      ds.to_polars() \
          .group_by("topic") \
          .agg(pl.len(), pl.first()) \
          .sort("len", descending=True)
    • Use Polars formatting to return Polars objects when accessing a dataset:
      ds = ds.with_format("polars")
      ds[:10].group_by("kind").len()
  • Add fsspec support for to_json, to_csv, and to_parquet by @alvarobartt in #6096
    • Save to the Hugging Face Hub in JSON, CSV, or Parquet format:
      ds.to_json("hf://datasets/username/my_json_dataset/data.jsonl")
      ds.to_csv("hf://datasets/username/my_csv_dataset/data.csv")
      ds.to_parquet("hf://datasets/username/my_parquet_dataset/data.parquet")
  • Add mode parameter to Image feature by @mariosasko in #6735
    • Set images to be read in a specific mode, such as "RGB":
      from datasets import Image
      dataset = dataset.cast_column("image", Image(mode="RGB"))
  • Add CLI function to convert script-dataset to Parquet by @albertvillanova in #6795
    • Run this command to open a PR on a script-based dataset that converts it to Parquet:
      datasets-cli convert_to_parquet <dataset_id>
  • Add Dataset.take and Dataset.skip by @lhoestq in #6813
    • Same as IterableDataset.take and IterableDataset.skip:
      ds = ds.take(10)  # take only the first 10 examples
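
The take/skip semantics described in the last item can be modelled in plain Python. This is only an illustrative sketch, not the library's implementation: the helper functions `take` and `skip` and the sample `rows` list are hypothetical stand-ins for `Dataset.take`, `Dataset.skip`, and a real dataset.

```python
from itertools import islice


def take(examples, n):
    """Return the first n examples (hypothetical stand-in for ds.take(n))."""
    return list(islice(examples, n))


def skip(examples, n):
    """Return everything after the first n examples (stand-in for ds.skip(n))."""
    return list(islice(examples, n, None))


# Hypothetical sample data standing in for a dataset of 25 rows.
rows = [{"id": i} for i in range(25)]

head = take(rows, 10)            # first 10 examples
tail = skip(rows, 10)            # the remaining 15 examples
page = take(skip(rows, 10), 10)  # the second "page" of 10 examples
```

Chaining the two, as in `page`, is a common way to slice out a window of examples; with the real API this would presumably read `ds.skip(10).take(10)`.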

General improvements and bug fixes

New Contributors

Full Changelog: 2.18.0...2.19.0