Releases: ibis-project/ibis

9.0.0 (2024-04-30)

⚠ BREAKING CHANGES

  • udf: The schema parameter for UDF definition has been removed. A new catalog parameter has been added. Ibis uses the word database to refer to a collection of tables, and the word catalog to refer to a collection of databases. You can use a combination of catalog and database to specify a hierarchical location for the UDF.
  • pyspark: Arguments to create_database, drop_database, and get_schema are now keyword-only except for the name args. Calls to these functions that have relied on positional argument ordering need to be updated.
  • dask: the dask backend no longer supports cov/corr with how="pop".
  • duckdb: Calling the get or contains method on NULL map values now returns NULL. Use coalesce(map.get(...), default) or coalesce(map.contains(...), False) to get the previous behavior.
  • api: Integer inputs to select and mutate are now always interpreted as literals. Columns can still be accessed by their integer index using square-bracket syntax.
  • api: Strings passed to table.mutate() are now interpreted as column references instead of literals; use ibis.literal(string) to pass a string as a literal.
  • ir: Schema.apply_to() is removed, use ibis.formats.pandas.PandasConverter.convert_frame() instead
  • ddl: We are removing the word schema in its hierarchical sense. We use database to mean a collection of tables. All *_database methods now apply only to collections of tables and never to collections of databases (formerly schemas).
  • CanListDatabases abstract methods now all refer to
    collections of tables.
  • CanCreateDatabases abstract methods now all refer to
    collections of tables.
  • list_databases now takes a kwarg catalog.
  • create_database now takes a kwarg catalog.
  • drop_database now takes a kwarg catalog.
  • current_database now refers to the current collection of tables.
  • CanCreateSchema is deprecated, and create_schema, drop_schema, list_schemas, and current_schema now redirect to the corresponding method/property ending in database.
  • We add CanListCatalog and CanCreateCatalog abstract classes that can list and create collections of databases, respectively. The new methods are list_catalogs, create_catalog, and drop_catalog.
  • There is a new current_catalog property.
  • api: timecontext feature is removed
  • api: The by argument from asof_join is removed. Calls to asof_join that previously used by should pass those arguments to predicates instead.
  • cleanup: The deprecated methods and properties op, output_dtype, and output_shape are removed. op is no longer needed; use .dtype and .shape, respectively, for the other two.
  • api: expr.topk(...) now includes null counts. The row count of the topk call will not differ, but the number of nulls counted will no longer be zero. To drop null rows, use the dropna method.
  • api: ibis.rows_with_max_lookback() function and ibis.window(max_lookback) argument are removed
  • strings: Backends that previously used initcap (analogous to str.title) to implement StringValue.capitalize() will produce different results when the input string contains multiple words (a word's definition being backend-specific).
  • impala: Impala UDFs no longer require explicit registration. Remove any calls to Function.register. If you were passing database to Function.register, pass that to scalar_function or aggregate_function as appropriate.
  • pandas: the timecontext feature is no longer supported
  • api: The on parameter of table.asof_join() now accepts only a single predicate; use predicates to supply additional join predicates.

Features

  • add to_date function to StringValue (#9030) (0701978), closes #8908
  • api: add .as_scalar() method for turning expressions into scalar subqueries (#8350) (8130169)
  • api: add catalog and database kwargs to ibis.table (#8801) (7d593c4)
  • api: add describe method to compute summary stats of table expressions (#8739) (c8d98a1)
  • api: add ibis.today() for retrieving the current date (#8664) (5e10d17)
  • api: add a to_polars() method for returning query results as polars objects (53454c1)
  • api: add a uuid function for returning a new uuid (#8438) (965b6d9)
  • api: add API for unwrapping JSON values into backend-native values (#8958) (aebb5cf)
  • api: add disconnect method (#8341) (32665af), closes #5940
  • api: allow *arg syntax with GroupedTable methods (#8923) (489bb89)
  • api: count nulls with topk (#8531) (54c2c70)
  • api: expose common types in the top-level ibis namespace (#9008) (3f3ed27), closes #8717
  • api: include bad type in NotImplementedError (#8291) (36da06b)
  • api: natively support polars dataframes in ibis.memtable (464bebc)
  • api: support Table.order_by(*keys) (6ade4e9)
  • api: support all dtypes in MapGet and MapContains (#8648) (401e0a4)
  • api: support converting ibis types & schemas to/from polars types & schemas (73add93)
  • api: support Deferreds in Array.map and .filter (#8267) (8289d2c)
  • api: support the inner join convenience to not repeat fields known to be equal (#8127) (798088d)
  • api: support variadic arguments on Table.group_by() (#8546) (665bc4f)
  • backends: introducing ibish the infinite scale backend you always wanted (#8785) (1d51243)
  • bigquery: support polars memtables (26d103d)
  • common: add Dispatched base class for convenient visitor pattern implementation (f80c5b3)
  • common: add Node.find_below() methods to exclude the root node from filtering (#8861) (80d12a2)
  • common: add a memory efficient Node.map() implementation (e3f2217)
  • common: also traverse nodes used as dictionary keys (#9041) (02c6607)
  • common: introduce FrozenOrderedDict (#9081) (f926995), closes #9063
  • datafusion, flink, mssql: add uuid operation (#8545) (2f85a42)
  • datafusion: add array and strings functions (…)

8.0.0 (2024-02-05)

⚠ BREAKING CHANGES

  • backends: Columns with Ibis date types are now returned as object dtype containing datetime.date objects when executing with the pandas backend.
  • impala: Direct HDFS integration is removed, as is support for ingesting pandas DataFrames directly. The Impala backend still works with HDFS, but data in HDFS must be managed outside of ibis.
  • api: replace ibis.show_sql(expr) calls with print(ibis.to_sql(expr)), or with ibis.to_sql(expr) if using Jupyter or IPython
  • bigquery: nullifzero is removed; use nullif(0) instead
  • bigquery: zeroifnull is removed; use fillna(0) instead
  • bigquery: list_databases is removed; use list_schemas instead
  • bigquery: the bigquery current_database method returns the data_project instead of the dataset_id. Use current_schema to retrieve dataset_id. To explicitly list tables in a given project and dataset, you can use f"{con.current_database}.{con.current_schema}"

Features

  • api: define RegexSplit operation and re_split API (07beaed)
  • api: support median and quantile on more types (#7810) (49c75a8)
  • clickhouse: implement RegexSplit (e3c507e)
  • datafusion: implement ops.RegexSplit using pyarrow UDF (37b6b7f)
  • datafusion: set ops (37abea9)
  • datatypes: add decimal and basic geospatial support to the sqlglot type parser/generator (59783b9)
  • datatypes: make intervals round trip through sqlglot type mapper (d22f97a)
  • duckdb-geospatial: add support for flipping coordinates (d47088b)
  • duckdb-geospatial: enable use of literals (23ad256)
  • duckdb: implement RegexSplit (229a1f4)
  • examples: add zones geojson example (#8040) (2d562b7), closes #7958
  • flink: add new temporal operators (dfef418)
  • flink: add primary key support (da04679)
  • flink: export result to pyarrow (9566263)
  • flink: implement array operators (#7951) (80e13b4)
  • flink: implement struct field, clean up literal, and adjust timecontext test markers (#7997) (2d5e108)
  • impala: rudimentary date support (d4bcf7b)
  • mssql: add hashbytes and test for binary output hash fns (#8107) (91f60cd), closes #8082 #8082
  • mssql: use odbc (f03ad0c)
  • polars: implement ops.RegexSplit using pyarrow UDF (a3bed10)
  • postgres: implement RegexSplit (c955b6a)
  • pyspark: implement RegexSplit (cfe0329)
  • risingwave: init impl for Risingwave (#7954) (351747a), closes #8038
  • snowflake: implement RegexSplit (2c1a726)
  • snowflake: implement insert method (2162e3f)
  • trino: implement RegexSplit (9d1295f)

Bug Fixes

  • api: deferred values are not truthy (00b3ece)
  • backends: ensure that returned date results are actually proper date values (0626fb2)
  • backends: preserve order_by position in window function when subsequent expressions are duplicated (#7943) (89056b9), closes #7940
  • common: do not convert callables to resolveable objects (9963705)
  • datafusion: work around lack of support for uppercase units in intervals (ebb6cde)
  • datatypes: ensure that array construction supports literals and infers their shape from its inputs (#8049) (899dce1), closes #8022
  • datatypes: fix bad references in to_numpy() (6fd4550)
  • deps: remove filelock from required dependencies (76dded5)
  • deps: update dependency black to v24 (425f7b1)
  • deps: update dependency datafusion to v34 (601f889)
  • deps: update dependency datafusion to v35 (#8224) (a34af25)
  • deps: update dependency oracledb to v2 (e7419ca)
  • deps: update dependency pyarrow to v15 (ef6a9bd)
  • deps: update dependency pyodbc to v5 (32044ea)
  • docs: surround executable code blocks with interactive mode on/off (4c660e0)
  • duckdb: allow table creation from expr with geospatial datatypes (#7818) (ecac322)
  • duckdb: ensure that casting to floating point values produces valid types in generated sql (424b206)
  • examples: use anonymous access when reading example data from GCS (8e5c0af)
  • impala: generate memtables using UNION ALL to work around sqlglot bug (399a5ef)
  • mutate/select: ensure that unsplatted dictionaries work in mutate and select APIs (#8014) (8ed19ea), closes #8013
  • mysql: catch PyMySQL OperationalError exception (#7919) (f2c2664), closes #6010 #7918
  • pandas: support non-string categorical columns (5de08c7)
  • polars: avoid using unnecessary subquery for schema inference (0f43667)

7.2.0 (2023-12-18)

Features

  • api: add ArrayValue.flatten method and operation (e6e995c)
  • api: add ibis.range function for generating sequences (f5a0a5a)
  • api: add timestamp range (c567fe0)
  • base: add to_pandas method to BaseBackend (3d1cf66)
  • clickhouse: implement array flatten support (d15c6e6)
  • common: node.replace() now supports mappings for quick lookup-like substitutions (bbc93c7)
  • common: add node.find_topmost() method to locate matching nodes without descending further to their children (15acf7d)
  • common: allow matching on dictionaries in possibly nested patterns (1d314f7)
  • common: expose node.__children__ property to access the flattened list of children of a node (2e91476)
  • duckdb: add initial support for geospatial functions (65f496c)
  • duckdb: add read_geo function (b19a8ce)
  • duckdb: enforce aswkb for projections, coerce to geopandas (33327dc)
  • duckdb: implement array flatten support (0a0eecc)
  • exasol: add exasol backend (295903d)
  • export: allow passing keyword arguments to PyArrow ParquetWriter and CSVWriter (40558fd)
  • flink: implement nested schema support (057fabc)
  • flink: implement windowed computations (256767f)
  • geospatial: add support for GeoTransform on duckdb (ec533c1)
  • geospatial: update read_geo to support url (3baf509)
  • pandas/dask: implement flatten (c2e8d9d)
  • polars: add streaming kwarg to to_pandas (703507f)
  • polars: implement array flatten support (19b2aa0)
  • pyspark: enable multiple values in .substitute (291a290)
  • pyspark: implement array flatten support (5d1fadf)
  • snowflake: implement array flatten support (d3c754f)
  • snowflake: read_csv with https (72752eb)
  • snowflake: support udf arguments for reading from staged files (529a3a2)
  • snowflake: use upstream array_sort (9624341)
  • sqlalchemy: support expressions in window bounds (5dbb3b1)
  • trino: implement array flatten support (0d1faaa)

Bug Fixes

  • api: avoid casting to bool for table.info() nullable column (3b3bd7b)
  • bigquery: escape the schema (project ID) for BQ builtin UDFs (8096552)
  • bigquery: fully qualified memtable names in compile (a81e432)
  • clickhouse: use backwards compatible methods of getting query metadata (975556f)
  • datafusion: bring back UDF registration (43084fa)
  • datafusion: ensure that non-matching re_search calls return bool values when patterns do not match (088b027)
  • datafusion: support computed group by when the aggregation is count distinct (18bdb7e)
  • decompile: handle isin (6857751)
  • deferred: don't pass expression in fstringified error message (724859d)
  • deps: update dependency datafusion to v33 (57047a2)
  • deps: update dependency sqlglot to v20 (13bc6e2)
  • duckdb: ensure that already quoted identifiers are not erased (45ee391)
  • duckdb: ensure that parameter names are unlikely to overlap with column names (d93dbe2)
  • duckdb: gate geoalchemy import in duckdb geospatial (8f012c4)
  • duckdb: render dates, times, timestamps and none literals correctly (5d8866a)
  • duckdb: use functions for temporal literals (b1407f8)
  • duckdb: use the UDF's signature instead of arguments' output type for generating a duckdb signature (233dce1)
  • flink: add more tests (33e1a31)
  • flink: add os to the cache key (1b92b33)
  • flink: add test cases for recreate table (1413de9)
  • flink: customize the list of base identifiers (0b5d343)
  • flink: fix recreating table/view issue on flink backend (0c9791f)
  • flink: implement TypeMapper and SchemaMapper for Flink backend (f983bfa)
  • flink: use lazy import to prevent premature loading of pyflink during gen_matrix (d042402)
  • geospatial: pretty print data in interactive mode (afb04ed)
  • ir: ensure that join projection columns are all always nullable (f5f35c6)
  • ir: handle renaming for scalar operations (6f77f17)
  • ir: handle the case of non-overlapping data and add a test (1c9ae1b)
  • ir: implicitly convert None literals with dt.Null type to the requested type during value coercion (d51ec4e)
  • ir: merge window frames for bound analytic window functions with a subsequent over call (e12ce8d)
  • ir: raise if Concrete.copy() receives unexpected arguments (442199a)
  • memtable: ensure column names match provided data (faf99df)
  • memtables: disallow duplicat...

7.1.0 (2023-11-16)

Features

  • api: add bucket method for timestamps (ca0f7bc)
  • api: add Table.sample method for sampling rows from a table (3ce2617)
  • api: allow selectors in order_by (359fd5e)
  • api: move analytic window functions to top-level (8f2ced1)
  • api: support deferred in reduction filters (349f475)
  • api: support specifying signature in udf definitions (764977e)
  • bigquery: add location parameter (d652dbb)
  • bigquery: add read_csv, read_json, read_parquet support (ff83110)
  • bigquery: support temporary tables using sessions (eab48a9)
  • clickhouse: add support for timestamp bucket (10a5916)
  • clickhouse: support Table.fillna (5633660)
  • common: better inheritance support for Slotted and FrozenSlotted (9165d41)
  • common: make Slotted and FrozenSlotted pickleable (13cbce0)
  • common: support Self annotations for Annotable (0c60146)
  • common: use patterns to filter out nodes during graph traversal (3edd8f7)
  • dask: add read_csv and read_parquet (e9260af)
  • dask: enable pyarrow conversion (2d36722)
  • dask: support Table.sample (09a7626)
  • datafusion: add case and if-else statements (851d560)
  • datafusion: add corr and covar (edc42be)
  • datafusion: add isnull and isnan operations (0076c25)
  • datafusion: add some array functions (0b96b68)
  • datafusion: add StringLength, FindInSet, ArrayStringJoin (fd03831)
  • datafusion: add TimestampFromUNIX and subtract/add operations (2bffa5a)
  • datafusion: add TimestampTruncate / fix broken extract time part functions (940ed21)
  • datafusion: support dropping schemas (cc6870c)
  • duckdb: add attach and detach methods for adding and removing databases to the current duckdb session (162b058)
  • duckdb: add ntile support (bf08a2a)
  • duckdb: add dict-like for DuckDB settings (ea2d317)
  • duckdb: add support for specific timestamp scales (3518b78)
  • duckdb: allow users to register fsspec filesystem with DuckDB (6172f07)
  • duckdb: expose option to force reinstall extension (98080d0)
  • duckdb: implement Table.sample as a TABLESAMPLE query (3a80f3a)
  • duckdb: implement partial json collection casting (aae28e9)
  • flink: add remaining operators for Flink to pass/skip the common tests (b27adc6)
  • flink: add several temporal operators (f758228)
  • flink: implement the ops.TryCast operation (752e587)
  • formats: map ibis JSON type to pyarrow strings (79b6eac)
  • impala/pyspark: implement to_pyarrow (6b33454)
  • impala: implement Table.sample (8e78dfc)
  • implement window table valued functions (a35a756)
  • improve generated column names for methods receiving intervals (c319ed3)
  • mssql: add support for timestamp bucket (1ffac11)
  • mssql: support cross-db/cross-schema table list (3e0f0fa)
  • mysql: support ntile (9a14ba3)
  • oracle: add fixes after running pre-commit (6538b70)
  • oracle: add fixes after running pre-commit (e3d14b3)
  • oracle: add support for loading Oracle RAW and BLOB types (c77eeb2)
  • oracle: change parsing of Oracle NUMBER data type (649ab86)
  • oracle: remove redundant brackets (2905484)
  • pandas: add read_csv and read_parquet (34eeca6)
  • pandas: support Table.sample (77215be)
  • polars: add support for timestamp bucket (c59518c)
  • postgres: add support for timestamp bucket (4d34afc)
  • pyspark: support Table.sample (6aa897e)
  • snowflake: support ntile (39eed1a)
  • snowflake: support cross-db/cross-schema table list (2071897)
  • snowflake: support timestamp bucketing (a95ffa9)
  • sql: implement Table.sample as a random() filter across several SQL backends (e1870ea)
  • trino: implement Table.sample as a TABLESAMPLE query (f3d044c)
  • trino: support ntile (2978d1a)
  • trino: support temporal operations (8b8e885)
  • udf: improve mypy compatibility for udf functions (65b5bb7)
  • use to_pyarrow instead of to_pandas in the interactive repr (72aa573)
  • ux: fix long links, add repr links in vscode (734bd91)
  • ux: implement recursive element conversion for nested types and json (8ddfa94)

7.0.0 (2023-10-02)

⚠ BREAKING CHANGES

  • api: the interpolation argument was only supported in the dask and pandas backends; for interpolated quantiles use dask or pandas directly
  • ir: Dask and Pandas only; cumulative operations that relied on implicit ordering from prior operations such as calls to table.order_by may no longer work, pass order_by=... into the appropriate cumulative method to achieve the same behavior.
  • api: UUID, MACADDR and INET are no longer subclasses of strings. Cast those values to string to enable use of the string APIs.
  • impala: ImpalaTable.rename is removed, use Backend.rename_table instead.
  • pyspark: PySparkTable.rename is removed, use Backend.rename_table instead.
  • clickhouse: ClickhouseTable is removed. This class only provided a single insert method. Use the Clickhouse backend's insert method instead.
  • datatypes: The minimum version of sqlglot is now 17.2.0, to support much faster and more robust backend type parsing.
  • ir: ibis.expr.selectors module is removed, use ibis.selectors instead
  • api: passing a tuple or a sequence of tuples to table.order_by() calls is not allowed anymore; use ibis.asc(key) or ibis.desc(key) instead
  • ir: The ibis.common.validators module has been removed, along with all validation rules in ibis.expr.rules; use type hints or patterns from ibis.common.patterns instead.

Features

  • api: add .delta method for computing difference in units between two temporal values (18617bf)
  • api: add ArrayIntersect operation and corresponding ArrayValue.intersect API (76c95b2)
  • api: add Backend.rename_table (0047143)
  • api: add levenshtein edit distance API (ab211a8)
  • api: add relocate table expression API for moving columns around based on selectors (ee8a86f)
  • api: add Table.rename, with support for renaming via keyword arguments (917d7ec)
  • api: add to_pandas_batches (740778f)
  • api: add support for referencing backend-builtin functions (76f5f4b)
  • api: implement negative slice indexing (caee5c1)
  • api: improve repr for deferred expressions containing Column/Scalar values (6b1218a)
  • api: improve repr of deferred functions (f2b3744)
  • api: support deferred and literal values in ibis.ifelse (685dbc1)
  • api: support deferred arguments in ibis.case() (6f9f7c5)
  • api: support deferred arguments to ibis.array (b1b83f9)
  • api: support deferred arguments to ibis.map (86c8669)
  • api: support deferred arguments to ibis.struct (7ef870d)
  • api: support deferred arguments to udfs (a49d259)
  • api: support deferred expressions in ibis.date (f454a71)
  • api: support deferred expressions in ibis.time (be1fd65)
  • api: support deferred expressions in ibis.timestamp (0e71505)
  • api: support deferred values in ibis.coalesce/ibis.greatest/ibis.least (e423480)
  • bigquery: implement array functions (04f5a11)
  • bigquery: use sqlglot to implement functional unnest to relational unnest (167c3bd)
  • clickhouse: add read_parquet and read_csv (dc2ea25)
  • clickhouse: add support for .sql methods (f1d004b)
  • clickhouse: implement builtin agg functions (eea679a)
  • clickhouse: support caching tables with the .cache() method (621bdac)
  • clickhouse: support reading parquet and csv globs (4ea1834)
  • common: match and replace graph nodes (78865c0)
  • datafusion: add coalesce, nullif, ifnull, zeroifnull (1cc67c9)
  • datafusion: add ExtractWeekOfYear, ExtractMicrosecond, ExtractEpochSeconds (5612d48)
  • datafusion: add join support (e2c143a)
  • datafusion: add temporal functions (6be6c2b)
  • datafusion: implement builtin agg functions (0367069)
  • duckdb: expose loading extensions (2feecf7)
  • examples: name examples tables according to example name (169d889)
  • flink: add batch and streaming mode test fixtures for Flink backend (49485f6)
  • flink: allow translation of decimal literals (52f7032)
  • flink: fine-tune numeric literal translation (2f2d0d9)
  • flink: implement ops.FloorDivide operation (95474e6)
  • flink: implement a minimal PyFlink Backend (46d0e33)
  • flink: implement insert dml (6bdec79)
  • flink: implement table-related ddl in Flink backend to support streaming connectors (8dabefd)
  • flink: implement translation of NULLIFZERO (6ad1e96)
  • flink: implement translation of ZEROIFNULL (31560eb)
  • flink: support translating typed null values (83beb7e)
  • impala: implement Backend.rename_table (309c999)
  • introduce watermarks in ibis api (eaaebb8)
  • just chat to open Zulip in terminal (95e164e)
  • patterns: support building sequences in replacement patterns (f320c2e)
  • patterns: support building sequences in replacement patterns (beab068)
  • patterns: support calling methods on builders like a variable (58b2d0e)
  • polars: implement new UDF API (becbf41)
  • polars: implement support for builtin aggregate udfs (c383f62)
  • polars: support reading ndjson (1bda3bd)

6.2.0 (2023-08-31)

Features

  • trino: add source application to trino backend (cf5fdb9)

Bug Fixes

  • bigquery,impala: escape all ASCII escape sequences in string literals (402f5ca)
  • bigquery: correctly escape ASCII escape sequences in regex patterns (a455203)
  • release: pin conventional-changelog-conventionalcommits to 6.1.0 (d6526b8)
  • trino: ensure that list_databases looks at all catalogs, not just the current one (cfbdbf1)
  • trino: override incorrect base sqlalchemy list_schemas implementation (84d38a1)

Documentation

  • trino: add connection docstring (507a00e)

6.1.0 (2023-08-03)

Features

  • api: add ibis.dtype top-level API (867e5f1)
  • api: add table.nunique() for counting unique table rows (adcd762)
  • api: allow mixing literals and columns in ibis.array (3355dd8)
  • api: improve efficiency of __dataframe__ protocol (15e27da)
  • api: support boolean literals in join API (c56376f)
  • arrays: add concat method equivalent to __add__/__radd__ (0ed0ab1)
  • arrays: add repeat method equivalent to __mul__/__rmul__ (b457c7b)
  • backends: add current_schema API (955a9d0)
  • bigquery: fill out CREATE TABLE DDL options including support for overwrite (5dac7ec)
  • datafusion: add count_distinct, median, approx_median, stddev and var aggregations (45089c4)
  • datafusion: add extract url fields functions (4f5ea98)
  • datafusion: add functions sign, power, nullifzero, log (ef72e40)
  • datafusion: add RegexSearch, StringContains and StringJoin (4edaab5)
  • datafusion: implement in-memory table (d4ec5c2)
  • flink: add tests and translation rules for additional operators (fc2aa5d)
  • flink: implement translation rules and tests for over aggregation in Flink backend (e173cd7)
  • flink: implement translation rules for literal expressions in flink compiler (a8f4880)
  • improved error messages when missing backend dependencies (2fe851b)
  • make output of to_sql a proper str subclass (084bdb9)
  • pandas: add ExtractURLField functions (e369333)
  • polars: implement ops.SelfReference (983e393)
  • pyspark: read/write delta tables (d403187)
  • refactor ddl for create_database and add create_schema where relevant (d7a857c)
  • sqlite: add scalar python udf support to sqlite (92f29e6)
  • sqlite: implement extract url field functions (cb1956f)
  • trino: implement support for .sql table expression method (479bc60)
  • trino: support table properties when creating a table (b9d65ef)

Bug Fixes

  • api: allow scalar window order keys (3d3f4f3)
  • backends: make current_database implementation and API consistent across all backends (eeeeee0)
  • bigquery: respect the fully qualified table name at the init (a25f460)
  • clickhouse: check dispatching instead of membership in the registry for has_operation (acb7f3f)
  • datafusion: always quote column names to prevent datafusion from normalizing case (310db2b)
  • deps: update dependency datafusion to v27 (3a311cd)
  • druid: handle conversion issues from string, binary, and timestamp (b632063)
  • duckdb: avoid double escaping backslashes for bind parameters (8436f57)
  • duckdb: cast read_only to string for connection (27e17d6)
  • duckdb: deduplicate results from list_schemas() (172520e)
  • duckdb: ensure that current_database returns the correct value (2039b1e)
  • duckdb: handle conversion from duckdb_engine unsigned int aliases (e6fd0cc)
  • duckdb: map hugeint to decimal to avoid information loss (4fe91d4)
  • duckdb: run pre-execute-hooks in duckdb before file export (5bdaa1d)
  • duckdb: use regexp_matches to ensure that matching checks containment instead of a full match (0a0cda6)
  • examples: remove example datasets that are incompatible with case-insensitive file systems (4048826)
  • exprs: ensure that left_semi and semi are equivalent (bbc1eb7)
  • forward arguments through __dataframe__ protocol (50f3be9)
  • ir: change "it not a" to "is not a" in errors (d0d463f)
  • memtable: implement support for translation of empty memtable (05b02da)
  • mysql: fix UUID type reflection for sqlalchemy 2.0.18 (12d4039)
  • mysql: pass-through kwargs to connect_args (e3f3e2d)
  • ops: ensure that name attribute is always valid for ops.SelfReference (9068aca)
  • polars: ensure that pivot_longer works with more than one column (822c912)
  • polars: fix collect implementation (c1182be)
  • postgres: by default use domain socket (e44fdfb)
  • pyspark: make has_operation method a @classmethod (c1b7dbc)
  • release: use @google/semantic-release-replace-plugin@1.2.0 to avoid module loading bug (673aab3)
  • snowflake: fix broken unnest functionality (207587c)
  • snowflake: reset the schema and database to the original schema after creating them (54ce26a)
  • snowflake: reset to original schema when resetting the database (32ff832)
  • snowflake: use regexp_instr != 0 instead of REGEXP keyword (06e2be4)
  • sqlalchemy: add support for sqlalchemy string subclassed types (8b33b35)
  • sql: handle parsing aliases (3645cf4)

6.0.0 (2023-07-05)

⚠ BREAKING CHANGES

  • imports: Use of ibis.udf as a module is removed. Use ibis.legacy.udf instead.

  • The minimum supported Python version is now Python 3.9

  • api: group_by().count() no longer automatically names the count aggregation count. Use relabel to rename columns.

  • backends: Backend.ast_schema is removed. Use expr.as_table().schema() instead.

  • snowflake/postgres: Postgres UDFs now use the new @udf.scalar.python API. This should be a low-effort replacement for the existing API.

  • ir: ops.NullLiteral is removed

  • datatypes: dt.Interval no longer has a default unit, and dt.interval is removed

  • deps: snowflake-connector-python's lower bound was increased to 3.0.2, the minimum version needed to avoid a high-severity vulnerability. Please upgrade snowflake-connector-python to at least version 3.0.2.

  • api: Table.difference(), Table.intersection(), and Table.union() now require at least one argument.

  • postgres: Ibis no longer automatically defines first/last reductions on connection to the postgres backend. Use DDL shown in https://wiki.postgresql.org/wiki/First/last_(aggregate) or one of the pgxn implementations instead.

  • api: ibis.examples.<example-name>.fetch no longer forwards arbitrary keyword arguments to read_csv/read_parquet.

  • datatypes: dt.Interval.value_type attribute is removed

  • api: Table.count() is no longer automatically named "count". Use Table.count().name("count") to achieve the previous behavior.

  • trino: The trino backend now requires at least version 0.321 of the trino Python package.

  • backends: removed AlchemyTable, AlchemyDatabase, DaskTable, DaskDatabase, PandasTable, PandasDatabase, PySparkDatabaseTable, use ops.DatabaseTable instead

  • dtypes: temporal unit enums are now available under ibis.common.temporal instead of ibis.common.enums.

  • clickhouse: external_tables can no longer be passed in ibis.clickhouse.connect. Pass external_tables directly in raw_sql/execute/to_pyarrow/to_pyarrow_batches().

  • datatypes: dt.Set is now an alias for dt.Array

  • bigquery: Previously, the Ibis timestamp type mapped to the BigQuery TIMESTAMP type, which has no timezone support. This was incorrect: the BigQuery TIMESTAMP type is always in UTC, while the DATETIME type is the timezone-free variant. The mapping is now: an Ibis timestamp with a UTC timezone maps to the BigQuery TIMESTAMP type, and an Ibis timestamp with no timezone maps to the BigQuery DATETIME type.

  • impala: Cursors are no longer returned from DDL operations to prevent resource leakage. Use raw_sql if you need specialized operations that return a cursor. Additionally, table-based DDL operations now return the table they're operating on.

  • api: Column.first()/Column.last() are now reductions by default. Code running these expressions in isolation will no longer be windowed over the entire table. Code using this function in select-based APIs should function unchanged.

  • bigquery: when using the bigquery backend, casting float to int
    will no longer round floats to the nearest integer

  • ops.Hash: The hash method on table columns no longer accepts
    the how argument. The hashing functions available are highly
    backend-dependent, and the intention of the hash operation is to provide
    a fast, consistent (on the same backend only) integer value.
    If you have been passing in a value for how, you can remove it and you
    will get the same results as before, as no backend had
    multiple hash functions working.

  • duckdb: Some CSV files may now be read with a header row where previously the first row was treated as data. Set header=False to get the previous behavior.

  • deps: New environments will have a different default setting for compression in the ClickHouse backend due to removal of optional dependencies. Ibis is still capable of using the optional dependencies but doesn't include them by default. Install clickhouse-cityhash and lz4 to preserve the previous behavior.

  • api: Table.set_column() is removed; use Table.mutate(name=expr) instead

  • api: the suffixes argument in all join methods has been removed in favor of lname/rname args. The default renaming scheme for duplicate columns has also changed. To get the exact same behavior as before, pass in lname="{name}_x", rname="{name}_y".

  • ir: IntervalType.unit is now an enum instead of a string

  • type-system: Inferred types of Python objects may be slightly different. Ibis now uses pyarrow to infer the column types of pandas DataFrames and other objects.

  • backends: path argument of Backend.connect() is removed, use the database argument instead

  • api: removed Table.sort_by() and Table.groupby(), use .order_by() and .group_by() respectively

  • datatypes: DataType.scalar and column class attributes are now strings.

  • backends: Backend.load_data(), Backend.exists_database() and Backend.exists_table() are removed

  • ir: Value.summary() and NumericValue.summary() are removed

  • schema: Schema.merge() is removed, use the union operator schema1 | schema2 instead

  • api: ibis.sequence() is removed

  • drop support for Python 3.8 (747f4ca)

Features

  • add dask windowing (9cb920a)
  • add easy type hints to GroupBy (da330b1)
  • add microsecond method to TimestampValue and TimeValue (e9df2da)
  • api: add __dataframe__ implementation (b3d9619)
  • api: add ALL_CAPS option to Table.relabel (c0b30e2)
  • api: add first/last reduction APIs (8c01980)
  • api: add zip operation and api (fecf695)
  • api: allow passing multiple keyword arguments to ibis.interval (22ee854)
  • api: better repr and pickle support for deferred expressions (2b1ec9c)
  • api: exact median (c53031c)
  • api: raise better error on column name collision in joins (e04c38c)
  • api: replace suffixes in join with lname/rname (3caf3a1)
  • api: support abstract type names in selectors.of_type (f6d2d56)
  • api: support list of strings and single strings in the across selector (a6b60e7)
  • api: use create_table to load example data (42e09a4)
  • bigquery: add client and storage_client params to connect (4cf1354)
  • bigquery: enable group_concat over windows (d6a1117)
  • cast: add table-level try_cast (5e4d16b)
  • clickhouse: add array zip impl (efba835)
  • clickhouse: move to clickhouse supported Python client (012557a)
  • clickhouse: set default engine to native file (29815fa)
  • clickhouse: support pyarrow decimal types (7472dd5)
  • common: add a pure python egraph implementation (aed2ed0)
  • common: add pattern matchers (b515d5c)
  • common: add support for start parameter in StringFind (31ce741)
  • common: add Topmost and Innermost pattern matchers (90b48fc)
  • common: implement copy protocol for Immutable base class (e61c66b)
  • create_table: support pyarrow Table in table creation (9dbb25c)
  • datafusion: add string functions (66c0afb)
  • datafusion: add support for scalar pyarrow UDFs ([45935b7](45935b78922f09ab...
Read more

5.1.0

11 Apr 17:44

5.1.0 (2023-04-11)

Features

  • api: expand distinct API for dropping duplicates based on column subsets (3720ea5)
  • api: implement pyarrow memtables (9d4fbbd)
  • api: support passing a format string to Table.relabel (0583959)
  • api: thread kwargs around properly to support more complex connection arguments (7e0e15b)
  • backends: add more array functions (5208801)
  • bigquery: make to_pyarrow_batches() smarter (42f5987)
  • bigquery: support bignumeric type (d7c0f49)
  • default repr to showing all columns in Jupyter notebooks (91a0811)
  • druid: add re_search support (946202b)
  • duckdb: add map operations (a4c4e77)
  • duckdb: support sqlalchemy 2 (679bb52)
  • mssql: implement ops.StandardDev, ops.Variance (e322f1d)
  • pandas: support memtable in pandas backend (6e4d621), closes #5467
  • polars: implement count distinct (aea4ccd)
  • postgres: implement ops.Arbitrary (ee8dbab)
  • pyspark: pivot_longer (f600c90)
  • pyspark: add ArrayFilter operation (2b1301e)
  • pyspark: add ArrayMap operation (e2c159c)
  • pyspark: add DateDiff operation (bfd6109)
  • pyspark: add partial support for interval types (067120d)
  • pyspark: add read_csv, read_parquet, and register (7bd22af)
  • pyspark: implement count distinct (db29e10)
  • pyspark: support basic caching (ab0df7a)
  • snowflake: add optional 'connect_args' param (8bf2043)
  • snowflake: native pyarrow support (ce3d6a4)
  • sqlalchemy: support unknown types (fde79fa)
  • sqlite: implement ops.Arbitrary (9bcdf77)
  • sql: use temp views where possible (5b9d8c0)
  • table: implement pivot_wider API (60e7731)
  • ux: move ibis.expr.selectors to ibis.selectors and deprecate for removal in 6.0 (0ae639d)

Bug Fixes

  • api: disambiguate attribute errors from a missing resolve method (e12c4df)
  • api: support filter on literal followed by aggregate (68d65c8)
  • clickhouse: do not render aliases when compiling aggregate expression components (46caf3b)
  • clickhouse: ensure that clickhouse depends on sqlalchemy for make_url usage (ea10a27)
  • clickhouse: ensure that truncate works (1639914)
  • clickhouse: fix create_table implementation (5a54489)
  • clickhouse: workaround sqlglot issue with calling match (762f4d6)
  • deps: support pandas 2.0 (4f1d9fe)
  • duckdb: branch to avoid unnecessary dataframe construction (9d5d943)
  • duckdb: disable the progress bar by default (1a1892c)
  • duckdb: drop use of experimental parallel csv reader (47d8b92)
  • duckdb: generate SIMILAR TO instead of tilde to workaround sqlglot issue (434da27)
  • improve typing signature of .dropna() (e11de3f)
  • mssql: improve aggregation on expressions (58aa78d)
  • mssql: remove invalid aggregations (1ce3ef9)
  • polars: backwards compatibility for the time_zone and time_unit properties (3a2c4df)
  • postgres: allow inference of unknown types (343fb37)
  • pyspark: fail when aggregation contains a having filter (bd81a9f)
  • pyspark: raise proper error when trying to generate sql (51afc13)
  • snowflake: fix new array operations; remove ArrayRemove operation (772668b)
  • snowflake: make sure ephemeral tables follow backend quoting rules (9a845df)
  • snowflake: make sure pyarrow is used when possible (01f5154)
  • sql: ensure that set operations resolve to a single relation (3a02965)
  • sql: generate consistent pivot_longer semantics in the presence of multiple unnests (6bc301a)
  • sqlglot: work with newer versions (6f7302d)
  • trino,duckdb,postgres: make cumulative notany/notall aggregations work (c2e985f)
  • trino: only support how='first' with arbitrary reduction (315b5e7)
  • ux: use guaranteed length-1 characters for NULL values (8618789)

Refactors

  • api: remove explicit use of .projection in favor of the shorter .select (73df8df)
  • cache: factor out ref counted cache (c816f00)
  • duckdb: simplify to_pyarrow_batches implementation (d6235ee)
  • duckdb: source loaded and installed extensions from duckdb (fb06262)
  • duckdb: use native duckdb parquet reader unless auth required (e9f57eb)
  • generate uuid-based names for temp tables ([a1164df](a1164df5d1bc4fa454371626a05...
Read more

5.0.0

15 Mar 22:36

5.0.0 (2023-03-15)

⚠ BREAKING CHANGES

  • api: Snowflake identifiers are now kept as is from the database. Many table names and column names may now be in SHOUTING CASE. Adjust code accordingly.
  • backend: Backends now raise ibis.common.exceptions.UnsupportedOperationError in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
  • ux: Table.info now returns an expression
  • ux: Passing a sequence of column names to Table.drop is removed. Replace drop(cols) with drop(*cols).
  • The spark plugin alias is removed. Use pyspark instead
  • ir: removed ibis.expr.scope and ibis.expr.timecontext modules, access them under ibis.backends.base.df.<module>
  • some methods have been removed from the top-level ibis.<backend> namespaces, access them on a connected backend instance instead.
  • common: removed ibis.common.geospatial, import the functions from ibis.backends.base.sql.registry.geospatial
  • datatypes: JSON is no longer a subtype of String
  • datatype: Category, CategoryValue/Column/Scalar are removed. Use string types instead.
  • ux: The metric_name argument to value_counts is removed. Use Table.relabel to change the metric column's name.
  • deps: the minimum version of parsy is now 2.0
  • ir/backends: removed the following symbols:
  • ibis.backends.duckdb.parse_type() function
  • ibis.backends.impala.Backend.set_database() method
  • ibis.backends.pyspark.Backend.set_database() method
  • ibis.backends.impala.ImpalaConnection.ping() method
  • ibis.expr.operations.DatabaseTable.change_name() method
  • ibis.expr.operations.ParseURL class
  • ibis.expr.operations.Value.to_projection() method
  • ibis.expr.types.Table.get_column() method
  • ibis.expr.types.Table.get_columns() method
  • ibis.expr.types.StringValue.parse_url() method
  • schema: Schema.from_dict(), .delete() and .append() methods are removed
  • datatype: struct_type.pairs is removed, use struct_type.fields instead
  • datatype: Struct(names, types) is not supported anymore, pass a dictionary to Struct constructor instead

Features

  • add max_columns option for table repr (a3aa236)
  • add examples API (b62356e)
  • api: add map/array accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
  • api: add array to string join operation (74de349)
  • api: add builtin support for relabeling columns to snake case (1157273)
  • api: add support for passing a mapping to ibis.map (d365fd4)
  • api: allow single argument set operations (bb0a6f0)
  • api: implement to_pandas() API for ecosystem compatibility (cad316c)
  • api: implement isin (ac31db2)
  • api: make cache evaluate only once per session per expression (5a8ffe9)
  • api: make create_table uniform (833c698)
  • api: more selectors (5844304)
  • api: upcast pandas DataFrames to memtables in rlz.table rule (8dcfb8d)
  • backends: implement ops.Time for sqlalchemy backends (713cd33)
  • bigquery: add BIGNUMERIC type support (5c98ea4)
  • bigquery: add UUID literal support (ac47c62)
  • bigquery: enable subqueries in select statements (ef4dc86)
  • bigquery: implement create and drop table method (5f3c22c)
  • bigquery: implement create_view and drop_view method (a586473)
  • bigquery: support creating tables from in-memory tables (c3a25f1)
  • bigquery: support in-memory tables (37e3279)
  • change Rich repr of dtypes from blue to dim (008311f)
  • clickhouse: implement ArrayFilter translation (f2144b6)
  • clickhouse: implement ops.ArrayMap (45000e7)
  • clickhouse: implement ops.MapLength (fc82eaa)
  • clickhouse: implement ops.Capitalize (914c64c)
  • clickhouse: implement ops.ExtractMillisecond (ee74e3a)
  • clickhouse: implement ops.RandomScalar (104aeed)
  • clickhouse: implement ops.StringAscii (a507d17)
  • clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
  • clickhouse: improve error message for invalid types in literal (e4d7799)
  • clickhouse: support asof_join (7ed5143)
  • common: add abstract mapping collection with support for set operations (7d4aa0f)
  • common: add support for variadic positional and variadic keyword annotations (baea1fa)
  • common: hold typehint in the annotation objects (b3601c6)
  • common: support Callable arguments and return types in Validator.from_annotable() (ae57c36)
  • common: support positional only and keyword only arguments in annotations (340dca1)
  • dask/pandas: raise OperationNotDefinedError exc for not defined operations (2833685)
  • datafusion: implement ops.Degrees, ops.Radians (7e61391)
  • datafusion: implement ops.Exp (7cb3ade)
  • datafusion: implement ops.Pi, ops.E (5a74cb4)
  • datafusion: implement ops.RandomScalar (5d1cd0f)
  • datafusion: implement ops.StartsWith (8099014)
  • datafusion: implement ops.StringAscii (b1d7672)
  • datafusion: implement ops.StrRight (016a082)
  • datafusion: implement ops.Translate (2fe3fc4)
  • datafusion: support substr without end (a19fd87)
  • datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
  • datatype: enable inference of Decimal type (8761732)
  • datatype: implement Mapping abstract base class for StructType (5df2022)
  • deps: add Python 3.11 support and tests ([6f3f759](https://github.com/ibis-project/ibis/commit...
Read more