
Releases: machow/siuba

Experimental Symbolic autocompletion

07 Aug 22:17
ab4d114

Thanks to @tmastny for the PR (#248)!

import siuba.experimental.completer


Fix lhs ops, support kwargs in sql count

20 May 21:26
338c1e0
  • Fix lhs ops (#235)
  • Support kwargs in SQL count (#234)

Small fix for summarize, w/ Series results

12 May 22:41

See issue #138. This release ensures summarize...

  • validates results are scalar or length 1.
  • uses a Series result's underlying array, to avoid issues around Series indexes in DataFrame construction.
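
The index issue is easy to reproduce in plain pandas. The sketch below is not siuba's implementation; `as_summary_value` is a hypothetical helper that only illustrates the two checks described above (scalar or length 1, and using the underlying array rather than the indexed Series).

```python
import numpy as np
import pandas as pd


def as_summary_value(res):
    """Hypothetical helper: accept a scalar or a length-1 result."""
    if np.isscalar(res):
        return res
    if isinstance(res, pd.Series):
        # Drop the index by working with the underlying array.
        res = res.array
    if len(res) != 1:
        raise ValueError("summarize results must be scalar or length 1")
    return res[0]


# Two length-1 Series whose indexes differ.
a = pd.Series([1.0], index=[0])
b = pd.Series([2.0], index=[5])

# Building a frame from the Series aligns on the index: 2 rows, padded with NaN.
misaligned = pd.DataFrame({"a": a, "b": b})

# Building it from the unwrapped values gives the intended single row.
aligned = pd.DataFrame({"a": [as_summary_value(a)], "b": [as_summary_value(b)]})
```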

Small update for docs: Call.map_replace and cars data

06 May 17:31

This is a small release, designed to support the new siuba documentation.

Features

  • added Call.map_replace method, which is like map_subcall but replaces subcalls with the result
  • added to siuba.data: cars, cars_sql

top_n, floor_date, custom sql joins, and full method spec

25 Apr 05:46

Fixes

  • filter now preserves column order, rather than moving grouping columns to the left (#205)
  • symbolic representations now correctly align on keywords (#222)

Features

  • sql supports custom join conditions via sql_on (#202)
  • siuba.series.spec now includes all Series methods, even unsupported ones (#209)
  • the spec also now is derived from the file siuba/series/spec.yml (#211)
  • siu Symbolic is no longer falsey (#210)
  • added new verb top_n (#222)
  • added vector functions ceil_date and floor_date to siuba.experimental.datetime (#222)
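
For readers unfamiliar with the idea behind top_n, here is a rough pandas equivalent. It is a sketch only: `top_n_sketch` is a hypothetical name, and the tie handling (all tied rows are kept) is an assumption, not a statement about siuba's verb.

```python
import pandas as pd


def top_n_sketch(df, n, wt):
    """Keep rows whose `wt` value ranks in the top `n` (tied rows are all kept)."""
    ranks = df[wt].rank(method="min", ascending=False)
    return df[ranks <= n]


# Made-up data for illustration.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "hp": [110, 245, 93, 245]})

top2 = top_n_sketch(df, 2, "hp")
```

Because the result is a boolean filter on the original frame, rows keep their original order and index.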

QA

  • re-enabled testing of example jupyter notebooks (#206)

Add fct_lump prop argument, fix fast grouped summarize

17 Feb 23:53
cf72cf2

Fixes

  • added more fast grouped method tests, and fixed fast summarize (#197)

Features

  • support prop argument in fct_lump (#195)
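
As a rough illustration of what lumping by proportion means, the pandas sketch below replaces every level whose share of the rows is at or below a threshold with a single "Other" level. `lump_by_prop` is a hypothetical helper, and the exact threshold rule (strictly greater than `prop` is kept) is an assumption rather than a description of fct_lump itself.

```python
import pandas as pd


def lump_by_prop(values, prop, other="Other"):
    """Replace levels whose relative frequency is <= prop with `other`."""
    freq = values.value_counts(normalize=True)
    keep = freq[freq > prop].index
    return values.where(values.isin(keep), other)


# Shares: a = 0.6, b = 0.3, c = 0.1
values = pd.Series(["a"] * 6 + ["b"] * 3 + ["c"])

lumped = lump_by_prop(values, prop=0.2)
```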

fix if_else, remove psycopg2 dependency

11 Feb 02:28

Fixes

  • if_else doesn't try to coerce to new type at end (#179)
  • removed psycopg2 dependency, which caused install to fail if the user did not have postgres (#189)

Fix nest function to support pandas v1.0.0

08 Feb 04:28
d5c9e4b

Fixes nest raising the error "TypeError: copy() takes no keyword arguments". Nest now uses a more principled approach to splitting a grouped DataFrame, and creating a list of sub frames! (see #182)
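
The general idea of splitting a grouped DataFrame into sub frames can be sketched with plain pandas. This is only an illustration of the approach, not the code nest uses internally.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3]})

# Iterating over a groupby yields (key, sub frame) pairs, one per group.
sub_frames = [(key, sub.drop(columns="g")) for key, sub in df.groupby("g")]
```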

Also fixed doc build, by not trying to run notebooks starting with draft-. (#186)

Support for user defined functions (UDFs)

08 Feb 04:22

New Feature: support user defined functions (#146)

  • Support for user defined functions (UDFs). Note that these require annotating the return type. For more on the theory behind these see ADR-003.
from siuba.siu import symbolic_dispatch
from pandas.core.groupby import SeriesGroupBy, GroupBy
from pandas import Series

@symbolic_dispatch(cls = Series)
def cummean(x):
    """Return a same-length array, containing the cumulative mean."""
    return x.expanding().mean()


@cummean.register(SeriesGroupBy)
def _cummean_grouped(x) -> SeriesGroupBy:
    grouper = x.grouper
    n_entries = x.obj.notna().groupby(grouper).cumsum()

    res = x.cumsum() / n_entries

    return res.groupby(grouper)

from siuba import _, mutate
from siuba.data import mtcars

# a pandas DataFrameGroupBy object
g_cyl = mtcars.groupby("cyl")

mutate(g_cyl, cumul_mean = cummean(_.hp))
  • Support for many methods in vector.py, using UDFs (#158)
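
The registration pattern in the example above resembles type-based dispatch from the standard library. The sketch below uses functools.singledispatch as a stand-in to show the idea with pandas only; it is an analogy, not siuba's symbolic_dispatch, and the grouped version uses transform instead of the cumsum approach shown in the release example.

```python
from functools import singledispatch

import pandas as pd
from pandas.core.groupby import SeriesGroupBy


@singledispatch
def cummean(x):
    raise TypeError(f"cummean does not support {type(x).__name__}")


@cummean.register(pd.Series)
def _cummean_series(x):
    # Cumulative mean over the whole Series.
    return x.expanding().mean()


@cummean.register(SeriesGroupBy)
def _cummean_grouped(x):
    # Cumulative mean restarted within each group, aligned to the original rows.
    return x.transform(lambda s: s.expanding().mean())


# Made-up data for illustration.
df = pd.DataFrame({"g": ["a", "a", "b", "b"], "x": [1.0, 3.0, 10.0, 20.0]})

ungrouped = cummean(df["x"])
grouped = cummean(df.groupby("g")["x"])
```

The same function name handles both inputs, and the implementation is chosen from the type of the first argument.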

Bug Fixes

  • Fix regression where .str wasn't being removed when processing siu expressions for SQL (#159)
  • Grouped filter now preserves order
  • Verbs now tested to preserve original index (d938ab3)

Tests

  • Add many more versions of python and pandas to travis CI test matrix (#161)

Opt-in speedy support for grouped pandas

29 Oct 00:30
ca35930

Features

  • Implementation of fast mutate, filter, and summarize using CallTreeLocal (#134). For even just a couple thousand groups, the fast methods are close to optimal hand-written pandas, and the slow versions are almost 1000x slower :o.
  • fixed current grouped pandas mutate to preserve row order (#139)
  • laid down tests of all supported series methods, currently skipping SQL backends (but ready to go!)
  • put up some very basic documentation (#145)
  • wrote an ADR on the rationale for fast groupby (#135)

Note that CallTreeLocal has new options, allowing it to look up based on chained attributes (e.g. look for an entry named "dt.year"), and to override custom function calls.
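
To make the chained-attribute lookup concrete, here is a toy illustration in plain Python. Everything in it is hypothetical (the `LOCAL_FUNCS` table, the `lookup` function, and the longest-match rule); it is not how CallTreeLocal is implemented, only a picture of what looking up an entry named "dt.year" could involve.

```python
# Hypothetical table of translations, keyed by (possibly dotted) attribute names.
LOCAL_FUNCS = {
    "dt.year": "translation for dt.year",
    "mean": "translation for mean",
}


def lookup(attr_chain):
    """Return (name, entry) for the longest leading run of attributes with an entry."""
    for end in range(len(attr_chain), 0, -1):
        name = ".".join(attr_chain[:end])
        if name in LOCAL_FUNCS:
            return name, LOCAL_FUNCS[name]
    raise KeyError(".".join(attr_chain))


# An expression like _.some_col.dt.year has the attribute chain ["dt", "year"] after the column.
matched_name, matched_entry = lookup(["dt", "year"])
```

Matching the dotted name as a whole means the intermediate attribute (dt here) is consumed by the lookup rather than handled separately.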

I still need to finish support for user defined operations and some light siu refactoring.

Breaking changes

  • Removed the rm_attr argument from CallTreeLocal, since converting subattrs like dt.year will consume dt anyway (can't imagine a situation where we'd want to keep it, and couldn't do that in the translator function)

Demo

from siuba.experimental.pd_groups import fast_mutate, fast_filter, fast_summarize
from siuba import *
from siuba.data import mtcars

g_cars = mtcars.groupby(['cyl', 'gear'])

# name the new column explicitly
fast_mutate(g_cars, hp_demeaned = _.hp - _.hp.mean())
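
For comparison, the hand-written pandas version of this kind of grouped mutate might look like the sketch below. The data is a small made-up stand-in for mtcars, and the column name `hp_demeaned` is chosen only for illustration.

```python
import pandas as pd

# Small stand-in for mtcars (values are made up).
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6],
    "gear": [4, 4, 3, 3],
    "hp": [90.0, 110.0, 100.0, 120.0],
})

g_cars = cars.groupby(["cyl", "gear"])

# transform("mean") broadcasts each group's mean back to its rows,
# so the subtraction is a single vectorized operation.
hp_demeaned = cars["hp"] - g_cars["hp"].transform("mean")
result = cars.assign(hp_demeaned=hp_demeaned)
```

This avoids calling a Python function once per group, which is where the slow path spends most of its time when there are many groups.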