Releases: featureform/featureform
v0.12.1
v0.12.0
What's Changed
- Feature: Clickhouse offline store support (#1224) by @ahmadnazeri in #1232
- Feature (Timestamp Variants) Turn on by default by @aolfat in #1176
- Upgrade pandas to >=1.3.5 by @epps in #1175
- Optional inference store for features by @aolfat in #1178
- Adding support for users to deploy Featureform on docker via cli. by @ahmadnazeri in #962
- Truncate long form errors by @anthonylasso in #1205
- Register features in batch by @RiddhiBagadiaa in #1195
- Searchable Tags by @ihkap11 in #1167
- Inputs for SQL Transformations by @aolfat in #1233
- Support for Resource Location by @ahmadnazeri in #1262
- Remove local mode from main package by @ahmadnazeri in #1294
Bugfixes
- Increase GRPC Stream Timeout by @sdreyer in #1190
- Client-side gRC Configuration for Long-running jobs by @epps in #1192
- Write Spark Submit Params to File Store to Avoid Databricks API 10K-byte Limit by @epps in #1197
- Helm Install-Upgrade ETCD fix. by @anthonylasso in #1220
- MD5 Hash of `offline_store_spark_runner.py` by @epps in #1213
- Bug: Healthy Providers Aren't Rechecked When Reapplied by @epps in #1231
- Fix banner reload issue by @anthonylasso in #1300
- Redshift Configuration Correction by @epps in #1307
- Source Modal Null Rows Fix by @anthonylasso in #1306
- `get_dynamodb` method by @epps in #1309
- Add missing HDFS switch case by @anthonylasso in #1311
- Adds variant to materialization IDs for SQL providers by @epps in #1313
- Check other definition for cast before calling .Query on it by @aolfat in #1321
- Allow ondemand features to be passed in as objects by @aolfat in #1312
Full Changelog: v0.11.0...v0.12.0
v0.11.0
What's Changed
New Features
- Provider Health Checks by @epps in #1085
- Ability To Preview Feature Data by @anthonylasso in #1129
- Batch Serving (Snowflake) by @RiddhiBagadiaa in #1158
- Batch Serving (Spark) by @RiddhiBagadiaa in #1174
- Ondemand Feature Code Previews In Dashboard by @anthonylasso in #1169
- Materialization via S3 Import to DynamoDB by @epps in #1161
- Ability to Copy Name Variant to Clipboard by @anthonylasso in #1163
- Ability to Get Spark Dataframes For Training Sets by @ahmadnazeri in #1121
- Ability to See Lineage of Transformations in the Dashboard by @anthonylasso in #1096
Quality of Life
- Improved caching for docker rebuilds by @sdreyer in #1043
- Dynamically set dashboard table sizes by @anthonylasso in #1030
- Databricks/client creds input validation by @ihkap11 in #1149
Bugfixes
- Ensure each transformation function argument has an input by @aolfat in #1018
- Fix state error for resource redefined error by @aolfat in #1032
- Issue with source not retrieving any source by @ahmadnazeri in #1034
- Refresh and Loading fix by @anthonylasso in #1039
- Add resource type in title. by @anthonylasso in #1024
- Fixed Conditional that added duplicate feature by @sdreyer in #1062
- Docs Parsing error by @joshcolts18 in #1090
- Materialization Copy Performance Improvement by @epps in #1079
- Slow search results loading state by @anthonylasso in #1088
- Flask compatibility with python 3.7 by @sdreyer in #1111
- Materialize Copy Race Condition by @epps in #1113
- Increasing Timeout For Long Running Jobs by @ahmadnazeri in #1118
New Contributors
- @jerempy made their first contribution in #1074
- @joshcolts18 made their first contribution in #1090
- @syedzubeen made their first contribution in #1119
- @ihkap11 made their first contribution in #1149
Full Changelog: v0.10.3...v0.11.0
v0.10.1
What's Changed
- Hotfix/pinecone casting by @sdreyer in #917
- Bump grpcio from 1.51.1 to 1.53.0 by @dependabot in #908
- Unit tests to cover the SQL format function by @anthonylasso in #918
- Fix table row sizing by @anthonylasso in #937
- Update CLI version output by @anthonylasso in #934
- `pytest` Coverage by @epps in #940
- Bugfix: KCF doesn't read csv files feature registered on source by @aolfat in #943
- Update README.md by @sdreyer in #946
- Run coverage on main by @sdreyer in #950
- Setup code to enable unit-level metadata_server + provider tests. by @anthonylasso in #944
- Provider Config Testing #64 by @anthonylasso in #948
- KCF dockerfile image to 3.10 + use consistent dill versioning by @aolfat in #955
- "Get" Provider Testing by @anthonylasso in #954
- Dashboard metadata variant bugfix by @anthonylasso in #933
- Change Resource names to ResourceVariant by @aolfat in #941
- Register instantiate tests by @anthonylasso in #961
- Bugfix: Dataframe transformation by @anthonylasso in #926
- Spark Tests by @sdreyer in #958
- Tests/spark by @sdreyer in #965
- FEATURE: PostgreSQL: adding support for ssl mode by @ahmadnazeri in #964
- Fix ETCD issue #945 by @anthonylasso in #967
- Cloud Storage Pathing (Azure Blob Storage) by @epps in #947
Full Changelog: v0.10.0...v0.10.1
v0.10.0
What's Changed
V0.10 release brings:
- A brand-new Dashboard UI and enhanced functionality
- Vector database support in local and hosted mode for Weaviate and Pinecone
- API improvements for data science development
- Updated documentation and bugfixes
Dashboard Makeover and Upgrades
We're excited to bring you a more visually appealing Dashboard with new functionality for both users and administrators, including metadata management for resource tags, previewing transformation results, and clear visibility of transformation logic.
Assign tags to resources directly through the dashboard UI
Edit Resource Metadata from the Dashboard
Preview datasets directly from the dashboard
Better formatting for Python and SQL transformations
Vector Database Support
You can now register Weaviate and Pinecone as providers!
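As a minimal sketch, registering Pinecone as a provider might look like the following. The parameter names (`project_id`, `environment`, `api_key`) and placeholder values are assumptions modeled on Featureform's other provider registration calls; consult the provider docs for the exact signature.

```python
import featureform as ff

# Hypothetical placeholder values; substitute your own project details.
pinecone = ff.register_pinecone(
    name="pinecone",
    project_id="<pinecone_project_id>",
    environment="<pinecone_environment>",
    api_key="<pinecone_api_key>",
)
```

Weaviate registration follows the same pattern with its own connection parameters.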
API Improvements for Data Science Development
Read all files from a directory into a dataframe with ff.register_directory()
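A sketch of directory registration in local mode, assuming the directory contains files of a consistent schema; the `name`, `path`, and `description` parameter names are assumptions based on Featureform's other registration calls:

```python
import featureform as ff

local = ff.register_local()

# Hypothetical source: every file under data/articles/ is read into
# a single dataframe source named "articles".
docs = local.register_directory(
    name="articles",
    path="data/articles/",
    description="All article files as one dataframe",
)
```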
Inference Stores are now optional in Local Mode: if an inference store is not specified, features default to local storage.
Bug Fixes
- Bugfix: Added error check for inputs list by @sdreyer in #878
- Bugfix: Fixed how state is cleared by @ahmadnazeri in #876
- Bugfix: Fixed lingering default variants by @ahmadnazeri in #879
- Bugfix: Allowed nearest() to accept a Feature object by @ahmadnazeri in #885
- Bugfix: Banner Color by @RedLeader16 in #873
- Bugfix: Give meaningful error when resource not found by @ahmadnazeri in #883
- Bugfix: Fixed scheduling for KCF by @aolfat in #853
- Bugfix: Fixed missing weaviate provider config implementation by @epps in #899
- Bugfix: Removes Outdated logging package by @sdreyer in #913
Full Changelog: v0.9.0...v0.10.0
v0.9.0
What's New
Vector Database and Embedding Support
You can use Featureform to define and orchestrate data pipelines that generate embeddings. Featureform can write them into Redis for nearest-neighbor lookup. This also allows users to version, re-use, and manage embeddings declaratively.
Registering Redis for use as a Vector Store (registration is the same as for a standard Redis inference store)
ff.register_redis(
    name="redis",
    description="Example inference store",
    team="Featureform",
    host="0.0.0.0",
    port=6379,
)
A Pipeline to Generate Embeddings from Text
docs = spark.register_file(...)
@spark.df_transform(
    inputs=[docs],
)
def embed_docs(docs):
    docs["embedding"] = docs["text"].map(lambda txt: openai.Embedding.create(
        model="text-embedding-ada-002",
        input=txt,
    )["data"])
    return docs
Defining and Versioning an Embedding
@ff.entity
class Article:
    embedding = ff.Embedding(embed_docs[["id", "embedding"]], dims=1024, vector_db=redis)
@ff.entity
class Article:
    embedding = ff.Embedding(
        embed_docs[["id", "embedding"]],
        dims=1024,
        variant="test-variant",
        vector_db=redis,
    )
Performing a Nearest Neighbor Lookup
client.Nearest(Article.embedding, "id_123", 25)
Interact with Training Sets as Dataframes
You can already interact with sources as dataframes; this release adds the same functionality for training sets.
Interacting with a training set as Pandas
import featureform as ff
client = ff.Client(...)
df = client.training_set("fraud", "simple").dataframe()
print(df.head())
Enhanced Scheduling across Offline Stores
Featureform supports cron syntax for scheduling transformations to run. This release overhauls that functionality to make it more stable and efficient, and adds more verbose error messages.
A transformation that runs every hour on Snowflake
@snowflake.sql_transform(schedule="0 * * * *")
def avg_transaction_price():
    return "SELECT user, AVG(price) FROM {{transaction}} GROUP BY user"
Run Pandas Transformations on K8s with S3
Featureform schedules and runs your transformations for you. When you run Pandas directly, Featureform spins up a Kubernetes job to execute it. This isn't a replacement for distributed processing frameworks like Spark (which we also support), but it's a great option for teams already using Pandas in production.
Defining our Pandas on Kubernetes Provider
aws_creds = ff.AWSCredentials(
    aws_access_key_id="<aws_access_key_id>",
    aws_secret_access_key="<aws_secret_access_key>",
)

s3 = ff.register_s3(
    name="s3",
    credentials=aws_creds,
    bucket_path="<s3_bucket_path>",
    bucket_region="<s3_bucket_region>",
)

pandas_k8s = ff.register_k8s(
    name="k8s",
    description="Native featureform kubernetes compute",
    store=s3,
    team="featureform-team",
)
Registering a file in S3 and a Transformation on it
src = pandas_k8s.register_file(...)

@pandas_k8s.df_transform(inputs=[src])
def transform(src):
    return src.groupby("CustomerID")["TransactionAmount"].mean()
v0.8.1
What's Changed
New Functionality
- KCF/S3 Support by @ahmadnazeri in #786
Enhancements
- Updated Readme example to fix serving and use class api by @ahmadnazeri in #792
- Dashboard Routing and Build Optimizations by @RedLeader16 in #781
- Set Jobs Limit for scheduling by @aolfat in #794
- Reformat and cleanup status displayer by @aolfat in #782
- Bump pymdown-extensions from 9.9.2 to 10.0 by @dependabot in #804
Bug Fixes
- Bad pathing exception #769 by @RedLeader16 in #773
- Throw error if input tuple is not of type (str, str) by @ahmadnazeri in #780
- Fix issue with paths for the Spark files by @ahmadnazeri in #776
- Fix missing executor type in differing fields check for SparkProvider by @zhilingc in #789
- Add default username and password for etcd coordinator by @aolfat in #798
New Contributors
Full Changelog: v0.8.0...v0.8.1
v0.8.0
What's Changed
- Spark Enhancement: Yarn Support
- Pull source and transformation data to client
client = Client()  # presumes $FEATUREFORM_HOST is set
client.apply(insecure=False)  # `insecure=True` for Docker (Quickstart only)

# Primary source as a dataframe
transactions_df = client.dataframe(
    transactions, limit=2
)  # Using the ColumnSourceRegistrar instance directly with a limit of 2 rows

# SQL transformation source as dataframe
avg_user_transaction_df = client.dataframe(
    "average_user_transaction", "quickstart"
)  # Using the source name and variant without a limit, which fetches all rows

print(transactions_df.head())
"""
"transactionid" "customerid" "customerdob" "custlocation" "custaccountbalance" "transactionamount" "timestamp" "isfraud"
0 T1 C5841053 10/1/94 JAMSHEDPUR 17819.05 25.0 2022-04-09T11:33:09Z False
1 T2 C2142763 4/4/57 JHAJJAR 2270.69 27999.0 2022-03-27T01:04:21Z False
"""
- Added Ecommerce notebooks for Azure, AWS, GCP
- Docs: Updated custom resource docs and added docs for KCF
- Bugfix: Updated, more useful error messages
- Bugfix: Fixed resource search
- Bugfix: Fixed breadcrumb type and case error
- Bugfix: KCF Resource limits
- Bugfix: Fixed path for docker file and spark
- Bugfix: Dashboard routing and reload fix
- Bugfix: Spark databricks error message
Full Changelog: v0.7.3...v0.8.0
v0.7.3
What's Changed
- Class API Enhancement: Optional `timestamp_column` when registering features/labels
- Docs: Update AWS Deployment to cover changes to `eksctl`
- Python 3.11.2 Support
- Bugfix: Resource Status in CLI List Command
- Bugfix: Fixing spark issue with Spark chained transformation
- Bugfix: Issue with not allowing Python objects as input to DF Transformation
- Bugfix: Checks existence of training set features prior to creation
- Adding notebook links to docs
New Contributors
Full Changelog: v0.7.2...v0.7.3