Releases: featureform/featureform

v0.12.1

13 Feb 00:44

v0.12.0

12 Feb 23:27

What's Changed

Bugfixes

Full Changelog: v0.11.0...v0.12.0

v0.11.0

30 Nov 06:22

What's Changed

New Features

Quality of Life

Bugfixes

New Contributors

Full Changelog: v0.10.3...v0.11.0

v0.10.1

17 Aug 23:33

What's Changed

Full Changelog: v0.10.0...v0.10.1

v0.10.0

14 Jul 19:50
7f13eb3

What's Changed

The v0.10 release brings:

- A brand-new Dashboard UI and enhanced functionality
- Vector database support in local and hosted mode for Weaviate and Pinecone
- API improvements for data science development
- Updated documentation and bugfixes

Dashboard Makeover and Upgrades

We're excited to bring you a more visually appealing Dashboard with new functionality for both users and administrators, including metadata management for resource tags, previews of transformation results, and clear visibility into transformation logic.

Assign tags to resources directly through the dashboard UI

Edit Resource Metadata from the Dashboard

Preview datasets directly from the dashboard

Better formatting for Python and SQL transformations

Vector Database Support

You can now register Weaviate and Pinecone as providers!

Pinecone

Weaviate

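In place of the screenshots, here is a minimal registration sketch for both providers. The parameter names are assumptions based on typical provider configuration and may differ slightly from this release's SDK:

```python
import featureform as ff

# Illustrative sketch: exact parameter names may vary by version.
pinecone = ff.register_pinecone(
    name="pinecone",
    project_id="<project_id>",
    environment="<environment>",
    api_key="<api_key>",
)

weaviate = ff.register_weaviate(
    name="weaviate",
    url="<cluster_url>",
    api_key="<api_key>",
)
```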

API Improvements for Data Science Development

Read all files from a directory into a dataframe with ff.register_directory()

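A minimal sketch of the directory API in local mode; the parameter names beyond `ff.register_directory()` itself are illustrative, not taken from the release:

```python
import featureform as ff

# Illustrative sketch: each file in the directory becomes a row
# (e.g. filename and file contents) of a single source dataframe.
docs = ff.register_directory(
    name="docs",
    path="<path/to/directory>",
    description="All files in a directory as one dataframe",
)
```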

Inference Stores are now optional in Local Mode: if an inference store is not specified, it will default to local mode.

Bug Fixes

  • Bugfix: Added error check for inputs list by @sdreyer in #878
  • Bugfix: Fixed how state is cleared by @ahmadnazeri in #876
  • Bugfix: Fixed lingering default variants by @ahmadnazeri in #879
  • Bugfix: Allowed nearest() to accept a Feature object by @ahmadnazeri in #885
  • Bugfix: Fixed banner color by @RedLeader16 in #873
  • Bugfix: Give meaningful error when resource not found by @ahmadnazeri in #883
  • Bugfix: Fixed scheduling for KCF by @aolfat in #853
  • Bugfix: Fixed missing weaviate provider config implementation by @epps in #899
  • Bugfix: Removes Outdated logging package by @sdreyer in #913

Full Changelog: v0.9.0...v0.10.0

v0.9.0

06 Jun 04:08
0c7f68a

What's New

Vector Database and Embedding Support

You can use Featureform to define and orchestrate data pipelines that generate embeddings. Featureform can write them into Redis for nearest-neighbor lookup. This also allows users to version, re-use, and manage embeddings declaratively.

Registering Redis for use as a vector store (it's registered the same way as usual)

ff.register_redis(
    name="redis",
    description="Example inference store",
    team="Featureform",
    host="0.0.0.0",
    port=6379,
)

A Pipeline to Generate Embeddings from Text

docs = spark.register_file(...)

@spark.df_transform(
    inputs=[docs],
)
def embed_docs(docs):
    docs["embedding"] = docs["text"].map(
        lambda txt: openai.Embedding.create(
            model="text-embedding-ada-002",
            input=txt,
        )["data"][0]["embedding"]
    )
    return docs

Defining and Versioning an Embedding

@ff.entity
class Article:
    embedding = ff.Embedding(
        embed_docs[["id", "embedding"]],
        dims=1024,
        variant="test-variant",
        vector_db=redis,
    )

Performing a Nearest Neighbor Lookup

client.Nearest(Article.embedding, "id_123", 25)

Interact with Training Sets as Dataframes

You can already interact with sources as dataframes; this release adds the same functionality to training sets.

Interacting with a training set as Pandas

import featureform as ff

client = ff.Client(...)
df = client.training_set("fraud", "simple").dataframe()
print(df.head())

Enhanced Scheduling across Offline Stores

Featureform supports cron syntax for scheduling transformations. This release overhauls this functionality to make it more stable and efficient, and also adds more verbose error messages.

A transformation that runs every hour on Snowflake

@snowflake.sql_transform(schedule="0 * * * *")
def avg_transaction_price():
    return "SELECT user, AVG(price) FROM {{transaction}} GROUP BY user"

Run Pandas Transformations on K8s with S3

Featureform schedules and runs your transformations for you. When you run Pandas directly, Featureform spins up a Kubernetes job to execute it. This isn't a replacement for distributed processing frameworks like Spark (which we also support), but it's a great option for teams already using Pandas in production.

Defining our Pandas on Kubernetes Provider

aws_creds = ff.AWSCredentials(
    aws_access_key_id="<aws_access_key_id>",
    aws_secret_access_key="<aws_secret_access_key>",
)

s3 = ff.register_s3(
    name="s3",
    credentials=aws_creds,
    bucket_path="<s3_bucket_path>",
    bucket_region="<s3_bucket_region>",
)

pandas_k8s = ff.register_k8s(
    name="k8s",
    description="Native featureform kubernetes compute",
    store=s3,
    team="featureform-team",
)

Registering a file in S3 and a Transformation on it

src = pandas_k8s.register_file(...)

@pandas_k8s.df_transform(inputs=[src])
def transform(src):
    return src.groupby("CustomerID")["TransactionAmount"].mean()

v0.8.1

16 May 20:31

What's Changed

New Functionality

Enhancements

  • Updated Readme example to fix serving and use class api by @ahmadnazeri in #792
  • Dashboard Routing and Build Optimizations by @RedLeader16 in #781
  • Set Jobs Limit for scheduling by @aolfat in #794
  • Reformat and cleanup status displayer by @aolfat in #782
  • Bump pymdown-extensions from 9.9.2 to 10.0 by @dependabot in #804

Bug Fixes

  • Bad pathing exception #769 by @RedLeader16 in #773
  • Throw error if input tuple is not of type (str, str) by @ahmadnazeri in #780
  • Fix issue with paths for the Spark files by @ahmadnazeri in #776
  • Fix missing executor type in differing fields check for SparkProvider by @zhilingc in #789
  • Add default username and password for etcd coordinator by @aolfat in #798

New Contributors

Full Changelog: v0.8.0...v0.8.1

v0.8.0

10 May 16:21
d027124

What's Changed

  • Spark Enhancement: Yarn Support
  • Pull source and transformation data to client
client = Client()  # presumes $FEATUREFORM_HOST is set
client.apply(insecure=False)  # `insecure=True` for Docker (Quickstart only)

# Primary source as a dataframe
transactions_df = client.dataframe(
    transactions, limit=2
)  # Using the ColumnSourceRegistrar instance directly with a limit of 2 rows

# SQL transformation source as dataframe
avg_user_transaction_df = client.dataframe(
    "average_user_transaction", "quickstart"
)  # Using the source name and variant without a limit, which fetches all rows

print(transactions_df.head())

"""
  "transactionid" "customerid" "customerdob" "custlocation"  "custaccountbalance"  "transactionamount"           "timestamp"  "isfraud"
0              T1     C5841053       10/1/94     JAMSHEDPUR              17819.05                 25.0  2022-04-09T11:33:09Z      False
1              T2     C2142763        4/4/57        JHAJJAR               2270.69              27999.0  2022-03-27T01:04:21Z      False
"""
  • Added Ecommerce notebooks for Azure, AWS, GCP
  • Docs: Updated custom resource docs and added docs for KCF
  • Bugfix: Updated, more useful error messages
  • Bugfix: Fixed resource search
  • Bugfix: Fixed breadcrumb type and case error
  • Bugfix: KCF Resource limits
  • Bugfix: Fixed path for docker file and spark
  • Bugfix: Dashboard routing and reload fix
  • Bugfix: Spark databricks error message

Full Changelog: v0.7.3...v0.8.0

v0.7.3

18 Apr 23:18

What's Changed

  • Class API Enhancement: Optional timestamp_column when registering features/labels
  • Docs: Update AWS Deployment to Cover changes to eksctl
  • Python 3.11.2 Support
  • Bugfix: Resource Status in CLI List Command
  • Bugfix: Fixing spark issue with Spark chained transformation
  • Bugfix: Issue with not allowing Python objects as input to DF Transformation
  • Bugfix: Checks existence of training set features prior to creation
  • Adding notebook links to docs
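The optional timestamp_column from the first bullet can be sketched as follows. This assumes the class API of this era; the entity, source, and column names are illustrative, not from the release:

```python
import featureform as ff

@ff.entity
class Customer:
    # With a timestamp column (point-in-time correct features):
    avg_transaction = ff.Feature(
        transactions[["CustomerID", "TransactionAmount", "Timestamp"]],
        type=ff.Float32,
        timestamp_column="Timestamp",
    )
    # timestamp_column is now optional; omit it for non-temporal data:
    location = ff.Feature(
        transactions[["CustomerID", "CustLocation"]],
        type=ff.String,
    )
```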

New Contributors

Full Changelog: v0.7.2...v0.7.3

v0.7.2

06 Apr 03:01
9575428

What's Changed

  • Misc QOL improvements for the client

Full Changelog: v0.7.1...v0.7.2