
Releases: featureform/featureform

v0.7.1

03 Apr 17:31

What's Changed

v0.7.0

03 Apr 05:44
ecdfcb5

Release 0.7

Define Features and Labels with an ORM-style Syntax

Featureform has added a new way to define entities, features, and labels. This new API, which takes inspiration from Python ORMs, makes it easier for data scientists to define and manage their features and labels in code.

Example

transactions = postgres.register_table(
  name="transactions",
  table="Transactions", # This is the table's name in Postgres
)

@postgres.sql_transformation()
def average_user_transaction():
  return "SELECT CustomerID as user_id, avg(TransactionAmount) " \
  "as avg_transaction_amt from {{transactions.default}} GROUP BY user_id"

@ff.entity
class User:
  avg_transactions = ff.Feature(
    average_user_transaction[["user_id", "avg_transaction_amt"]],
    type=ff.Float32,
    inference_store=redis,
  )
  fraudulent = ff.Label(
    transactions[["customerid", "isfraud"]], variant="quickstart", type=ff.Bool
  )

ff.register_training_set(
  "fraud_training",
  label="fraudulent",
  features=["avg_transactions"],
)

You can read more in the docs.

Compute features at serving time with on-demand features

A highly requested capability was the ability to featurize incoming data at serving time. For example, you may have an on-demand feature that turns a user comment into an embedding, or one that processes an incoming image.

On-demand feature that turns a comment into an embedding at serving time

@ff.ondemand_feature
def text_to_embedding(serving_client, params, entities):
    return bert_transform(params["comment"])

You can learn more in the docs.
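As a rough sketch of the calling convention (with a toy stand-in for bert_transform, not Featureform's actual implementation), the decorated function receives the serving client, the caller-supplied params, and the requested entities:

```python
# Toy stand-in for a real embedding model: one value per token.
def bert_transform(text):
    return [float(len(token)) for token in text.split()]

# The on-demand function's signature: Featureform invokes it with the
# serving client, a params dict from the caller, and the entity keys.
def text_to_embedding(serving_client, params, entities):
    return bert_transform(params["comment"])

# At serving time, the caller supplies params alongside the entity keys:
embedding = text_to_embedding(None, {"comment": "great product"}, {"user": "u1"})
```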

Attach tags & user-defined values to Featureform resources like transformations, features, and labels.

All features, labels, transformations, and training sets now accept tags and properties arguments: tags is a list and properties is a dict.

client.register_training_set("CustomerLTV_Training", "default", label="ltv", features=["f1", "f2"], tags=["revenue"], properties={"visibility": "internal"})

You can read more in the docs.

Transformation and training set caching in local mode

Featureform has a local mode that allows users to define, manage, and serve their features while working locally on their laptops, without deploying anything. Historically, it would regenerate training sets and features on each run; as of 0.7, results are cached by default to shorten iteration time.
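The effect is analogous to memoization. A minimal illustration of the idea (not Featureform's actual cache implementation):

```python
import functools

calls = {"count": 0}

@functools.lru_cache(maxsize=None)
def build_training_set(name):
    # The expensive generation step would run here; we just count invocations.
    calls["count"] += 1
    return f"rows-for-{name}"

build_training_set("fraud_training")
build_training_set("fraud_training")  # second call is served from the cache
```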

A cleaner (and more colorful) CLI flow!

New CLI running featureform apply with colors

Full Changelog: v0.6.4...v0.7.0

v0.6.4

21 Mar 01:46

What's Changed

  • Bugfix for headers not being fetched in Spark Dataframe transformations

Full Changelog: v0.6.3...v0.6.4

v0.6.3

20 Mar 23:24

What's Changed

  • Added Search to the standalone docker container
  • GCP Filestore bug fixes

Full Changelog: v0.6.2...v0.6.3

v0.6.2

15 Mar 22:44

What's Changed

  • Bugfix for typeguard python package version

Full Changelog: v0.6.1...v0.6.2

v0.6.1

15 Mar 22:42

What's Changed

  • Search Bugfix for Standalone Container

Full Changelog: v0.6.0...v0.6.1

v0.6.0

06 Mar 10:22
c3ede69

Release 0.6

Generic Spark support as a Provider

Featureform has had support for Spark on EMR and Spark on Databricks for a while. We’ve generalized our Spark implementation to handle all versions of Spark using any of S3, GCS, Azure Blob Store, or HDFS as a backing store!

Here are some examples:

Spark with GCS backend

spark_creds = ff.SparkCredentials(
	master=master_ip_or_local,
	deploy_mode="client",
	python_version=cluster_py_version,
)

gcp_creds = ff.GCPCredentials(
	project_id=project_id,
	credentials_path=path_to_gcp_creds,
)

gcs = ff.register_gcs(
	name=gcs_provider_name,
	credentials=gcp_creds,
	bucket_name=bucket_name,
	bucket_path="directory/",
)

spark = ff.register_spark(
	name=spark_provider_name,
	description="A Spark deployment we created for the Featureform quickstart",
	team="featureform-team",
	executor=spark_creds,
	filestore=gcs,
)

Databricks with Azure

databricks = ff.DatabricksCredentials(
	host=host,
	token=token,
	cluster_id=cluster,
)

azure_blob = ff.register_blob_store(
	name="blob",
	account_name=os.getenv("AZURE_ACCOUNT_NAME", None),
	account_key=os.getenv("AZURE_ACCOUNT_KEY", None),
	container_name=os.getenv("AZURE_CONTAINER_NAME", None),
	root_path="testing/ff",
)

spark = ff.register_spark(
	name="spark-databricks-azure",
	description="A Spark deployment we created for the Featureform quickstart",
	team="featureform-team",
	executor=databricks,
	filestore=azure_blob,
)

EMR with S3

spark_creds = ff.SparkCredentials(
	master=master_ip_or_local,
	deploy_mode="client",
	python_version=cluster_py_version,
)

aws_creds = ff.AWSCredentials(
	aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID", None),
	aws_secret_access_key=os.getenv("AWS_SECRET_KEY", None),
)

s3 = ff.register_s3(
	name="s3-quickstart",
	credentials=aws_creds,
	bucket_path=os.getenv("S3_BUCKET_PATH", None),
	bucket_region=os.getenv("S3_BUCKET_REGION", None),
)

spark = ff.register_spark(
	name="spark-generic-s3",
	description="A Spark deployment we created for the Featureform quickstart",
	team="featureform-team",
	executor=spark_creds,
	filestore=s3,
)

Spark with HDFS

spark_creds = ff.SparkCredentials(
	master=os.getenv("SPARK_MASTER", "local"),
	deploy_mode="client",
	python_version="3.7.16",
)

hdfs = ff.register_hdfs(
	name="hdfs_provider",
	host=host,
	port="9000",
	username="hduser"
)

spark = ff.register_spark(
	name="spark-hdfs",
	description="A Spark deployment we created for the Featureform quickstart",
	team="featureform-team",
	executor=spark_creds,
	filestore=hdfs,
)

You can read more in the docs.

Track which models are using features / training sets at serving time

A highly requested feature was a lineage link between models and their features and training sets. Now, when you serve a feature or training set, you can include an optional model argument.

client.features("review_text", entities={"order": "df8e5e994bcc820fcf403f9a875201e6"}, model="sentiment_analysis")
client.training_set("CustomerLTV_Training", "default", model="linear_ltv_model")

It can then be viewed via the CLI & the Dashboard:

Dashboard

CLI

You can learn more in the docs.

Backup & Recovery now available in open-source Featureform

Backup and recovery was originally exclusive to our enterprise offering. It is our goal to open-source everything in the product that isn’t related to governance, though we often first pilot new features with clients as we nail down the API.

Enable Backups

  1. Create a k8s secret with information on where to store backups.
> python backup/create_secret.py --help
Usage: create_secret.py [OPTIONS] COMMAND [ARGS]...
Generates a Kubernetes secret to store Featureform backup data.
Use this script to generate the Kubernetes secret, then apply it with:
`kubectl apply -f backup_secret.yaml`

Options:
  -h, --help  Show this message and exit.

Commands:
  azure  Create secret for azure storage containers
  gcs    Create secret for GCS buckets
  s3     Create secret for S3 buckets
  2. Upgrade your Helm cluster (if it was created without backups enabled)
helm upgrade featureform featureform/featureform [FLAGS] --set backup.enable=true --set backup.schedule=<schedule>

Where schedule is in cron syntax; for example, an hourly backup would look like:
"0 * * * *"

Recover from backup

Recovering from a backup is simple. In backup/restore, edit the .env-template file with your cloud provider name and credentials, then rename it to .env. To restore a specific snapshot, fill in the SNAPSHOT_NAME variable in the .env file.

After that, run recover.sh in that directory.

You can learn more in the docs.

Ability to rotate key and change provider credentials

Prior to this release, rotating a key or changing a credential required creating a new provider. We made providers immutable to keep users from accidentally overwriting each other's providers, but this also blocked key rotation. Now, provider changes work as an upsert.

For example, if you had registered Databricks and applied it like this:

databricks = ff.DatabricksCredentials(
	host=host,
	token=old_token,
	cluster_id=cluster,
)

You could change it by simply changing the config and re-applying it.

databricks = ff.DatabricksCredentials(
	host=host,
	token=new_token,
	cluster_id=cluster,
)

Ability to do full-text search on resources from the CLI

Prior to this release, you could only search resources from the dashboard. We’ve added the same functionality into the CLI. Our goal is to stay as close to feature parity between the dashboard and CLI as possible.

CLI

Dashboard

Enhancements

Mutable Providers

Featureform has historically made all resources immutable to avoid a variety of problems, such as upstream changes breaking downstreams. Over the next couple of releases, we expect to dramatically pull back on forced immutability while still avoiding the most common problems.

featureform apply now works as an upsert. For providers specifically, you can change most of their fields. This also makes it possible to rotate secrets and change credentials, as outlined earlier in these release notes.

Support for Legacy Snowflake Credentials

Older deployments of Snowflake connect with an Account Locator rather than an Organization/Account pair. For these, you can now use our register_snowflake_legacy method.

ff.register_snowflake_legacy(
	name="snowflake_docs",
	description="Example training store",
	team="Featureform",
	username=snowflake_username,
	password=snowflake_password,
	account_locator=snowflake_account_locator,
	database=snowflake_database,
	schema=snowflake_schema,
)

You can learn more in the docs.

Experimental

Custom Transformation-specific Container Limits for Pandas on K8s transformations

Pandas on K8s is still an experimental feature that we’re continuing to expand. Previously, you could only specify container limits globally; now you can set them per transformation, which helps with especially heavy or light transformations:

resource_specs = K8sResourceSpecs(
	cpu_request="250m",
	cpu_limit="500m",
	memory_request="50Mi",
	memory_limit="100Mi"
)

@k8s.df_transformation(
	inputs=[("transactions", "v2")],
	resource_specs=resource_specs
)
def transform(transactions):
	pass

You can learn more in the docs.

v0.5.1

08 Feb 02:48

What's Changed

  • Additional Snowflake Parameters (Role & Warehouse)

Full Changelog: v0.5.0...v0.5.1

v0.5.0

07 Feb 02:06
da1c21a

What's Changed

  • Status Functions For Resources
  • Custom KCF Images
  • Azure Quickstart
  • Support For Legacy Snowflake Credentials
  • ETCD Backup and Recovery

Full Changelog: v0.4.0...v0.5.0

v0.4.6

01 Feb 00:27

What's Changed

  • Fix for Provider Image in Local Mode

Full Changelog: v0.4.5...v0.4.6