Releases: featureform/featureform
v0.12.1
v0.12.0
What's Changed
- Feature: Clickhouse offline store support (#1224) by @ahmadnazeri in #1232
- Feature (Timestamp Variants) Turn on by default by @aolfat in #1176
- Upgrade pandas to >=1.3.5 by @epps in #1175
- Optional inference store for features by @aolfat in #1178
- Adding support for users to deploy Featureform on docker via cli. by @ahmadnazeri in #962
- Truncate long form errors by @anthonylasso in #1205
- Register features in batch by @RiddhiBagadiaa in #1195
- Searchable Tags by @ihkap11 in #1167
- Inputs for SQL Transformations by @aolfat in #1233
- Support for Resource Location by @ahmadnazeri in #1262
- Remove local mode from main package by @ahmadnazeri in #1294
Bugfixes
- Increase GRPC Stream Timeout by @sdreyer in #1190
- Client-side gRC Configuration for Long-running jobs by @epps in #1192
- Write Spark Submit Params to File Store to Avoid Databricks API 10K-byte Limit by @epps in #1197
- Helm Install-Upgrade ETCD fix. by @anthonylasso in #1220
- MD5 Hash of `offline_store_spark_runner.py` by @epps in #1213
- Bug: Healthy Providers Aren't Rechecked When Reapplied by @epps in #1231
- Fix banner reload issue by @anthonylasso in #1300
- Redshift Configuration Correction by @epps in #1307
- Source Modal Null Rows Fix by @anthonylasso in #1306
- `get_dynamodb` method by @epps in #1309
- Add missing HDFS switch case by @anthonylasso in #1311
- Adds variant to materialization IDs for SQL providers by @epps in #1313
- Check other definition for cast before calling .Query on it by @aolfat in #1321
- Allow ondemand features to be passed in as objects by @aolfat in #1312
Full Changelog: v0.11.0...v0.12.0
v0.11.0
What's Changed
New Features
- Provider Health Checks by @epps in #1085
- Ability To Preview Feature Data by @anthonylasso in #1129
- Batch Serving (Snowflake) by @RiddhiBagadiaa in #1158
- Batch Serving (Spark) by @RiddhiBagadiaa in #1174
- Ondemand Feature Code Previews In Dashboard by @anthonylasso in #1169
- Materialization via S3 Import to DynamoDB by @epps in #1161
- Ability to Copy Name Variant to Clipboard by @anthonylasso in #1163
- Ability to Get Spark Dataframes For Training Sets by @ahmadnazeri in #1121
- Ability to See Lineage of Transformations in the Dashboard by @anthonylasso in #1096
Quality of Life
- Improved caching for docker rebuilds by @sdreyer in #1043
- Dynamically set dashboard table sizes by @anthonylasso in #1030
- Databricks/client creds input validation by @ihkap11 in #1149
Bugfixes
- Ensure each transformation function argument has an input by @aolfat in #1018
- Fix state error for resource redefined error by @aolfat in #1032
- Issue with source not retrieving any source by @ahmadnazeri in #1034
- Refresh and Loading fix by @anthonylasso in #1039
- Add resource type in title. by @anthonylasso in #1024
- Fixed Conditional that added duplicate feature by @sdreyer in #1062
- Docs Parsing error by @joshcolts18 in #1090
- Materialization Copy Performance Improvement by @epps in #1079
- Slow search results loading state by @anthonylasso in #1088
- Flask compatibility with python 3.7 by @sdreyer in #1111
- Materialize Copy Race Condition by @epps in #1113
- Increasing Timeout For Long Running Jobs by @ahmadnazeri in #1118
New Contributors
- @jerempy made their first contribution in #1074
- @joshcolts18 made their first contribution in #1090
- @syedzubeen made their first contribution in #1119
- @ihkap11 made their first contribution in #1149
Full Changelog: v0.10.3...v0.11.0
v0.10.1
What's Changed
- Hotfix/pinecone casting by @sdreyer in #917
- Bump grpcio from 1.51.1 to 1.53.0 by @dependabot in #908
- Unit tests to cover the SQL format function by @anthonylasso in #918
- Fix table row sizing by @anthonylasso in #937
- Update CLI version output by @anthonylasso in #934
- `pytest` Coverage by @epps in #940
- Bugfix: KCF doesn't read csv files feature registered on source by @aolfat in #943
- Update README.md by @sdreyer in #946
- Run coverage on main by @sdreyer in #950
- Setup code to enable unit-level metadata_server + provider tests. by @anthonylasso in #944
- Provider Config Testing #64 by @anthonylasso in #948
- KCF dockerfile image to 3.10 + use consistent dill versioning by @aolfat in #955
- "Get" Provider Testing by @anthonylasso in #954
- Dashboard metadata variant bugfix by @anthonylasso in #933
- Change Resource names to ResourceVariant by @aolfat in #941
- Register instantiate tests by @anthonylasso in #961
- Bugfix: Dataframe transformation by @anthonylasso in #926
- Spark Tests by @sdreyer in #958
- Tests/spark by @sdreyer in #965
- FEATURE: PostgreSQL: adding support for ssl mode by @ahmadnazeri in #964
- Fix ETCD issue #945 by @anthonylasso in #967
- Cloud Storage Pathing (Azure Blob Storage) by @epps in #947
Full Changelog: v0.10.0...v0.10.1
v0.10.0
What's Changed
V0.10 release brings:
- A brand-new Dashboard UI and enhanced functionality
- Vector database support in local and hosted mode for Weaviate and Pinecone
- API improvements for data science development
- Updated documentation and bugfixes
Dashboard Makeover and Upgrades
We're excited to bring you a more visually appealing Dashboard with new functionality for both users and administrators, including metadata management for resource tags, previewing transformation results, and clear visibility of transformation logic.
Assign tags to resources directly through the dashboard UI
Edit Resource Metadata from the Dashboard
Preview datasets directly from the dashboard
Better formatting for Python and SQL transformations
Vector Database Support
You can now register Weaviate and Pinecone as providers!
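As a minimal sketch, registering Pinecone as a provider might look like the following. The parameter names (`project_id`, `environment`, `api_key`) and placeholder values are assumptions modeled on Featureform's other provider registration calls; consult the provider docs for the exact signature.

```python
import featureform as ff

# Hypothetical placeholder values; substitute your own project details.
pinecone = ff.register_pinecone(
    name="pinecone",
    project_id="<pinecone_project_id>",
    environment="<pinecone_environment>",
    api_key="<pinecone_api_key>",
)
```

Weaviate registration follows the same pattern with its own connection parameters.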
API Improvements for Data Science Development
Read all files from a directory into a dataframe with ff.register_directory()
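A sketch of directory registration in local mode, assuming the directory contains files of a consistent schema; the `name`, `path`, and `description` parameter names are assumptions based on Featureform's other registration calls:

```python
import featureform as ff

local = ff.register_local()

# Hypothetical source: every file under data/articles/ is read into
# a single dataframe source named "articles".
docs = local.register_directory(
    name="articles",
    path="data/articles/",
    description="All article files as one dataframe",
)
```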
Inference Stores are now optional in Local Mode: if an inference store is not specified, features default to local storage.
Bug Fixes
- Bugfix: Added error check for inputs list by @sdreyer in #878
- Bugfix: Fixed how state is cleared by @ahmadnazeri in #876
- Bugfix: Fixed lingering default variants by @ahmadnazeri in #879
- Bugfix: Allowed nearest() to accept a Feature object by @ahmadnazeri in #885
- Bugfix: Banner Color by @RedLeader16 in #873
- Bugfix: Give meaningful error when resource not found by @ahmadnazeri in #883
- Bugfix: Fixed scheduling for KCF by @aolfat in #853
- Bugfix: Fixed missing weaviate provider config implementation by @epps in #899
- Bugfix: Removes Outdated logging package by @sdreyer in #913
Full Changelog: v0.9.0...v0.10.0
v0.9.0
What's New
Vector Database and Embedding Support
You can use Featureform to define and orchestrate data pipelines that generate embeddings. Featureform can write them into Redis for nearest-neighbor lookup. This also allows users to version, re-use, and manage embeddings declaratively.
Registering Redis for use as a Vector Store (registration is the same as for a standard Redis inference store)
ff.register_redis(
    name="redis",
    description="Example inference store",
    team="Featureform",
    host="0.0.0.0",
    port=6379,
)
A Pipeline to Generate Embeddings from Text
docs = spark.register_file(...)
@spark.df_transform(
    inputs=[docs],
)
def embed_docs(docs):
    docs["embedding"] = docs["text"].map(lambda txt: openai.Embedding.create(
        model="text-embedding-ada-002",
        input=txt,
    )["data"])
    return docs
Defining and Versioning an Embedding
@ff.entity
class Article:
    embedding = ff.Embedding(embed_docs[["id", "embedding"]], dims=1024, vector_db=redis)
@ff.entity
class Article:
    embedding = ff.Embedding(
        embed_docs[["id", "embedding"]],
        dims=1024,
        variant="test-variant",
        vector_db=redis,
    )
Performing a Nearest Neighbor Lookup
client.Nearest(Article.embedding, "id_123", 25)
Interact with Training Sets as Dataframes
You can already interact with sources as dataframes; this release adds the same functionality for training sets.
Interacting with a training set as Pandas
import featureform as ff
client = ff.Client(...)
df = client.training_set("fraud", "simple").dataframe()
print(df.head())
Enhanced Scheduling across Offline Stores
Featureform supports cron syntax for scheduling transformations to run. This release overhauls that functionality to make it more stable and efficient, and adds more verbose error messages.
A transformation that runs every hour on Snowflake
@snowflake.sql_transform(schedule="0 * * * *")
def avg_transaction_price():
    return "SELECT user, AVG(price) FROM {{transaction}} GROUP BY user"
Run Pandas Transformations on K8s with S3
Featureform schedules and runs your transformations for you. When you run Pandas directly, Featureform spins up a Kubernetes job to execute it. This isn't a replacement for distributed processing frameworks like Spark (which we also support), but it's a great option for teams already using Pandas in production.
Defining our Pandas on Kubernetes Provider
aws_creds = ff.AWSCredentials(
    aws_access_key_id="<aws_access_key_id>",
    aws_secret_access_key="<aws_secret_access_key>",
)

s3 = ff.register_s3(
    name="s3",
    credentials=aws_creds,
    bucket_path="<s3_bucket_path>",
    bucket_region="<s3_bucket_region>",
)

pandas_k8s = ff.register_k8s(
    name="k8s",
    description="Native featureform kubernetes compute",
    store=s3,
    team="featureform-team",
)
Registering a file in S3 and a Transformation on it
src = pandas_k8s.register_file(...)

@pandas_k8s.df_transform(inputs=[src])
def transform(src):
    return src.groupby("CustomerID")["TransactionAmount"].mean()
v0.8.1
What's Changed
New Functionality
- KCF/S3 Support by @ahmadnazeri in #786
Enhancements
- Updated Readme example to fix serving and use class api by @ahmadnazeri in #792
- Dashboard Routing and Build Optimizations by @RedLeader16 in #781
- Set Jobs Limit for scheduling by @aolfat in #794
- Reformat and cleanup status displayer by @aolfat in #782
- Bump pymdown-extensions from 9.9.2 to 10.0 by @dependabot in #804
Bug Fixes
- Bad pathing exception #769 by @RedLeader16 in #773
- Throw error if input tuple is not of type (str, str) by @ahmadnazeri in #780
- Fix issue with paths for the Spark files by @ahmadnazeri in #776
- Fix missing executor type in differing fields check for SparkProvider by @zhilingc in #789
- Add default username and password for etcd coordinator by @aolfat in #798
New Contributors
Full Changelog: v0.8.0...v0.8.1
v0.8.0
What's Changed
- Spark Enhancement: Yarn Support
- Pull source and transformation data to client
client = Client()  # presumes $FEATUREFORM_HOST is set
client.apply(insecure=False)  # `insecure=True` for Docker (Quickstart only)

# Primary source as a dataframe
transactions_df = client.dataframe(
    transactions, limit=2
)  # Using the ColumnSourceRegistrar instance directly with a limit of 2 rows

# SQL transformation source as dataframe
avg_user_transaction_df = client.dataframe(
    "average_user_transaction", "quickstart"
)  # Using the source name and variant without a limit, which fetches all rows

print(transactions_df.head())
"""
"transactionid" "customerid" "customerdob" "custlocation" "custaccountbalance" "transactionamount" "timestamp" "isfraud"
0 T1 C5841053 10/1/94 JAMSHEDPUR 17819.05 25.0 2022-04-09T11:33:09Z False
1 T2 C2142763 4/4/57 JHAJJAR 2270.69 27999.0 2022-03-27T01:04:21Z False
"""
- Added Ecommerce notebooks for Azure, AWS, GCP
- Docs: Updated custom resource docs and added docs for KCF
- Bugfix: Updated, more useful error messages
- Bugfix: Fixed resource search
- Bugfix: Fixed breadcrumb type and case error
- Bugfix: KCF Resource limits
- Bugfix: Fixed path for docker file and spark
- Bugfix: Dashboard routing and reload fix
- Bugfix: Spark databricks error message
Full Changelog: v0.7.3...v0.8.0
v0.7.3
What's Changed
- Class API Enhancement: Optional `timestamp_column` when registering features/labels
- Docs: Update AWS Deployment to cover changes to `eksctl`
- Python 3.11.2 Support
- Bugfix: Resource Status in CLI List Command
- Bugfix: Fixing spark issue with Spark chained transformation
- Bugfix: Issue with not allowing Python objects as input to DF Transformation
- Bugfix: Checks existence of training set features prior to creation
- Adding notebook links to docs
New Contributors
Full Changelog: v0.7.2...v0.7.3