Releases: bentoml/BentoML

BentoML - v1.1.7

12 Oct 18:24
1e8902a

What's Changed

  • Updated OTEL dependencies to 0.41b0 to address a CVE affecting 0.39b0.
  • General documentation and client updates.

Full Changelog: v1.1.6...v1.1.7

BentoML - v1.1.6

08 Sep 05:23
c1504bd

Full Changelog: v1.1.5...v1.1.6

BentoML - v1.1.5

08 Sep 05:15
ca6eca5

Full Changelog: v1.1.4...v1.1.5

BentoML - v1.1.4

30 Aug 01:17
7a83d99

🍱 To better support LLM serving through response streaming, we are proud to introduce experimental server-sent events (SSE) streaming in this release of BentoML v1.1.4 and OpenLLM v0.2.27. See an example service definition for SSE streaming with Llama 2.

  • Added response streaming through SSE to the bentoml.io.Text IO Descriptor type.
  • Added async generator support to both API Server and Runner to yield incremental text responses.
  • Added support to ☁️ BentoCloud to natively stream responses over SSE (see the sketch below).
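
A minimal sketch of what an SSE-streaming service definition may look like with these additions; the service name and word-by-word chunking are illustrative, not taken from the release:

    import asyncio

    import bentoml
    from bentoml.io import Text

    svc = bentoml.Service("sse-demo")

    @svc.api(input=Text(), output=Text())
    async def stream(prompt: str):
        # Yielding from an async generator sends each chunk to the
        # client incrementally as a server-sent event.
        for word in prompt.split():
            yield word + " "
            await asyncio.sleep(0.05)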

🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.

  • Added /v1/generate_stream endpoint for streaming responses from LLMs.

    curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
      "prompt": "### Instruction:\n What is the definition of time (200 words essay)?\n\n### Response:",
      "llm_config": {
        "use_llama2_prompt": false,
        "max_new_tokens": 4096,
        "early_stopping": false,
        "num_beams": 1,
        "num_beam_groups": 1,
        "use_cache": true,
        "temperature": 0.89,
        "top_k": 50,
        "top_p": 0.76,
        "typical_p": 1,
        "epsilon_cutoff": 0,
        "eta_cutoff": 0,
        "diversity_penalty": 0,
        "repetition_penalty": 1,
        "encoder_repetition_penalty": 1,
        "length_penalty": 1,
        "no_repeat_ngram_size": 0,
        "renormalize_logits": false,
        "remove_invalid_values": false,
        "num_return_sequences": 1,
        "output_attentions": false,
        "output_hidden_states": false,
        "output_scores": false,
        "encoder_no_repeat_ngram_size": 0,
        "n": 1,
        "best_of": 1,
        "presence_penalty": 0.5,
        "frequency_penalty": 0,
        "use_beam_search": false,
        "ignore_eos": false
      },
      "adapter_name": null
    }'

Full Changelog: v1.1.3...v1.1.4

BentoML - v1.1.2

22 Aug 02:46
a2ead21

Patch release

BentoML now provides a new diffusers integration, bentoml.diffusers_simple.

This introduces two integrations, for the stable_diffusion and stable_diffusion_xl models.

import bentoml

# Create a Runner for a Stable Diffusion model
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

# Create a Runner for a Stable Diffusion XL model
runner_xl = bentoml.diffusers_simple.stable_diffusion_xl.create_runner("stabilityai/stable-diffusion-xl-base-1.0")
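
As a follow-up sketch, such a runner can be wired into a service like any other BentoML runner. The text2img method name and IO types below are assumptions for illustration, not the integration's confirmed API:

    from bentoml.io import JSON, Image

    svc = bentoml.Service("sd-service", runners=[runner])

    @svc.api(input=JSON(), output=Image())
    async def generate(params):
        # `text2img` is a hypothetical runner method name; check the
        # diffusers_simple docs for the actual interface.
        images = await runner.text2img.async_run(**params)
        return images[0]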

General bug fixes and documentation improvements

New Contributors

  • @EgShes made their first contribution in #4102
  • @zhangwm404 made their first contribution in #4108

Full Changelog: v1.1.1...v1.1.2

BentoML - v1.1.1

01 Aug 21:11
ea4aafc

🍱 Patched release 1.1.1

  • Added more extensive cloud configuration options for the bentoml deployment CLI. Thanks @Haivilo!
    Note that bentoml deployment update now takes the deployment name as an optional positional argument instead of the previous --name option:
     bentoml deployment update DEPLOYMENT_NAME
    See #4087
  • Added documentation about the Bento release GitHub action. Thanks @frostming! See #4071

Full Changelog: v1.1.0...v1.1.1

BentoML - v1.1.0

24 Jul 20:34
2ab6de7

🍱 We're thrilled to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.

  • Backward Compatibility: Rest assured that this release maintains full API backward compatibility with v1.0.
  • Official gRPC Support: We've transitioned gRPC support in BentoML from experimental to official status, expanding your toolkit for high-performance, low-latency services.
  • Ray Integration: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. BentoML integrates natively with Ray Serve to enable users to deploy Bento applications in a Ray cluster without modifying code or configuration.
  • Enhanced Hugging Face Transformers and Diffusers Support: All Hugging Face Diffusers models and pipelines can be seamlessly imported and integrated into BentoML applications through the Transformers and Diffusers framework libraries (see the sketch after this list).
  • Enhanced Model Version Management: Enjoy greater flexibility with the improved model version management, enabling flexible configuration and synchronization of model versions with your remote model store.
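
As an illustrative sketch of the Diffusers integration, a pipeline can be imported into the model store and turned into a runner; the model tag below is arbitrary:

    import bentoml

    # Import a Hugging Face Diffusers pipeline into the local model store.
    bentoml.diffusers.import_model(
        "sd2.1",
        "stabilityai/stable-diffusion-2-1",
    )

    # Load it back and create a runner for serving.
    runner = bentoml.diffusers.get("sd2.1:latest").to_runner()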

🦾 We are also excited to announce the launch of OpenLLM v0.2.0 featuring the support of Llama 2 models.

  • GPU and CPU Support: Running Llama 2 is supported on both GPU and CPU.

  • Model variations and parameter sizes: Supports all model weights and parameter sizes available on Hugging Face.

    meta-llama/llama-2-70b-chat-hf
    meta-llama/llama-2-13b-chat-hf
    meta-llama/llama-2-7b-chat-hf
    meta-llama/llama-2-70b-hf
    meta-llama/llama-2-13b-hf
    meta-llama/llama-2-7b-hf
    openlm-research/open_llama_7b_v2
    openlm-research/open_llama_3b_v2
    openlm-research/open_llama_13b
    huggyllama/llama-65b
    huggyllama/llama-30b
    huggyllama/llama-13b
    huggyllama/llama-7b

    Users can use any weights on Hugging Face (e.g. TheBloke/Llama-2-13B-chat-GPTQ), custom weights from a local path (e.g. /path/to/llama-1), or fine-tuned weights, as long as they adhere to LlamaModelForCausalLM (see the sketch after this list).

  • Stay tuned for fine-tuning capabilities in OpenLLM: Support for fine-tuning various Llama 2 models will be added in a future release. Try the experimental script for fine-tuning Llama 2 with QLoRA under the OpenLLM playground.

    python -m openllm.playground.llama2_qlora --help
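
As a hedged sketch following the openllm.Runner pattern shown elsewhere in these notes, a Llama 2 runner with custom weights might be created as follows; the model_id keyword is an assumption, not a confirmed signature:

    import openllm

    # "llama" selects the Llama model family; model_id points at any
    # compatible Hugging Face checkpoint or local path (assumed API).
    llm_runner = openllm.Runner(
        "llama", model_id="meta-llama/llama-2-7b-chat-hf"
    )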
    

BentoML - v1.0.22

12 Jun 20:44
89e5fda

🍱 The BentoML v1.0.22 release brings a list of highly anticipated updates.

  • Added support for Pydantic 2 for better validation performance (see the sketch after this list).

  • Added support for CUDA 12 versions in builds and containerization.

  • Introduced service lifecycle events, allowing custom logic to be added on_deployment, on_startup, and on_shutdown. State can be managed using the context variable ctx during the on_startup and on_shutdown events and during request serving in the API.

    import bentoml

    # `svc` is the Service under which the hooks are registered;
    # the name is illustrative.
    svc = bentoml.Service("lifecycle-demo")

    @svc.on_deployment
    def on_deployment():
      # Runs once per deployment, before any workers start.
      pass

    @svc.on_startup
    def on_startup(ctx: bentoml.Context):
      # Initialize shared state when a worker starts.
      ctx.state["object_key"] = create_object()

    @svc.on_shutdown
    def on_shutdown(ctx: bentoml.Context):
      # Release shared state when a worker shuts down.
      cleanup_state(ctx.state["object_key"])

    @svc.api
    def predict(input_data, ctx):
      # State set in on_startup is available during request serving.
      obj = ctx.state["object_key"]
  • Added support for traffic control for both the API Server and Runners. Timeout and maximum concurrency can now be set through configuration.

    api_server:
      traffic:
        timeout: 10 # API Server request timeout in seconds
        max_concurrency: 32 # Maximum concurrency requests in the API Server
    
    runners:
      iris:
        traffic:
          timeout: 10 # Runner request timeout in seconds
          max_concurrency: 32 # Maximum concurrency requests in the Runner
  • Improved bentoml push performance for large Bentos.
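
For the Pydantic support mentioned above, here is a minimal sketch using the JSON IO descriptor's pydantic_model option; the model fields and service name are illustrative:

    from pydantic import BaseModel

    import bentoml
    from bentoml.io import JSON

    class IrisFeatures(BaseModel):
        sepal_len: float
        sepal_width: float
        petal_len: float
        petal_width: float

    svc = bentoml.Service("iris-demo")

    @svc.api(input=JSON(pydantic_model=IrisFeatures), output=JSON())
    def classify(features: IrisFeatures) -> dict:
        # The request body is validated and parsed by Pydantic
        # before it reaches the handler.
        return {"received": features.model_dump()}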

🚀 One more thing: the team is delighted to unveil our latest endeavor, OpenLLM. This innovative project allows you to effortlessly build with state-of-the-art open-source or fine-tuned Large Language Models.

  • Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out of the box. Fully customizable with model-specific arguments.

    openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
  • Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.

    llm_runner = openllm.Runner("dolly-v2")
  • Builds LLM application into the Bento format that can be deployed to BentoCloud or containerized into OCI images.

    openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]

Our dedicated team is working hard to pioneer more integrations of advanced models for upcoming releases of OpenLLM. Stay tuned for the unfolding developments.

BentoML - v1.0.20

10 May 01:14
7f7be71

🍱 BentoML v1.0.20 is released with improved usability and compatibility features.

  • Production Mode by Default: The bentoml serve command now runs with the --production option by default. This change simulates production behavior during development. The --reload option will continue to work as expected. To achieve the previous serving behavior, use --development instead.

  • Optional Dependency for OpenTelemetry Exporter: The opentelemetry-exporter-otlp-proto-http dependency has been moved from a required dependency to an optional one to address a protobuf dependency incompatibility issue. ⚠️ If you are currently using the Model Monitoring and Inference Data Collection feature, you must install the package with the monitor-otlp option from this release onwards to include the necessary dependency.

    pip install "bentoml[monitor-otlp]"
  • OpenTelemetry Trace ID Configuration Option: A new configuration option has been added to return the OpenTelemetry Trace ID in the response. This feature is particularly helpful when tracing has not been initialized in the upstream caller, but the caller still wishes to log the Trace ID in case of an error.

    api_server:
      http:
        response:
          trace_id: True
  • Start from a Service: Added the ability to start a server from a bentoml.Service object. This is helpful for troubleshooting a project in a development environment where no Bento has been built yet.

    import bentoml

    # import the Service defined in the `/clip_api_service/service.py` file
    from clip_api_service.service import svc

    if __name__ == "__main__":
      # start the server in the background and get a client for it
      server = bentoml.HTTPServer(svc)
      server.start(blocking=False)
      client = server.get_client()
      client.predict(...)

Full Changelog: v1.0.19...v1.0.20

BentoML - v1.0.19

26 Apr 23:52
afe9660

🍱 BentoML v1.0.19 is released with enhanced GPU utilization and expanded ML framework support.

  • Optimized GPU resource utilization: Enabled scheduling multiple instances of the same runner using the workers_per_resource scheduling strategy configuration. The following configuration schedules 2 instances of the “iris” runner per GPU. workers_per_resource defaults to 1.

    runners:
      iris:
        resources:
          nvidia.com/gpu: 1
        workers_per_resource: 2
  • New ML framework support: We've added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks.

  • Enhanced runner communication: Implemented PEP 574 out-of-band pickling to improve runner communication by eliminating memory copying, resulting in better performance and efficiency (see the sketch after this list).

  • Backward compatibility for Hugging Face Transformers: Resolved compatibility issues with Hugging Face Transformers versions prior to v4.18, ensuring a seamless experience for users with older versions.
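
For illustration, the sketch below shows PEP 574 (pickle protocol 5) out-of-band buffers in isolation, using NumPy arrays, which support the protocol natively; it is standalone code, not BentoML's internal implementation:

    import pickle

    import numpy as np

    arr = np.zeros(1_000_000)

    buffers = []
    # With protocol 5, the array's data buffer is handed to
    # buffer_callback instead of being copied into the pickle stream.
    data = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

    # The receiver reconstructs the array from the out-of-band buffers,
    # avoiding an extra memory copy.
    restored = pickle.loads(data, buffers=buffers)
    assert (restored == arr).all()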

⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML's cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture enables individual models to run in their own pods, utilizing the most optimal hardware for their respective tasks and enabling independent scaling.

💡 With each release, we consistently update our blog, documentation and examples to empower the community in harnessing the full potential of BentoML.

Full Changelog: v1.0.18...v1.0.19