Releases: bentoml/BentoML

BentoML - v1.0.18

14 Apr 10:59
52f7863

🍱 BentoML v1.0.18 brings a new way of creating servers and clients natively from Python.

  • Start an HTTP or gRPC server and client asynchronously with a context manager.

    server = HTTPServer("iris_classifier:latest", production=True, port=3000)
    
    # Start the server in a separate process and connect to it using a client
    with server.start() as client:
        res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
  • Start an HTTP or gRPC server synchronously.

    server = HTTPServer("iris_classifier:latest", production=True, port=3000)
    server.start(blocking=True)
  • As always, a client can be created and connected to a running server.

    client = Client.from_url("http://localhost:3000")
    res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
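  • The same pattern extends to gRPC. The sketch below assumes the new serve API also exposes bentoml.GrpcServer with an interface mirroring HTTPServer (an assumption, with a hypothetical port choice), in which case the context manager yields a gRPC client in the same way.

    # Sketch only: assumes GrpcServer mirrors HTTPServer's constructor and start() context manager
    server = GrpcServer("iris_classifier:latest", production=True, port=3001)

    with server.start() as client:
        res = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))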

What's Changed

  • chore(deps): bump coverage[toml] from 7.2.2 to 7.2.3 by @dependabot in #3746
  • bugs: Fix an f-string bug in Tranformers framework. by @ssheng in #3753
  • chore(deps): bump pytest from 7.2.2 to 7.3.0 by @dependabot in #3751
  • chore(deps): bump bufbuild/buf-setup-action from 1.16.0 to 1.17.0 by @dependabot in #3750
  • fix: BufferError when pushing model to BentoCloud by @aarnphm in #3737
  • chore: remove codecov dependencies by @aarnphm in #3754
  • feat: implement new serve API by @sauyon in #3696
  • examples: Add a client example to quickstart by @ssheng in #3752

Full Changelog: v1.0.17...v1.0.18

BentoML - v1.0.17

06 Apr 20:55
09cf0f4

🍱 We are excited to announce the release of BentoML v1.0.17, which includes support for 🤗 Hugging Face Transformers pre-trained instances. Prior to this release, only pipelines could be saved and loaded using the bentoml.transformers APIs. However, based on the community's demand to work with pre-trained models, tokenizers, preprocessors, etc., without pipelines, we have expanded our capabilities in bentoml.transformers APIs. With this release, all pre-trained instances can be saved and loaded into either built-in Transformers framework runners or custom runners. This update opens up new possibilities for users to work with pre-trained models, and we are thrilled to see what the community will create using this feature. To learn more, visit BentoML Transformers framework documentation.

  • Pre-trained models and instances, such as tokenizers, preprocessors, and feature extractors, can also be saved as standalone models using the bentoml.transformers.save_model API.

    import bentoml
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor
    
    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
    
    bentoml.transformers.save_model("speecht5_tts_processor", processor)
    bentoml.transformers.save_model("speecht5_tts_model", model, signatures={"generate_speech": {"batchable": False}})
    bentoml.transformers.save_model("speecht5_tts_vocoder", vocoder)
  • Pre-trained models and instances can be run either independently as Transformers framework runners or jointly in a custom runner. To use pre-trained models and instances as individual framework runners, simply get the model references and convert them to runners using the to_runner method.

    import bentoml
    import torch
    
    from bentoml.io import Text, NumpyNdarray
    from datasets import load_dataset
    
    processor_runner = bentoml.transformers.get("speecht5_tts_processor").to_runner()
    model_runner = bentoml.transformers.get("speecht5_tts_model").to_runner()
    vocoder_runner = bentoml.transformers.get("speecht5_tts_vocoder").to_runner()
    embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
    speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
    
    svc = bentoml.Service("text2speech", runners=[proccessor_runner, model_runner, vocoder_runner])
    
    @svc.api(input=Text(), output=NumpyNdarray())
    def generate_speech(inp: str):
        inputs = processor_runner.run(text=inp, return_tensors="pt")
        speech = model_runner.generate_speech.run(input_ids=inputs["input_ids"], speaker_embeddings=speaker_embeddings, vocoder=vocoder_runner.run)
        return speech.numpy()
  • To use the pre-trained models and instances together in a custom runner, use the bentoml.transformers.get API to get the model references and load them in a custom runner. The pre-trained instances can then be used for inference in the custom runner.

    import bentoml
    import torch
    
    from datasets import load_dataset
    
    processor_ref = bentoml.models.get("speecht5_tts_processor:latest")
    model_ref = bentoml.models.get("speecht5_tts_model:latest")
    vocoder_ref = bentoml.models.get("speecht5_tts_vocoder:latest")
    
    class SpeechT5Runnable(bentoml.Runnable):
    
        def __init__(self):
            self.processor = bentoml.transformers.load_model(processor_ref)
            self.model = bentoml.transformers.load_model(model_ref)
            self.vocoder = bentoml.transformers.load_model(vocoder_ref)
            self.embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
            self.speaker_embeddings = torch.tensor(self.embeddings_dataset[7306]["xvector"]).unsqueeze(0)
    
        @bentoml.Runnable.method(batchable=False)
        def generate_speech(self, inp: str):
            inputs = self.processor(text=inp, return_tensors="pt")
            speech = self.model.generate_speech(inputs["input_ids"], self.speaker_embeddings, vocoder=self.vocoder)
            return speech.numpy()
    
    text2speech_runner = bentoml.Runner(SpeechT5Runnable, name="speecht5_runner", models=[processor_ref, model_ref, vocoder_ref])
    svc = bentoml.Service("talk_gpt", runners=[text2speech_runner])
    
    @svc.api(input=bentoml.io.Text(), output=bentoml.io.NumpyNdarray())
    async def generate_speech(inp: str):
        return await text2speech_runner.generate_speech.async_run(inp)
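  • Once the service is served (for example with bentoml serve), it can be called over HTTP using the Bento client introduced in v1.0.8. The sketch below assumes the talk_gpt service above is running locally on the default port 3000; the generate_speech method on the client is derived from the service API name.

    from bentoml.client import Client

    # Connect to the locally running "talk_gpt" service
    client = Client.from_url("http://localhost:3000")

    # One client method is exposed per service API; the NumpyNdarray output is returned as a NumPy array
    waveform = client.generate_speech("BentoML makes model serving easy.")
    print(waveform.shape)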

What's Changed

  • feat(containerize): caching pip/conda installation layers by @smidm in #3673
  • docs(batching): update docs to 503 by @sauyon in #3677
  • chore(deps): bump ruff from 0.0.255 to 0.0.256 by @dependabot in #3676
  • fix(type): annotate PdSeries with pandas-stubs by @aarnphm in #3466
  • chore(dispatcher): refactor out training code by @sauyon in #3663
  • fix: makes containerize for triton examples to all amd64 by @aarnphm in #3678
  • chore(deps): bump coverage[toml] from 7.2.1 to 7.2.2 by @dependabot in #3679
  • revert: "chore(dispatcher): refactor out training code (#3663)" by @sauyon in #3680
  • doc: add more links to Bentoml/examples by @larme in #3631
  • perf: serialization optimization by @larme in #3606
  • examples: Kubeflow by @ssheng in #3656
  • chore(deps): bump pytest-asyncio from 0.20.3 to 0.21.0 by @dependabot in #3688
  • chore(deps): bump ruff from 0.0.256 to 0.0.257 by @dependabot in #3689
  • chore(deps): bump imageio from 2.26.0 to 2.26.1 by @dependabot in #3690
  • chore(deps): bump yamllint from 1.29.0 to 1.30.0 by @dependabot in #3694
  • fix: remove duplicate dependabot check for pip by @aarnphm in #3691
  • chore(deps): bump ruff from 0.0.257 to 0.0.258 by @dependabot in #3699
  • docs: Update the Kubeflow example by @ssheng in #3703
  • chore(deps): bump ruff from 0.0.258 to 0.0.259 by @dependabot in #3709
  • docs: add link to pyfilesystem plugins by @sauyon in #3716
  • docs: Kubeflow integration documentation by @ssheng in #3704
  • docs: replace load_runner() to get().to_runner() by @KimSoungRyoul in #3715
  • chore(deps): bump imageio from 2.26.1 to 2.27.0 by @dependabot in #3720
  • fix(readme): format markdown table by @aarnphm in #3722
  • fix: copy files before running setup_script by @aarnphm in #3713
  • chore: remove experimental warning for bentoml.metrics by @aarnphm in #3725
  • ci: temporary disable coverage by @aarnphm in #3726
  • chore(deps): bump ruff from 0.0.259 to 0.0.260 by @dependabot in #3734
  • chore(deps): bump tritonclient[all] from 2.31.0 to 2.32.0 by @dependabot in #3730
  • fix(type): bentoml.container.build should accept multiple image_tag by @pmayd in #3719
  • chore(deps): bump bufbuild/buf-setup-action from 1.15.1 to 1.16.0 by @dependabot in #3738
  • feat: add query params to request context by @sauyon in #3717
  • chore(dispatcher): use attr class instead of a tuple by @sauyon in #3731
  • fix: Make it so the configured max_batch_size is respected when batching inference requests together by @RShang97 in #3741
  • feat(transformers): pretrained protocol support by @aarnphm in #3684
  • fix(tests): broken CI by @aarnphm in #3742
  • chore(deps): bump ruff from 0.0.260 to 0.0.261 by @dependabot in #3744
  • docs: Transformers documentation on pre-trained instances support by @ssheng in #3745

New Contributors

Full Changelog: v1.0.16...v1.0.17

BentoML - v1.0.16

14 Mar 21:03
f503a68

🍱 The BentoML v1.0.16 release is here, featuring the introduction of the bentoml.triton framework. With this integration, BentoML now supports running NVIDIA Triton Inference Server as a Runner. See the Triton Inference Server documentation to learn more!

  • Triton Inference Server can be configured as a Runner in BentoML with its model repository and CLI arguments specified as parameters.

    import bentoml
    
    triton_runner = bentoml.triton.Runner(
        "triton_runner",
        model_repository="s3://bucket/path/to/model_repository",
        cli_args=["--load-model=torchscript_yolov5s", "--model-control-mode=explicit"],
    )
  • Models served by the Triton Inference Server Runner can be called as methods on the runner handle, both synchronously and asynchronously.

    @svc.api(
        input=bentoml.io.Image.from_sample("./data/0.png"), output=bentoml.io.NumpyNdarray()
    )
    async def bentoml_torchscript_mnist_infer(im: Image) -> NDArray[t.Any]:
        arr = np.array(im) / 255.0
        arr = np.expand_dims(arr, (0, 1)).astype("float32")
        InferResult = await triton_runner.torchscript_mnist.async_run(arr)
        return InferResult.as_numpy("OUTPUT__0")
  • Build bentos and containerize images with Triton Runners by specifying the nvcr.io/nvidia/tritonserver base image in bentofile.yaml.

    service: service:svc
    include:
      - /model_repository
      - /data/*.png
      - /*.py
    exclude:
      - /__pycache__
      - /venv
      - /train.py
      - /build_bento.py
      - /containerize_bento.py
    python:
      packages:
        - bentoml[triton]
    docker:
      base_image: nvcr.io/nvidia/tritonserver:22.12-py3
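  • With the bentofile.yaml above in place, the bento can also be built and containerized from Python. A minimal sketch, assuming the project directory is the current working directory (the CLI equivalents are bentoml build and bentoml containerize):

    import bentoml

    # Build the bento from the bentofile.yaml in the current directory
    bento = bentoml.bentos.build_bentofile("bentofile.yaml")

    # Containerize using the Triton-enabled base image specified in the bentofile
    bentoml.container.build(str(bento.tag), backend="docker")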

💡 If you are an existing Triton user, the integration provides simpler ways to add custom logic in Python, deploy distributed multi-model inference graphs, unify model management across different ML frameworks and workflows, and standardize the model packaging format with versioning and collaboration features. If you are an existing BentoML user, the integration improves runner efficiency and throughput under high load, thanks to Triton's efficient C++ runtime.

What's Changed

New Contributors

Full Changelog: v1.0.15...v1.0.16

BentoML - v1.0.15

16 Feb 01:31
a61379a

🍱 The BentoML v1.0.15 release is here, featuring the introduction of the bentoml.diffusers framework.

  • Learn more about the capabilities of the bentoml.diffusers framework in the Creating Stable Diffusion 2.0 Service With BentoML And Diffusers blog and BentoML Diffusers example project.

  • Import a diffusion model with the bentoml.diffusers.import_model API.

    import bentoml
    
    bentoml.diffusers.import_model(
        "sd2",
        "stabilityai/stable-diffusion-2",
    )
  • Create a text2img service using a Stable Diffusion 2.0 model runner with the familiar to_runner API from the bentoml.diffusers framework.

    import bentoml
    from bentoml.io import Image, JSON
    
    bento_model = bentoml.diffusers.get("sd2:latest")
    stable_diffusion_runner = bento_model.to_runner()
    
    svc = bentoml.Service("stable_diffusion_v2", runners=[stable_diffusion_runner])
    
    @svc.api(input=JSON(), output=Image())
    def txt2img(input_data):
        images, _ = stable_diffusion_runner.run(**input_data)
        return images[0]
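  • With the service running (for example bentoml serve service:svc --production), text-to-image requests can be sent with the Bento client. A minimal sketch, assuming the JSON payload maps directly onto the diffusers pipeline's keyword arguments (as in the service above) and that the Image output is returned as a PIL image:

    from bentoml.client import Client

    client = Client.from_url("http://localhost:3000")

    # The txt2img method mirrors the service API name; the payload is forwarded to the pipeline as keyword arguments
    image = client.txt2img({"prompt": "a serene mountain lake at sunrise, oil painting"})
    image.save("generated.png")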

🍱 Fixed an incompatibility introduced in starlette==0.25.0 that resulted in the type MultiPartMessage not being found in starlette.formparsers.

ImportError: cannot import name 'MultiPartMessage' from 'starlette.formparsers' (/opt/miniconda3/envs/bentoml/lib/python3.10/site-packages/starlette/formparsers.py)

What's Changed

New Contributors

Full Changelog: v1.0.14...v1.0.15

BentoML - v1.0.14

08 Feb 22:41
9a6dc93

🍱 Fixed the backward incompatibility introduced in starlette version 0.24.0. Upgrade BentoML to v1.0.14 if you encounter an error related to content_type like the one below.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/server/service_app.py", line 305, in api_func
    input_data = await api.input.from_http_request(request)
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/io_descriptors/multipart.py", line 208, in from_http_request
    reqs = await populate_multipart_requests(request)
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/utils/formparser.py", line 188, in populate_multipart_requests
    form = await multipart_parser.parse()
  File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/utils/formparser.py", line 158, in parse
    multipart_file = UploadFile(
TypeError: __init__() got an unexpected keyword argument 'content_type'

BentoML - v1.0.13

20 Jan 03:52
4d2fd62

🍱 BentoML v1.0.13 is released featuring a preview of batch inference with Spark.

  • Run the batch inference job using the bentoml.batch.run_in_spark() method. This method takes the bento, the API name, the Spark DataFrame containing the input data, and the Spark session itself as parameters, and it returns a DataFrame containing the results of the batch inference job.

    import bentoml
    
    # Import the bento from a repository or get the bento from the bento store
    bento = bentoml.import_bento("s3://bentoml/quickstart")
    
    # Run the run_in_spark function with the bento, API name, and Spark session
    results_df = bentoml.batch.run_in_spark(bento, "classify", df, spark)
  • Internally, what happens when you run run_in_spark is as follows:

    • First, the bento is distributed to the cluster. Note that if the bento has already been distributed, i.e. you have already run a computation with that bento, this step is skipped.
    • Next, a process function is created, which starts a BentoML server on each of the Spark workers, then uses a client to process all the data. This is done so that the workers take advantage of the batch processing features of the BentoML server. PySpark pickles this process function and dispatches it, along with the relevant data, to the workers.
    • Finally, the function is evaluated on the given dataframe. Once all methods that the user defined in the script have been executed, the data is returned to the master node.
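  • A minimal end-to-end sketch of preparing the Spark session and input DataFrame is shown below; the CSV path, column layout, and output location are hypothetical and must match what the classify API expects.

    import bentoml
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bentoml-batch-inference").getOrCreate()

    # Hypothetical input data; the columns must match the API's expected input schema
    df = spark.read.csv("s3://bucket/path/to/input_features.csv", header=True, inferSchema=True)

    bento = bentoml.import_bento("s3://bentoml/quickstart")
    results_df = bentoml.batch.run_in_spark(bento, "classify", df, spark)

    # Persist the predictions; any Spark sink works here
    results_df.write.parquet("s3://bucket/path/to/predictions")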

⚠️ The bentoml.batch API may undergo incompatible changes until general availability, which will be announced in a later minor version release.
🥂 Shout out to jeffthebear, KimSoungRyoul, Robert Fernandez, Marco Vela, Quan Nguyen, and y1450 from the community for their contributions in this release.

What's Changed

New Contributors

Full Changelog: v1.0.12...v1.0.13

BentoML - v1.0.12

08 Dec 10:24
b6a4158

Important bug fixes.

  • Fixed runner call failures with keyword arguments.
  • Fixed an incorrect user base image override.

What's Changed

Full Changelog: v1.0.11...v1.0.12

BentoML - v1.0.11

07 Dec 20:30
cc38007

🍱 BentoML v1.0.11 is here featuring the introduction of an inference data collection and model monitoring API that can be easily integrated with any model monitoring framework.

  • Introduced the bentoml.monitor API for monitoring any features, predictions, and target data in numerical, categorical, and numerical sequence types.

    import numpy as np

    import bentoml
    from bentoml.io import Text
    from bentoml.io import NumpyNdarray
    
    CLASS_NAMES = ["setosa", "versicolor", "virginica"]
    
    iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
    svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])
    
    @svc.api(
        input=NumpyNdarray.from_sample(np.array([4.9, 3.0, 1.4, 0.2], dtype=np.double)),
        output=Text(),
    )
    async def classify(features: np.ndarray) -> str:
        with bentoml.monitor("iris_classifier_prediction") as mon:
            mon.log(features[0], name="sepal length", role="feature", data_type="numerical")
            mon.log(features[1], name="sepal width", role="feature", data_type="numerical")
            mon.log(features[2], name="petal length", role="feature", data_type="numerical")
            mon.log(features[3], name="petal width", role="feature", data_type="numerical")
    
            results = await iris_clf_runner.predict.async_run([features])
            result = results[0]
            category = CLASS_NAMES[result]
    
            mon.log(category, name="pred", role="prediction", data_type="categorical")
        return category
  • Enabled monitoring data collection through log file forwarding using any forwarders (fluentbit, filebeat, logstash) or OTLP exporter implementations.

    • Configuration for monitoring data collection through log files.

      monitoring:
        enabled: true
        type: default
        options:
          log_path: path/to/log/file
    • Configuration for monitoring data collection through an OTLP exporter.

      monitoring:
        enabled: true
        type: otlp
        options:
          endpoint: http://localhost:5000
          insecure: true
          credentials: null
          headers: null
          timeout: 10
          compression: null
          meta_sample_rate: 1.0
  • Supported third-party monitoring data collector integrations through BentoML Plugins. See bentoml/plugins repository for more details.

🐳 Improved containerization SDK and CLI options, read more in #3164.

  • Added support for multiple backend builder options (Docker, nerdctl, Podman, Buildah, Buildx) in addition to buildctl (standalone buildkit builder).

  • Improved Python SDK for containerization with different backend builder options.

    import bentoml
    
    bentoml.container.build("iris_classifier:latest", backend="podman", features=["grpc","grpc-reflection"], **kwargs)
  • Improved CLI to include the newly added options.

    bentoml containerize --help
  • Standardized the generated Dockerfile in bentos to be compatible with all build tools for use cases that require building from a Dockerfile directly.
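  • For example, the generated Dockerfile can be fed directly to docker build. A minimal sketch, assuming the bento keeps its Dockerfile under env/docker/ and that the Bento object exposes its on-disk location via .path (both assumptions here):

    import subprocess

    import bentoml

    # Locate the bento in the local bento store
    bento = bentoml.bentos.get("iris_classifier:latest")

    # Build the image from the generated Dockerfile, using the bento directory as the build context
    subprocess.run(
        ["docker", "build", "-f", "env/docker/Dockerfile", "-t", "iris_classifier:latest", "."],
        cwd=bento.path,
        check=True,
    )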

💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.

What's Changed

  • chore: add framework utils functions directory by @larme in #3203
  • fix: missing f-string in tag validation error message by @csh3695 in #3205
  • chore(build_config): bypass exception when cuda and conda is specified by @aarnphm in #3188
  • docs: Update asynchronous API documentation by @ssheng in #3204
  • style: use relative import inside _internal/ by @larme in #3209
  • style: fix monitoring type error by @aarnphm in #3208
  • chore(build): add dependabot for pyproject.toml by @aarnphm in #3139
  • chore(deps): bump black[jupyter] from 22.8.0 to 22.10.0 in /requirements by @dependabot in #3217
  • chore(deps): bump pylint from 2.15.3 to 2.15.5 in /requirements by @dependabot in #3212
  • chore(deps): bump pytest-asyncio from 0.19.0 to 0.20.1 in /requirements by @dependabot in #3216
  • chore(deps): bump imageio from 2.22.1 to 2.22.4 in /requirements by @dependabot in #3211
  • fix: don't index ContextVar at runtime by @sauyon in #3221
  • chore(deps): bump pyarrow from 9.0.0 to 10.0.0 in /requirements by @dependabot in #3214
  • chore: configuration check for development by @aarnphm in #3223
  • fix bento create by @quandollar in #3220
  • fix(docs): missing table tag by @nyongja in #3231
  • docs: grammar corrections by @tbazin in #3234
  • chore(deps): bump pytest-asyncio from 0.20.1 to 0.20.2 in /requirements by @dependabot in #3238
  • chore(deps): bump pytest-xdist[psutil] from 2.5.0 to 3.0.2 by @dependabot in #3245
  • chore(deps): bump pytest from 7.1.3 to 7.2.0 in /requirements by @dependabot in #3237
  • chore(deps): bump build[virtualenv] from 0.8.0 to 0.9.0 in /requirements by @dependabot in #3240
  • deps: bumping gRPC and OTLP dependencies by @aarnphm in #3228
  • feat(file): support custom mime type for file proto by @aarnphm in #3095
  • fix: multipart for client by @sauyon in #3253
  • fix(json): make sure to parse a list of dict for_sample by @aarnphm in #3229
  • chore: move test proto to internal tests only by @aarnphm in #3255
  • fix(framework): external_modules for loading pytorch by @bojiang in #3254
  • feat(container): builder implementation by @aarnphm in #3164
  • feat(sdk): implement otlp monitoring exporter by @bojiang in #3257
  • chore(grpc): add missing init.py by @aarnphm in #3259
  • docs(metrics): Update docs for the default metrics by @ssheng in #3262
  • chore: generate plain dockerfile without buildkit syntax by @aarnphm in #3261
  • style: remove # type: ignore by @aarnphm in #3265
  • fix: lazy load ONNX utils by @aarnphm in #3266
  • fix(pytorch): pickle is the unpickler of cloudpickle by @bojiang in #3269
  • fix: instructions for missing sklearn dependency by @benjamintanweihao in #3271
  • docs: ONNX signature docs by @larme in #3272
  • chore(deps): bump pyarrow from 10.0.0 to 10.0.1 by @dependabot in #3273
  • chore(deps): bump pylint from 2.15.5 to 2.15.6 by @dependabot in #3274
  • fix(pandas): only set columns when apply_column_names is set by @mqk in #3275
  • feat: configuration versioning by @aarnphm in #3052
  • fix(container): support comma in docker env by @larme in #3285
  • chore(stub): import filetype by @aarnphm in #3260
  • fix(container): ensure to stream logs when DOCKER_BUILDKIT=0 by @aarnphm in #3294
  • docs: update instructions for containerize message by @aarnphm in #3289
  • fix: unset NVIDIA_VISIBLE_DEVICES when cuda image is used by @aarnphm in #3298
  • fix: multipart logic by @sauyon in #3297
  • chore(deps): bump pylint from 2.15.6 to 2.15.7 by @dependabot in #3291
  • docs: wrong arguments when saving by @KimSoungRyoul in #3306
  • chore(deps): bump pylint from 2.15.7 to 2.15.8 in /requirements by @dependabot in #3308
  • chore(deps): bump pytest-xdist[psutil] from 3.0.2 to 3.1.0 in /requirements by @dependabot in #3309
  • chore(pyproject): bumping python version typeshed to 3.11 by @aarnphm in https://github.com/bentoml/Bento...

BentoML - v1.0.10

09 Nov 02:33
248979b

🍱 BentoML v1.0.10 is released to address a recurring broken pipe error reported by the community. Also included in this release is a list of improvements we’d like to share with the community.

  • Fixed an aiohttp.client_exceptions.ClientOSError caused by asymmetrical keep alive timeout settings between the API Server and Runner.

    aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe
  • Added multi-output support for ONNX and TensorFlow frameworks.

  • Added from_sample support to all IO descriptors, in addition to just bentoml.io.NumpyNdarray; the provided sample is reflected in the Swagger UI.

    # Pandas Example
    @svc.api(
        input=PandasDataFrame.from_sample(
            pd.DataFrame([1,2,3,4])
        ),
        output=PandasDataFrame(),
    )
    
    # JSON Example
    @svc.api(
        input=JSON.from_sample(
            {"foo": 1, "bar": 2}
        ),
        output=JSON(),
    )

💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.

What's Changed

New Contributors

Full Changelog: v1.0.8...v1.0.9

What's Changed

Full Changelog: v1.0.9...v1.0.10

BentoML - v1.0.8

01 Nov 00:43
8365375

🍱 BentoML v1.0.8 is released with a list of improvements we hope you’ll find useful.

  • Introduced the Bento Client for easy access to BentoML services over HTTP. Both sync and async calls are supported. See the Bento Client Guide for more details.

    from bentoml.client import Client
    
    client = Client.from_url("http://localhost:3000")
    
    # Sync call
    response = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
    
    # Async call
    response = await client.async_classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
  • Introduced custom metrics support for easy instrumentation over Prometheus. See the Metrics Guide for more details.

    # Histogram metric
    inference_duration = bentoml.metrics.Histogram(
        name="inference_duration",
        documentation="Duration of inference",
        labelnames=["nltk_version", "sentiment_cls"],
    )
    
    # Counter metric
    polarity_counter = bentoml.metrics.Counter(
        name="polarity_total",
        documentation="Count total number of analysis by polarity scores",
        labelnames=["polarity"],
    )

    Full Prometheus style syntax is supported for instrumenting custom metrics inside API and Runner definitions.

    # Histogram
    inference_duration.labels(
        nltk_version=nltk.__version__, sentiment_cls=self.sia.__class__.__name__
    ).observe(time.perf_counter() - start)
    
    # Counter
    polarity_counter.labels(polarity=is_positive).inc()
  • Improved health checking to also cover the status of runners, avoiding a healthy status being returned before the runners are ready (a readiness-probe sketch appears at the end of this list).

  • Added SSL/TLS support to gRPC serving.

    bentoml serve-grpc --ssl-certfile=credentials/cert.pem --ssl-keyfile=credentials/key.pem --production --enable-reflection
  • Added channelz support for easier debugging of gRPC serving.

  • Allowed nested requirements with the -r syntax.

    # requirements.txt
    -r nested/requirements.txt
    
    pydantic
    Pillow
    fastapi
  • Improved the adaptive batching dispatcher's auto-tuning to avoid sporadic request failures due to batching at the beginning of the runner lifecycle.

  • Fixed a bug where runners would raise a TypeError when overloaded. Now an HTTP 503 Service Unavailable is returned when a runner is overloaded.

    File "python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 188, in async_run_method
        return tuple(AutoContainer.from_payload(payload) for payload in payloads)
    TypeError: 'Response' object is not iterable
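  • Following the health checking improvement above, a deployment's readiness probe can poll the API server's readyz endpoint, which now only reports ready once the runners are up as well. A minimal sketch against a locally served bento:

    import requests

    # 200 is returned only after both the API server and its runners are ready
    resp = requests.get("http://localhost:3000/readyz", timeout=5)
    print(resp.status_code)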

💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.

🥂 We’d like to thank the community for your continued support and engagement.