feat: Dynamic Service creation #4498

Open
wants to merge 29 commits into
base: main
8b167aa
Added add_api functionalty with example | draft
holzweber Jan 30, 2024
a28891d
working but unclean version of dynamic services
holzweber Feb 10, 2024
8299fd5
run pre-commit hook on changed files
holzweber Feb 10, 2024
fda3a5f
Missing checks
holzweber Feb 10, 2024
b346b6c
fix(sdk): current directory for built bentos (#4505)
bojiang Feb 19, 2024
1c52c7b
chore(cloud cli): rename cluster to region (#4508)
bojiang Feb 19, 2024
259383d
doc: Add the lcm lora use case doc (#4510)
Sherlock113 Feb 20, 2024
ad0d485
fix(sdk): clean bentoml version (#4511)
bojiang Feb 20, 2024
6c2ac38
fix: bug: Dataframes not serializing correctly in the new API (#4491)
frostming Feb 20, 2024
89adfb2
docs: Update the get started docs (#4513)
Sherlock113 Feb 20, 2024
da08bfa
fix(sdk): incorrect bento_path if not provided (#4514)
bojiang Feb 20, 2024
71c483f
docs: Add client code examples without context manager (#4512)
Sherlock113 Feb 21, 2024
2ed9108
docs: Update docs (#4515)
Sherlock113 Feb 21, 2024
b1557bf
docs: Add authorization docs (#4517)
Sherlock113 Feb 21, 2024
504ff63
docs: Change sample input to one line (#4518)
Sherlock113 Feb 21, 2024
b64ce64
docs: Update ControlNet use case docs (#4519)
Sherlock113 Feb 22, 2024
ed91f8a
docs: Update the distributed services and get started docs (#4521)
Sherlock113 Feb 22, 2024
7b0b0e6
refactor(cli): make CLI commands available as modules (#4487)
frostming Feb 23, 2024
69b8a29
docs: Refactor BentoCloud docs (#4525)
Sherlock113 Feb 23, 2024
b7169c7
Added add_api functionalty with example | draft
holzweber Jan 30, 2024
cc86bc5
working but unclean version of dynamic services
holzweber Feb 10, 2024
6fc21a1
run pre-commit hook on changed files
holzweber Feb 10, 2024
cc20f20
Missing checks
holzweber Feb 10, 2024
7f8a01f
Merge branch 'dynamic-service-creation' of https://github.com/holzweb…
holzweber Feb 25, 2024
0ed2ab5
added dynamic service
holzweber Mar 17, 2024
05fa5a1
ci: auto fixes from pre-commit.ci
pre-commit-ci[bot] Mar 17, 2024
3bc355d
added services with type()
holzweber Mar 18, 2024
c2fbcfc
fix merge issues
holzweber Mar 18, 2024
9a42ea0
ci: auto fixes from pre-commit.ci
pre-commit-ci[bot] Mar 18, 2024
2 changes: 2 additions & 0 deletions Makefile
@@ -41,6 +41,8 @@ clean: ## Clean all generated files
	@find . -type f -name '*.py[co]' -delete -o -type d -name __pycache__ -delete

# Docs
watch-docs: ## Build and watch documentation
	pdm run sphinx-autobuild docs/source docs/build/html --watch $(GIT_ROOT)/src/ --ignore "bazel-*"
spellcheck-docs: ## Spell check documentation
	pdm run sphinx-build -b spelling ./docs/source ./docs/build || (echo "Error running spellchecker.. You may need to run 'make install-spellchecker-deps'"; exit 1)
OS := $(shell uname)
59 changes: 0 additions & 59 deletions docs/source/bentocloud/best-practices/cost-optimization.rst

This file was deleted.

21 changes: 0 additions & 21 deletions docs/source/bentocloud/best-practices/index.rst

This file was deleted.

103 changes: 83 additions & 20 deletions docs/source/bentocloud/get-started.rst
@@ -12,38 +12,99 @@ Specifically, BentoCloud features:
- Flexible APIs for continuous integration and deployments (CI/CD).
- Built-in observability tools for monitoring model performance and troubleshooting.

Plans
-----
Access BentoCloud
-----------------

BentoCloud is available with the following two plans.
To gain access to BentoCloud, visit the `BentoML website <https://www.bentoml.com/>`_ to sign up.

Starter
^^^^^^^
Once you have your BentoCloud account, do the following to get started:

The Starter plan is designed for small teams of developers who want to focus on building AI applications without infrastructure management. With the autoscaling feature of BentoCloud, you only pay for the resources you use.
1. Install BentoML by running ``pip install bentoml``. See :doc:`/get-started/installation` for details.
2. Create an :doc:`API token with Developer Operations Access </bentocloud/how-tos/manage-access-token>`.
3. Log in to BentoCloud with the ``bentoml cloud login`` command, which will be displayed on the BentoCloud console after you create the API token.

Enterprise
^^^^^^^^^^
Deploy your first model
-----------------------

The Enterprise plan includes all the features offered in the Starter plan. It is tailored for teams that want to use BentoCloud in :doc:`their own cloud or on-premises environment (BYOC) </bentocloud/how-tos/byoc>`, ensuring data security and compliance. If you prefer not to use your own cluster, we can provide a dedicated cloud environment for you. Either way, we take care of managing the infrastructure to ensure a scalable and secure model deployment experience.
Perform the following steps to quickly deploy an example application on BentoCloud. It is a summarization service powered by a Transformer model `sshleifer/distilbart-cnn-12-6 <https://huggingface.co/sshleifer/distilbart-cnn-12-6>`_.

Access BentoCloud
-----------------
1. Install the dependencies.

To gain access to BentoCloud, sign up here:
.. code-block:: bash

.. raw:: html
pip install bentoml torch transformers

<a href="https://kdyvd8c5ifq.typeform.com/to/eTujPAaE" class="custom-button demo">Schedule a Demo</a>
<a href="https://cloud.bentoml.com" class="custom-button trial">Start Free Trial</a>
2. Create a BentoML Service in a ``service.py`` file as below. The pre-trained model is pulled from Hugging Face.

Once you have your BentoCloud account, do the following to get started:
.. code-block:: python

1. Install BentoML by running ``pip install bentoml``. See :doc:`/get-started/installation` for details.
2. Create an :doc:`API token with Developer Operations Access </bentocloud/how-tos/manage-access-token>`.
3. Log in to BentoCloud with the ``bentoml cloud login`` command, which will be displayed on the BentoCloud console after you create the API token.
    from __future__ import annotations
    import bentoml
    from transformers import pipeline


    EXAMPLE_INPUT = "Breaking News: In an astonishing turn of events, the small town of Willow Creek has been taken by storm as local resident Jerry Thompson's cat, Whiskers, performed what witnesses are calling a 'miraculous and gravity-defying leap.' Eyewitnesses report that Whiskers, an otherwise unremarkable tabby cat, jumped a record-breaking 20 feet into the air to catch a fly. The event, which took place in Thompson's backyard, is now being investigated by scientists for potential breaches in the laws of physics. Local authorities are considering a town festival to celebrate what is being hailed as 'The Leap of the Century.'"


    @bentoml.service(
        resources={"cpu": "2"},
        traffic={"timeout": 10},
    )
    class Summarization:
        def __init__(self) -> None:
            self.pipeline = pipeline('summarization')

        @bentoml.api
        def summarize(self, text: str = EXAMPLE_INPUT) -> str:
            result = self.pipeline(text)
            return result[0]['summary_text']

.. note::

You can test this Service locally by running ``bentoml serve service:Summarization``. For details of the Service, see :doc:`/get-started/quickstart`.

3. Create a ``bentofile.yaml`` file as below.

.. code-block:: yaml

    service: 'service:Summarization'
    labels:
      owner: bentoml-team
      project: gallery
    include:
      - '*.py'
    python:
      packages:
        - torch
        - transformers

4. Deploy the application to BentoCloud. The deployment status is displayed both in your terminal and the BentoCloud console.

.. code-block:: bash

bentoml deploy .

5. On the BentoCloud console, navigate to the **Deployments** page, and click your Deployment. On its details page, you can see the sample input and summarize it with the application once it is up and running.

.. image:: ../_static/img/bentocloud/get-started/bentocloud-playground-quickstart.png

Interact with it using the Form, Python client, or CURL command on the **Playground** tab. Here is an example of creating a Python client to interact with it. Replace the endpoint URL with your own.

.. code-block:: python

    import bentoml

    client = bentoml.SyncHTTPClient("https://summarization-example--aws-ca-1.mt1.bentoml.ai")
    result: str = client.summarize(
        text="Breaking News: In an astonishing turn of events, the small town of Willow Creek has been taken by storm as local resident Jerry Thompson's cat, Whiskers, performed what witnesses are calling a 'miraculous and gravity-defying leap.' Eyewitnesses report that Whiskers, an otherwise unremarkable tabby cat, jumped a record-breaking 20 feet into the air to catch a fly. The event, which took place in Thompson's backyard, is now being investigated by scientists for potential breaches in the laws of physics. Local authorities are considering a town festival to celebrate what is being hailed as 'The Leap of the Century.'",
    )
    print(result)

6. To terminate this Deployment, click **Stop** in the top right corner of its details page or simply run:

.. code-block:: bash

Now, you can try an `example project and deploy it to BentoCloud <https://github.com/bentoml/quickstart>`_.
bentoml deployment terminate summarization
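
While the Deployment from step 5 is still running, it can also be called over plain HTTP instead of through the Python client. The following is a minimal sketch using only the Python standard library. It assumes the API is exposed at a ``/summarize`` path (matching the ``summarize`` method name) and accepts a JSON body with a ``text`` key; the helper name ``build_summarize_request`` is hypothetical and the endpoint URL is a placeholder to replace with your own.

```python
import json
import urllib.request


def build_summarize_request(endpoint_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the assumed /summarize route (hypothetical helper)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint_url.rstrip("/") + "/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; replace it with the endpoint shown on your Deployment's details page.
request = build_summarize_request("https://your-deployment.example.invalid", "Some long text")

# Sending the request requires a running Deployment, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

If the Deployment is protected by an API token, an ``Authorization`` header would also be needed; see the authorization docs for the exact format.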

Resources
---------
@@ -52,3 +113,5 @@ If you are a first-time user of BentoCloud, we recommend you read the following

- Deploy :doc:`example projects </use-cases/index>` to BentoCloud
- :doc:`/bentocloud/how-tos/manage-deployments`
- :doc:`/bentocloud/how-tos/create-deployments`
- :doc:`/bentocloud/how-tos/manage-access-token`
61 changes: 61 additions & 0 deletions docs/source/bentocloud/how-tos/autoscaling.rst
@@ -0,0 +1,61 @@
===========
Autoscaling
===========

The autoscaling feature of BentoCloud dynamically adjusts the number of Service replicas within the specified minimum and maximum limits. This document explains how to set autoscaling for Deployments.

You define minimum and maximum replica counts to set the boundaries for scaling, allowing the autoscaler to reduce or increase the number of replicas as needed. This feature supports scaling to zero replicas. You can also define the metric thresholds that the autoscaler uses to decide when to adjust the number of replicas. The available ``metrics`` values include:

- ``cpu``: The CPU utilization percentage.
- ``memory``: The memory utilization.
- ``gpu``: The GPU utilization percentage.
- ``qps``: The queries per second.

By setting values for these fields, you are instructing the autoscaler to ensure that the average for each metric does not exceed the specified thresholds. For example, if you set the CPU value to ``80``, the autoscaler will target an average CPU utilization of 80%.
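
To make the idea of a target average concrete, here is a minimal sketch of a proportional scaling rule. It assumes the formula used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current replicas × current value / target value)), clamped to the configured limits. This document does not state that BentoCloud uses exactly this rule, and the function name ``desired_replicas`` is hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Proportional rule (assumed, Kubernetes HPA style), clamped to the limits."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    # Scale the replica count by how far the observed average is from the target.
    raw = math.ceil(current_replicas * current_value / target_value)
    # Never leave the configured boundaries.
    return max(min_replicas, min(raw, max_replicas))


# Two replicas averaging 120% CPU against a target of 80.
print(desired_replicas(2, 120, 80, min_replicas=1, max_replicas=10))
```

With a lower ``max_replicas``, the same load would be capped at that limit instead.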

Allowed scaling-up behaviors (``scale_up_behavior``):

- ``fast`` (default): There is no stabilization window, so the autoscaler can increase the number of Pods immediately if necessary. It can increase the number of Pods by 100% or by 4 Pods, whichever is higher, every 15 seconds.
- ``stable``: The autoscaler can increase the number of Pods, but it will stabilize the number of Pods for 300 seconds (5 minutes) before deciding to scale up further. It can increase the number of Pods by 100% every 15 seconds.
- ``disabled``: Scaling-up is turned off.

Allowed scaling-down behaviors (``scale_down_behavior``):

- ``fast``: There is no stabilization window, so the autoscaler can reduce the number of Pods immediately if necessary. It can decrease the number of Pods by 100% or by 4 Pods, whichever is higher, every 15 seconds.
- ``stable`` (default): The autoscaler can reduce the number of Pods, but it will stabilize the number of Pods for 300 seconds (5 minutes) before deciding to scale down further. It can decrease the number of Pods by 100% every 15 seconds.
- ``disabled``: Scaling-down is turned off.
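
The scale-up step limits described above can be sketched as a small function. This is an illustration of the documented limits only: it models the per-15-second ceiling for each ``scale_up_behavior`` value and ignores the 300-second stabilization window of ``stable``. The name ``scale_up_ceiling`` is hypothetical and not part of any BentoML API.

```python
def scale_up_ceiling(current_pods: int, behavior: str, max_replicas: int) -> int:
    """Upper bound on the Pod count after one 15-second period (illustrative)."""
    if current_pods < 0 or max_replicas < current_pods:
        raise ValueError("expected 0 <= current_pods <= max_replicas")
    if behavior == "fast":
        # Add 100% of the current count or 4 Pods, whichever is higher.
        allowed = current_pods + max(current_pods, 4)
    elif behavior == "stable":
        # Add at most 100% of the current count.
        allowed = current_pods * 2
    elif behavior == "disabled":
        # Scaling up is turned off.
        allowed = current_pods
    else:
        raise ValueError(f"unknown behavior: {behavior!r}")
    return min(allowed, max_replicas)


# Follow the ceiling over three periods, starting from one Pod.
pods = 1
history = [pods]
for _ in range(3):
    pods = scale_up_ceiling(pods, "fast", max_replicas=20)
    history.append(pods)
print(history)
```

The actual Pod count at each step also depends on the metric thresholds, so these values are ceilings rather than predictions.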

To set autoscaling, you need to configure the above fields in a separate YAML or JSON file. For example:

.. code-block:: yaml
    :caption: `config-file.yaml`

    services:
      MyBentoService: # The Service name
        scaling:
          max_replicas: 2
          min_replicas: 1
          policy:
            metrics:
              - type: "cpu | memory | gpu | qps" # Specify the type here
                value: "string" # Specify the value here
            scale_down_behavior: "disabled | stable | fast" # Choose the behavior
            scale_up_behavior: "disabled | stable | fast" # Choose the behavior
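
Since the file can also be JSON, a filled-in configuration can be generated and checked with a short script. The sketch below assumes the same nesting as the YAML example and that metric values are written as strings, as the ``"string"`` placeholder suggests. The helper ``build_scaling_config`` is hypothetical, and its validation only mirrors the allowed values listed in this document.

```python
import json

ALLOWED_METRICS = {"cpu", "memory", "gpu", "qps"}
ALLOWED_BEHAVIORS = {"disabled", "stable", "fast"}


def build_scaling_config(
    service_name: str,
    min_replicas: int,
    max_replicas: int,
    metrics: dict,
    scale_up_behavior: str = "fast",
    scale_down_behavior: str = "stable",
) -> dict:
    """Return a config dict shaped like the YAML example (hypothetical helper)."""
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("expected 0 <= min_replicas <= max_replicas")
    unknown = set(metrics) - ALLOWED_METRICS
    if unknown:
        raise ValueError(f"unknown metric types: {sorted(unknown)}")
    for behavior in (scale_up_behavior, scale_down_behavior):
        if behavior not in ALLOWED_BEHAVIORS:
            raise ValueError(f"unknown behavior: {behavior!r}")
    return {
        "services": {
            service_name: {
                "scaling": {
                    "max_replicas": max_replicas,
                    "min_replicas": min_replicas,
                    "policy": {
                        # Values are written as strings (an assumption based on the placeholder).
                        "metrics": [
                            {"type": name, "value": str(value)}
                            for name, value in metrics.items()
                        ],
                        "scale_down_behavior": scale_down_behavior,
                        "scale_up_behavior": scale_up_behavior,
                    },
                }
            }
        }
    }


config = build_scaling_config("MyBentoService", 1, 2, {"cpu": 80})
# Save this output as, for example, config-file.json.
print(json.dumps(config, indent=2))
```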

You can then deploy your project by referencing this file.

.. tab-set::

    .. tab-item:: BentoML CLI

        .. code-block:: bash

            bentoml deploy . -f config-file.yaml

    .. tab-item:: Python API

        .. code-block:: python

            import bentoml

            # Set `bento` to the Bento name if it already exists
            bentoml.deployment.create(bento="./path_to_your_project", config_file="config-file.yaml")
6 changes: 3 additions & 3 deletions docs/source/bentocloud/how-tos/byoc.rst
@@ -1,6 +1,6 @@
====
BYOC
====
====================
Bring your own cloud
====================

BentoCloud provides Bring Your Own Cloud (BYOC) as a part of the Enterprise plan, which allows you to run BentoCloud services within your
private cloud environment. This means the BentoCloud Control Plane and the Data Plane are separated, enabling you to stay closer to your data