docs: update monitoring feature (#737)
* fix: remove imports

* fix: remove commented codes

* fix: update server start gif

* docs: fix typo

* docs: add monitor under server section

* fix: add missing gif

* fix: minor fix

* fix: minor revision
numb3r3 committed Jun 1, 2022
1 parent bb8c4ce commit 5e06667
Showing 5 changed files with 44 additions and 7 deletions.
docs/user-guides/images/server-start-monitoring.gif (binary file, not displayed)
46 changes: 44 additions & 2 deletions docs/user-guides/server.md
@@ -5,10 +5,10 @@ CLIP-as-service is designed in a client-server architecture. A server is a long-
- Vertical scaling: using PyTorch JIT, ONNX or TensorRT runtime to speed up single-GPU inference.
- Supporting gRPC, HTTP, and WebSocket protocols with their TLS counterparts, with or without compression.

-This chapter introduces the API of the client.
+This chapter introduces the API of the server.

```{tip}
-You will need to install client first in Python 3.7+: `pip install clip-server`.
+You will need to install server first in Python 3.7+: `pip install clip-server`.
```

## Start server
@@ -380,6 +380,48 @@ In practice, we found it is unnecessary to run `clip_server` on multiple GPUs for
Based on these two points, it makes more sense to have multiple replicas on a single GPU than to spread replicas across different GPUs, which wastes resources. `clip_server` scales well by interleaving the GPU time among multiple replicas, as sketched below.
```
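
To illustrate, a replicated single-GPU deployment could look like the following. This is a minimal sketch: the executor layout mirrors the monitoring example below, and `replicas: 2` is an illustrative value.

```{code-block} yaml
jtype: Flow
version: '1'
with:
  port: 51000
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      metas:
        py_modules:
          - executors/clip_torch.py
    replicas: 2  # both replicas share one GPU, interleaving their work
```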

## Monitoring with Prometheus

To monitor the performance of the service, you can enable the monitoring feature in the Flow YAML:

```{code-block} yaml
---
emphasize-lines: 5,6,14,15
---
jtype: Flow
version: '1'
with:
  port: 51000
  monitoring: true
  port_monitoring: 9090
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      metas:
        py_modules:
          - executors/clip_torch.py
    monitoring: true
    port_monitoring: 9091
```
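
To launch the Flow with this configuration, you can save it to a file and pass the path when starting the server, e.g. `python -m clip_server flow.yml` (the filename `flow.yml` is illustrative).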

Then, you will get:

```{figure} images/server-start-monitoring.gif
:width: 80%
```

As shown in the above example, this Flow creates two metrics-exposing endpoints:
- `http://localhost:9090` for the gateway
- `http://localhost:9091` for the encoder
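
To quickly verify that the endpoints are live, you can fetch them over HTTP. Below is a minimal sketch, assuming the Flow above is running locally and that each endpoint serves the standard Prometheus text format:

```{code-block} python
import urllib.request

# ports from the Flow YAML above: 9090 (gateway), 9091 (encoder)
for port in (9090, 9091):
    with urllib.request.urlopen(f'http://localhost:{port}') as resp:
        body = resp.read().decode()
    # each Prometheus metric line looks like: <name>{<labels>} <value>
    print(f'port {port}: {len(body.splitlines())} lines of metrics')
```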

```{tip}
To visualize your metrics in a dashboard, we recommend [Grafana](https://grafana.com/).
```
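
To have Prometheus itself collect these metrics, add the two endpoints as scrape targets. Here is a minimal `prometheus.yml` sketch, assuming Prometheus runs on the same host (the job names are illustrative):

```{code-block} yaml
scrape_configs:
  - job_name: 'clip-gateway'    # illustrative name
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'clip-encoder'    # illustrative name
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9091']
```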

See the [Flow monitoring documentation](https://docs.jina.ai/fundamentals/flow/monitoring-flow/) for more information on monitoring in a Flow.

## Serving in HTTPS/gRPCs

3 changes: 0 additions & 3 deletions server/clip_server/executors/clip_onnx.py
@@ -1,6 +1,5 @@
import os
import warnings
-from functools import partial
from multiprocessing.pool import ThreadPool
from typing import Optional, Dict

@@ -47,8 +46,6 @@ def __init__(
        # prefer CUDA Execution Provider over CPU Execution Provider
        if self._device.startswith('cuda'):
            providers.insert(0, 'CUDAExecutionProvider')
-            # TODO: support tensorrt
-            # providers.insert(0, 'TensorrtExecutionProvider')

        sess_options = ort.SessionOptions()

1 change: 0 additions & 1 deletion server/clip_server/executors/clip_tensorrt.py
@@ -1,4 +1,3 @@
-from functools import partial
from multiprocessing.pool import ThreadPool
from typing import Dict

1 change: 0 additions & 1 deletion server/clip_server/executors/clip_torch.py
@@ -1,6 +1,5 @@
import os
import warnings
-from functools import partial
from multiprocessing.pool import ThreadPool
from typing import Optional, Dict

