Load ramping

Tomás Senart edited this page Jan 12, 2020 · 1 revision

To quote Brendan Gregg's excellent Systems Performance book, chapter 12.3.7 Ramping Load:

"This is a simple method for determining the maximum throughput a system can handle. It involves adding load in small increments and measuring the delivered throughput until a limit is reached. The results can be graphed, showing a scalability profile. This profile can be studied visually[...]

[...] When following this approach, measure latency as well as the throughput, especially the latency distribution. [...] If you push load too high, latency may become so high that it is no longer reasonable to consider the result as valid. Ask yourself if the delivered latency would be acceptable to a customer."

A script that uses vegeta to achieve this is available in the scripts/load-ramping/ directory. It will automatically run vegeta against a target with different request rates and graph the latency distribution and success rate at each request rate.
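The core of the approach is the schedule of request rates itself. Since the resulting plots use logarithmic axes, a log-spaced ramp is a natural fit. The sketch below is illustrative only (the function name and parameters are assumptions, not taken from `ramp-requests.py`):

```python
import math

def ramp_rates(start, stop, steps_per_decade):
    """Return request rates spaced evenly on a log scale, from
    `start` to `stop`, with `steps_per_decade` rungs per factor of 10."""
    n = round(math.log10(stop / start) * steps_per_decade)
    return [round(start * 10 ** (i / steps_per_decade)) for i in range(n + 1)]

print(ramp_rates(10, 1000, 2))  # → [10, 32, 100, 316, 1000]
```

Each of these rates would then be handed to vegeta for a fixed-duration attack, and the latency distribution and success rate recorded per rung.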

Example output

[Figure: Connections vs latency and success rate (python3 -m http.server)]

[Figure: Connections vs latency and success rate (LWAN)]

Note that all axes are logarithmic. LWAN clearly performs significantly better than Python's built-in web server. In both cases, the latency distribution degrades well before the success rate does. Interestingly, at low load, latency initially drops as the request rate increases.

These benchmarks were taken on a Thinkpad T480s. They are meant as examples only, not as an authoritative comparison of these web servers.

Usage

echo GET http://localhost:8080/ | python3 ramp-requests.py
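For illustration, the per-rate loop that the script automates could be driven like this from Python. The `attack_cmd` and `run_ramp` helpers are assumptions for this sketch, not code from `ramp-requests.py`; only `vegeta attack -rate`, `-duration`, and `vegeta report -type=json` are real vegeta flags:

```python
import subprocess

def attack_cmd(rate, duration="5s"):
    """Build the vegeta pipeline for one rung of the ramp."""
    return (f"vegeta attack -rate={rate} -duration={duration}"
            f" | vegeta report -type=json")

def run_ramp(target, rates):
    """Feed the target definition to vegeta once per rate and
    collect the JSON report emitted for each rung."""
    reports = []
    for rate in rates:
        proc = subprocess.run(
            attack_cmd(rate), shell=True,
            input=target.encode(), capture_output=True)
        reports.append(proc.stdout)
    return reports

print(attack_cmd(100))
# → vegeta attack -rate=100 -duration=5s | vegeta report -type=json
```

The JSON reports could then be fed to Gnuplot (as the real script does) to produce the latency and success-rate curves shown above.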

Dependencies

  • vegeta
  • Python 3
  • Gnuplot