
Benchmarking Tutorial


Sooner or later you will find that you need to benchmark an optimization you have developed for Scylla, and that the micro and macro benchmarks found in perf are not adequate for measuring the benefits; you need a real cluster with loaders.

This document attempts to be a complete guide to properly benchmarking Scylla.

How to deploy the code and where?

Benchmarking should not be done on a noisy desktop computer running hundreds of other processes besides Scylla. It is also not advised to run the loader and Scylla on the same machine, as they can influence each other and compete for resources. It is best to run Scylla and the loader on different machines, with each machine dedicated to that one process.

The number of nodes that should make up the tested cluster depends on what you are attempting to measure. If it has no distributed aspects, you might be better off with just a single node, but usually it does, and in that case you should use at least 3 nodes.

Since not many people have 3+ homogeneous server-grade machines lying around, benchmarking is usually done in the cloud, either on GCE or AWS.

TODO: machine type recommendations

There are several choices when it comes to deploying Scylla on the chosen machines.

Building packages

This involves building distribution packages for the Linux distro installed on the chosen machines. Scylla supports building packages for the Debian and Red Hat families of Linux distros. Note that the built packages don't support just any distro that uses one of the supported package formats: each Scylla release has a list of supported distributions, so check the release notes for the version you are trying to benchmark to find out which ones they are. Building packages happens by invoking dist/redhat/build_rpm.sh or dist/debian/build_deb.sh respectively.
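
For illustration, the invocation looks roughly like this (the exact arguments the scripts accept may differ between Scylla versions, so check the scripts themselves before running them):

    # Build RPM packages on a Red Hat family distro:
    ./dist/redhat/build_rpm.sh

    # Or build .deb packages on a Debian family distro:
    ./dist/debian/build_deb.sh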

Note that if you are benchmarking a released version, you can just install the respective version via the standard packages. Building a custom package is only needed when you are either testing an unreleased version, or a modified tree.

Just installing the Scylla packages is not enough; you need to follow the setup procedure to get a working Scylla install.

The generated packages will not contain the dependencies Scylla needs. The best way to obtain those is to install the standard packages and then overwrite just the package containing the Scylla executable. So, for example, if you want to test a modified 2.1 version, you can install the standard Scylla 2.1 and then overwrite just the executable by installing the scylla-server package that you built. If you want to test a master version, you can install Scylla nightly and then install your own executable on top.
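
A hedged sketch of this flow on a Red Hat family system (the package file names and exact commands below are assumptions; adjust them to your distro and to the packages you actually built):

    # Install the standard release (or the nightly) to get all the dependencies and setup scripts,
    # then run the usual setup procedure to get a working installation.
    sudo yum install scylla

    # Overwrite only the scylla-server package with the one you built yourself.
    # --force is needed because the package version may match the one already installed.
    sudo rpm -Uvh --force scylla-server-*.rpm

    # On a Debian family distro the equivalent would be something like:
    #   sudo apt-get install scylla
    #   sudo dpkg -i scylla-server_*.deb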

All currently supported releases, as well as the nightly packages, are available on Scylla's download page.

This method works quite well but it has a significant drawback: building the packages involves a full rebuild of the entire source tree, which can take a long time. If you plan to do more than a few modify-benchmark cycles, this method might not be the one for you. Note that this was true at the time of writing this tutorial; there are ongoing efforts to reduce the time it takes to build the packages.

Compiling locally

Another option is to compile Scylla locally, on the machines it's going to run on. This involves cloning the Scylla repository on each of the machines, compiling it locally, and finally replacing the Scylla executable with the freshly compiled one. Note that you still need a working Scylla installation on the respective machines as the base for this. As with the packages method, if you are testing a modified release you should set up that release, and if you are testing master you should set up Scylla nightly. To be able to compile Scylla you will furthermore need to install the build-time dependencies. This can easily be done with Scylla's install-dependencies.sh script. Note that this script doesn't support all otherwise supported distros equally well; for the best results use CentOS 7.
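
A minimal sketch of this workflow on CentOS 7, assuming a working Scylla installation is already in place (build commands, output paths and the installed binary location can change between versions, so treat this as an outline rather than a recipe):

    # Clone the tree (with submodules) and install the build-time dependencies.
    git clone --recursive https://github.com/scylladb/scylla.git
    cd scylla
    sudo ./install-dependencies.sh

    # Build a release-mode executable.
    ./configure.py --mode=release
    ninja-build    # or just `ninja`, depending on the distro

    # Stop Scylla and replace the installed executable with the freshly built one.
    sudo systemctl stop scylla-server
    sudo cp build/release/scylla /usr/bin/scylla
    sudo systemctl start scylla-server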

This option works well if you plan to do many modify-benchmark cycles, as updates can simply be pushed to the machines via git and Scylla can be quickly (well, depending on what you modified) recompiled.

If you plan to use this method often, you might want to have a look at this bag-of-scripts, which I developed specifically for automating parts of this method.

SCT

ScyllaDB's QA department developed a feature-rich tool for automating the deployment and testing of Scylla clusters. This tool is SCT and it can also be used to deploy Scylla for benchmarking.

TODO: tutorial on how to setup a cluster with SCT.

Other methods

Generally speaking, almost as many methods have been invented for this as there are people who have attempted to benchmark Scylla. Some use custom scripts, some use Ansible, others use CCM. The possibilities are only limited by imagination. The sections above only list the most prominent methods, those used by more than one person.

Monitoring

Benchmarking without monitoring is like driving blind. It is strongly recommended to set up monitoring, preferably on an entirely separate machine. Unlike the test cluster and the loaders, it is entirely OK to set up monitoring on your desktop machine.

For instructions see the project's home page.
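
For reference, the setup with the Docker-based monitoring stack looks roughly like this (the repository name, file paths and script options below are assumptions based on the stack at the time of writing; follow the project's own documentation for the authoritative steps):

    # Clone the monitoring stack and list your Scylla nodes in the Prometheus target file.
    git clone https://github.com/scylladb/scylla-grafana-monitoring.git
    cd scylla-grafana-monitoring
    # edit prometheus/scylla_servers.yml with the IPs of your Scylla nodes

    # Start the Prometheus and Grafana containers.
    ./start-all.sh -d /path/to/prometheus-data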

What to use for measuring?

Available tools

There are several tools available when it comes to benchmarking Scylla; we briefly review the most prominent ones below.

cassandra-stress

cassandra-stress (also dubbed c-s) is the most widely used tool for benchmarking Cassandra and Scylla. Documentation is available on DataStax's documentation site and is also scattered across several blog posts around the internet. It is the most feature-complete and mature tool for benchmarking Scylla, and especially Cassandra, and its usage is generally preferred over other tools.
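
As a taste of what a typical invocation looks like (node addresses and numbers are placeholders; consult the documentation for the full option list):

    # Populate the cluster with 10 million rows using 200 client threads.
    cassandra-stress write n=10000000 -rate threads=200 -node 10.0.0.1,10.0.0.2,10.0.0.3

    # Read for 30 minutes at a fixed rate of 50k requests/s (useful for latency measurements, see below).
    cassandra-stress read duration=30m -rate threads=200 fixed=50000/s -node 10.0.0.1,10.0.0.2,10.0.0.3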

TODO: tutorial for cassandra-stress. Either inline, or preferably in a separate document and link to it. Something like a collection of links to blog posts would be of great help as well as c-s documentation is quite sparse.

scylla-bench

scylla-bench was born out of frustration with cassandra-stress's inefficiency in handling large partitions and its lack of support for counters at the time. The initial version was put together by Pawel Dziepak, but recently the project was moved under ScyllaDB's umbrella and can now be found here. It is not as feature-complete as cassandra-stress, but it is generally nicer to use and more efficient. Since it was moved under ScyllaDB's umbrella, development has picked up pace to close the feature gap. It now also has some features that cassandra-stress lacks, like support for benchmarking range scans.
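
A couple of hedged examples of what scylla-bench invocations can look like (flag names and defaults may change between versions, so see the README for the current set):

    # Sequentially write 1000 partitions with 10k clustering rows each (handy for building up large partitions).
    scylla-bench -workload sequential -mode write -nodes 10.0.0.1 -partition-count 1000 -clustering-row-count 10000 -concurrency 64

    # Read back with uniformly distributed partition keys for 30 minutes.
    scylla-bench -workload uniform -mode read -nodes 10.0.0.1 -partition-count 1000 -clustering-row-count 10000 -concurrency 64 -duration 30m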

For more details see the scylla-bench README which documents the usage quite well.

YCSB

This is Yahoo's tool for measuring the performance of databases and it is compatible with Cassandra and Scylla. See the project's home page.

How to measure?

One can benchmark Scylla for several reasons:

  • Find out how it fares in real-life.
  • Find any regressions.
  • Measure the improvements brought by an optimization or feature.

In the first two cases, the challenge is to find a workload that imitates real-life workloads as closely as possible.

In the latter case the challenge is to emphasize the impact of the optimization or feature as much as possible while at the same time reducing noise. You need to think about what the effect of your optimization or feature is and what workload would measure it as directly as possible. For example: if you want to measure the performance of reads, you want to make sure that no writes are happening and that there are no compactions either. If you want to measure the performance of disk I/O, you might want to disable the cache too, so that all reads go to the disk.
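
For example, a read benchmark that keeps writes and compactions out of the picture could be structured roughly like this (node address and numbers are placeholders; whether and how the cache can be disabled depends on your Scylla version and configuration):

    # Populate the data set up front.
    cassandra-stress write n=10000000 -node 10.0.0.1

    # Wait until all compactions triggered by the population phase have finished.
    nodetool compactionstats

    # Only then run the pure read workload you actually want to measure.
    cassandra-stress read duration=30m -rate threads=200 -node 10.0.0.1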

There is some general advice that applies no matter what you measure. Make sure your loader machines are loaded at or below 50%, otherwise you might end up measuring your clients' inefficiency. Benchmarking usually compares the performance of two versions. You want to make sure that you run the same workload in both cases and that the environment is as similar as possible, otherwise you might end up measuring the difference in the workloads or the environments. It is also important to measure for a period that is long enough to avoid temporary events skewing the results.

Latency

Latency is best measured with a fixed request rate that preferably loads the cluster below 50% of its capacity. Both cassandra-stress and scylla-bench support a fixed request rate. At the end of the measurement you should compare the client-perceived latency numbers reported by the loaders. It's not just the mean that you should be looking at: users like consistent latency numbers, so the tail latencies (95th percentile and above) are just as interesting as the mean, in some cases even more so. You should also look at the max latency, and if it is an extreme outlier, think about what could have caused it.
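
For instance, cassandra-stress takes the fixed rate through its -rate option and scylla-bench through -max-rate (the numbers are placeholders; pick a rate well below the cluster's saturation point):

    # cassandra-stress: 30 minutes of reads throttled to 20k requests/s.
    cassandra-stress read duration=30m -rate threads=200 fixed=20000/s -node 10.0.0.1

    # scylla-bench: the equivalent with an overall rate cap.
    scylla-bench -workload uniform -mode read -nodes 10.0.0.1 -concurrency 64 -max-rate 20000 -duration 30m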

Throughput

When measuring throughput, you want to push the cluster to the limit of what it can deal with, and then compare what this limit is. A good way to reach it is to incrementally add more loaders until the throughput doesn't increase anymore. You should be able to observe that the cluster has reached some kind of bottleneck. This bottleneck should be a hardware one (CPU, I/O, network). If such a bottleneck cannot be observed, it is very likely that there is something wrong with how you are loading the cluster. This can happen for a number of reasons; some of the most common ones are:

  • Not enough client concurrency.
  • Inefficient (overloaded) clients.
  • Not enough data (especially if you are testing range scans).
  • Bad load distribution: hot partition, hot shard, etc.
  • And finally: a bug in the implementation of the feature/optimization under test.

Observing that the CPU is bottlenecked is as easy as making sure that the reactor load metric is 100% (or very close to it) for all nodes.

Observing that the disk or the network is bottlenecked is a lot trickier, as they can be bottlenecked on either IOPS or bandwidth. Generally the same method can be used as for loading the cluster to its limit: increase load until you see no further increase. It doesn't hurt to know the capabilities of the hardware either. If you know the disk is capable of 10GB/s but it "peaks" at 3GB/s, then something is likely wrong.
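
Two hedged ways of sanity-checking the disk side (device names and paths are placeholders):

    # Watch what the disk is actually doing while the benchmark runs.
    iostat -x 1 /dev/nvme0n1

    # Measure what the disk is capable of while Scylla is not running, e.g. with fio:
    fio --name=seqread --filename=/var/lib/scylla/fio-testfile --rw=read --bs=1M --size=10G --direct=1 --ioengine=libaio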

When comparing throughput numbers you usually want to look at the OPS numbers, i.e. the number of requests completed per second. If you ran a time-limited workload, you should compare the total number of requests served. If you executed a fixed number of requests, it might also be useful to compare the time it took to execute them. These numbers can be extracted from the metrics, but they will usually also be reported by the clients. Depending on what exactly you are testing, you might want to look at other metrics too. Just as interesting as the metrics themselves is whether the bottleneck has moved, and if it did, why.
