Benchmarking Kafka: Getting started with OpenMessaging Benchmark

January 03, 2025
Jorge Esteban Quilcate Otoya | how-to
#apache-kafka #performance

Estimating the performance impact of new use cases or features like Tiered Storage on Kafka clusters can be challenging. In this post, let’s explore OpenMessaging Benchmark, a macro-benchmarking framework for measuring messaging system performance, and how to use it on Apache Kafka deployments.

OpenMessaging Benchmark (OMB) is an industry-standard framework that emerged from the collaborative efforts of messaging platform providers to create consistent, reproducible performance measurements, and it has become a trusted tool used by major vendors[1] to validate their platform performance claims.

OMB’s architecture consists of three main components:

- Workers: the processes that run the producers and consumers and collect client-side metrics.
- Driver: the adapter and configuration that tell OMB how to talk to the system under test (Kafka, in our case).
- Workload: the specification of topics, partitions, producers, consumers, message sizes, rates, and durations.

and it can run in two modes: distributed and local.

Distributed mode: workers run as separate processes, usually on dedicated hosts, and the benchmark coordinator drives them over HTTP. Use this when a single machine cannot generate enough load; a minimal invocation is sketched below.
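A minimal distributed setup looks roughly like this (host names are placeholders, and the worker port shown is the commonly used default, so treat the details as assumptions):

# on each load-generator host
bin/benchmark-worker

# on the coordinating host
bin/benchmark --drivers [driver-path] --workers http://worker-1:8080,http://worker-2:8080 [workload-path]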

Local mode: the coordinator spawns the workers inside the same process, which is convenient for getting started and for small local tests.

To start running benchmarks we will need:

- a running Kafka cluster to benchmark against,
- a driver configuration describing how to connect to it, and
- a workload specification describing the load to generate.

Let’s start by running a simple workload using the local mode to get familiar with OMB’s capabilities.

Demo

Install OMB binaries

Before getting started, let’s quickly describe how to obtain the binaries:

Building from source is the default way to get the binaries:

git clone git@github.com:openmessaging/benchmark
cd benchmark
mvn package

This compiles all the modules and packages a tarball under ./package/target. You can unpack it to a location of your choice, use the binaries directly from the build tree, or download a pre-compiled version from my fork.
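For example, to unpack the tarball into a dedicated directory (the exact artifact name varies by version, so the wildcard is an assumption):

mkdir -p ~/omb
tar -xzf package/target/openmessaging-benchmark-*-bin.tar.gz -C ~/omb --strip-components=1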

Change into your chosen location before running the binaries.

Running first benchmark

Remember what we need to run a benchmark: a running cluster, benchmark workers (spawned in-process in local mode), a driver, and a workload specification.

Start a Kafka cluster

Let’s start a local Kafka cluster if you don’t have one already. These days, this is as easy as:

docker run -p 9092:9092 apache/kafka:3.9.0
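To sanity-check that the broker is up, you can list topics from inside the container (the script path assumes the layout of the apache/kafka image):

docker exec -it $(docker ps -qf ancestor=apache/kafka:3.9.0) \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list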

Driver

The Driver specification defines key configuration properties for topics, clients, producers, and consumers. Here’s a basic configuration:

name: kafka-local
driverClass: io.openmessaging.benchmark.driver.kafka.KafkaBenchmarkDriver

topicConfig: ""
replicationFactor: 1

# Kafka client-specific configuration
commonConfig: |
  bootstrap.servers=localhost:9092
producerConfig: ""
consumerConfig: |
  auto.offset.reset=earliest

auto.offset.reset=earliest should be treated as the default here. Without it the benchmark checks do not pass, because the consumer subscribes after the producer has sent its test message and would otherwise wait only for new messages.

All *Config properties are strings in Java properties format. Use YAML multi-line blocks (|) to pass multiple settings, one per line.
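For example, a producerConfig that tunes batching could look like this (the values are illustrative, not recommendations):

producerConfig: |
  linger.ms=5
  batch.size=131072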

With these defined, let’s move to the Workload spec.

Workload

The Workload spec defines how producers and consumers are scheduled, how many topics and partitions are used, how load is distributed, and the execution mode.

In this post, let’s focus on the simplest option: a fixed-throughput workload:

name: fixed-1MiB-1t10p-1P1x1C

# Duration
warmupDurationMinutes: 1
testDurationMinutes: 5

# Topic topology
topics: 1
partitionsPerTopic: 10

# Throughput
producerRate: 1024
messageSize: 1024
payloadFile: "payload/payload-1Kb.data"

# Producers
producersPerTopic: 1

# Consumers
subscriptionsPerTopic: 1
consumerPerSubscription: 1
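The name encodes the shape of the run: 1 producer publishing 1024 msg/s × 1024 bytes per message = 1 MiB/s into 1 topic with 10 partitions, consumed by 1 subscription with 1 consumer.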

Run the benchmark

bin/benchmark --drivers [driver-path] [workload-path]
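For example, assuming the driver and workload specs above are saved as kafka-local.yaml and fixed-rate.yaml (file names of my choosing):

bin/benchmark --drivers kafka-local.yaml fixed-rate.yaml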

This starts the benchmark with a 1-minute warm-up phase, then runs the actual benchmark for 5 minutes:

INFO WorkloadGenerator - ----- Starting warm-up traffic (1m) ------
INFO WorkloadGenerator - Pub rate  1018.3 msg/s /  1.0 MB/s | Pub err     0.0 err/s | Cons rate  1018.3 msg/s /  1.0 MB/s | Backlog:  0.0 K | Pub Latency (ms) avg:  1.4 - 50%:  0.8 - 99%: 21.4 - 99.9%: 29.4 - Max: 30.9 | Pub Delay Latency (us) avg: 109.6 - 50%: 74.0 - 99%: 609.0 - 99.9%: 6171.0 - Max: 15553.0
INFO WorkloadGenerator - Pub rate  1018.8 msg/s /  1.0 MB/s | Pub err     0.0 err/s | Cons rate  1018.7 msg/s /  1.0 MB/s | Backlog:  0.0 K | Pub Latency (ms) avg:  1.2 - 50%:  1.2 - 99%:  1.8 - 99.9%:  3.4 - Max:  5.8 | Pub Delay Latency (us) avg: 99.3 - 50%: 95.0 - 99%: 194.0 - 99.9%: 671.0 - Max: 1864.0

Every 5 seconds, it reports windowed metrics:

INFO WorkloadGenerator - Pub rate  1023.8 msg/s /  1.0 MB/s | Pub err     0.0 err/s | Cons rate  1023.7 msg/s /  1.0 MB/s | Backlog:  0.0 K | Pub Latency (ms) avg:  1.2 - 50%:  1.2 - 99%:  2.2 - 99.9%: 26.1 - Max: 32.8 | Pub Delay Latency (us) avg: 99.5 - 50%: 94.0 - 99%: 248.0 - 99.9%: 780.0 - Max: 1963.0
INFO WorkloadGenerator - ----- Aggregated Pub Latency (ms) avg:  1.3 - 50%:  1.2 - 95%:  1.7 - 99%:  4.9 - 99.9%: 28.3 - 99.99%: 39.3 - Max: 40.3 | Pub Delay (us)  avg: 102.8 - 50%: 93.0 - 95%: 152.0 - 99%: 273.0 - 99.9%: 1574.0 - 99.99%: 9971.0 - Max: 15553.0
INFO WorkloadGenerator - ----- Starting benchmark traffic (5m)------
INFO WorkloadGenerator - Pub rate  1055.7 msg/s /  1.0 MB/s | Pub err     0.0 err/s | Cons rate  1055.7 msg/s /  1.0 MB/s | Backlog:  0.0 K | Pub Latency (ms) avg:  1.5 - 50%:  1.4 - 99%:  6.5 - 99.9%: 33.0 - Max: 37.1 | Pub Delay Latency (us) avg: 112.1 - 50%: 99.0 - 99%: 331.0 - 99.9%: 1908.0 - Max: 7472.0
INFO WorkloadGenerator - Pub rate  1027.5 msg/s /  1.0 MB/s | Pub err     0.0 err/s | Cons rate  1027.6 msg/s /  1.0 MB/s | Backlog:  0.0 K | Pub Latency (ms) avg:  1.3 - 50%:  1.3 - 99%:  1.9 - 99.9%: 27.3 - Max: 34.2 | Pub Delay Latency (us) avg: 102.0 - 50%: 96.0 - 99%: 209.0 - 99.9%: 700.0 - Max: 5550.0

and at the end it writes the worker metrics to a result JSON file:

INFO Benchmark - Writing test result into simple-workload-kafka-local-2024-12-21-10-26-21.json

Reviewing results

Results are collected from the workers (the same process, in this case) and written to a JSON file containing all the samples taken during the execution. To plot them, use the following command:

bin/create_charts.py [results-path]
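For example, pointing it at the result file from the run above:

bin/create_charts.py simple-workload-kafka-local-2024-12-21-10-26-21.json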

This will take all the results and generate SVG graphs for analysis, including end-to-end latency quantiles:

Results: E2E latency quantiles
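If you prefer raw numbers over charts, jq can pull a few aggregates straight out of the JSON. The field names below are my assumptions based on OMB’s TestResult class, so verify them against your own file:

jq '{workload, driver, pub99: .aggregatedPublishLatency99pct, e2e99: .aggregatedEndToEndLatency99pct}' \
  simple-workload-kafka-local-2024-12-21-10-26-21.json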

These results, along with server-side metrics, should give a good idea of how the cluster performed during the benchmark execution.

Summary

If you’re looking to quickly reproduce a specific workload on a Kafka cluster, this guide offers a starting point without requiring custom producer/consumer development. While benchmarking tools are typically associated with large-scale production environments, they can be valuable for simple one-off tests and experimentation.

In upcoming posts, I’ll explore other aspects of my Kafka performance testing journey including deep dives into OMB execution modes, specific Kafka feature testing, and broker profiling techniques.

References