While working on Kafka internals, our team (thanks Ivan!) came across an important design decision in Kafka's TCP request processing pipeline: at the socket level, the broker processes only one request per connection at a time. Once a request is picked from a socket, the connection is muted until the response has been served; only then is it unmuted and the next request read from the socket. This applies to all APIs.
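To make the mechanism concrete, here is a minimal, hypothetical sketch of per-connection muting in plain Java NIO. This is not Kafka's SocketServer code: the port, the framing, and the inline request handling are all simplifications (in the real broker, requests are handed off to a pool of handler threads, and the channel stays muted until the handler's response has been written back).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical sketch: serve one request per connection at a time via "muting".
public class MutingServerSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9099)); // arbitrary demo port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (client.read(buf) < 0) { client.close(); continue; }

                    // MUTE: stop selecting this connection for reads. Any
                    // further requests the client pipelines stay queued in
                    // the socket buffer and are not even parsed yet.
                    key.interestOps(0);

                    // "Process" the request inline for brevity. Kafka instead
                    // hands it to a request handler thread, and the channel
                    // remains muted until that thread's response is sent.
                    buf.flip();
                    client.write(StandardCharsets.UTF_8.encode(
                            "echo: " + StandardCharsets.UTF_8.decode(buf)));

                    // UNMUTE: resume reading, so the next queued request can
                    // be picked from the socket.
                    key.interestOps(SelectionKey.OP_READ);
                }
            }
        }
    }
}
```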
This design decision has significant implications for Kafka's performance and scalability: it makes efficient use of system resources and reduces context-switching overhead, but it also limits how fast each client's requests can be processed. For Kafka's traditional processing model, where data is mostly written to and read from memory (i.e., the page cache), this is usually not a problem. However, when a cluster is stretched across multiple datacenters, or when data must be flushed to disk, it can become a performance bottleneck: because produce requests on a connection are processed one at a time, a single slow request delays every request queued behind it, and tail latency is magnified.
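A back-of-the-envelope example (with assumed numbers) shows why the tail suffers: under one-at-a-time processing, the Nth pipelined request on a connection waits for every request ahead of it.

```java
// Hypothetical numbers: head-of-line blocking on a muted connection.
public class HeadOfLineBlocking {
    public static void main(String[] args) {
        double serviceMs = 15.0; // assumed time per produce request, e.g. a
                                 // cross-DC replication ack or a disk flush
        int pipelined = 5;       // requests the client has in flight
        for (int i = 1; i <= pipelined; i++) {
            // Request i cannot start until requests 1..i-1 are fully served.
            System.out.printf("request %d completes after ~%.0f ms%n", i, i * serviceMs);
        }
    }
}
```

If the broker could serve those five requests in parallel, each could in principle complete in roughly the single-request service time; serialization instead turns one slow request into five slow responses.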
On the client side, the max.in.flight.requests.per.connection configuration allows multiple requests to be in flight at the same time. But even though the client can send multiple requests to the broker, this does not mean the broker will process them in parallel: the requests queue up in the socket buffer, and the broker processes them one by one.
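For reference, this is how the setting looks on a Java producer; the broker address and topic below are placeholders. Note that raising the value only increases pipelining on the wire (and values above 5 are rejected when idempotence is enabled); it does not change the one-at-a-time processing on the broker.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PipelinedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Up to 5 unacknowledged requests may be in flight per broker
        // connection (5 is also the default). They are pipelined on the
        // wire, but the broker still picks them off the socket one by one.
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 5; i++) {
                // send() is asynchronous, so these requests can queue up
                // behind each other on the same connection.
                producer.send(new ProducerRecord<>("demo-topic", "key-" + i, "value-" + i));
            }
        }
    }
}
```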
This is an important finding if you are looking into performance optimizations for Kafka, and it is worth considering when designing and tuning your Kafka-based applications. It makes us wonder whether this design could be evolved to process multiple requests in parallel, at least for produce requests, when the broker would benefit from it, or whether the rest of the codebase depends too strongly on the socket-level muting mechanism for such a change to be feasible. It also makes me want to look deeper into the socket server implementation; I may write about that later.