Improve sendsync performance by omnibs · Pull Request #151 · NoRedInk/haskell-libraries

omnibs · 2026-05-07T17:00:09Z

Context

Our synchronous writes to Kafka have been slow since forever, and we used to think that was on Kafka itself: high throughput but high latency.

Turns out it was a simple mistake on our side.

Synchronous sends are only as synchronous as Erlang/Elixir's synchronous messaging: each one is a wrapper around an async send that blocks waiting for an async ack.

When a Kafka broker acks a message, librdkafka's I/O thread enqueues a delivery-report event onto an internal queue. That event sits there until some thread calls rd_kafka_poll, which drains the queue and dispatches callbacks. Those callbacks free up pending sendSync calls.
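To make the wrapper idea concrete, here is a minimal sketch using only base. `sendAsync` is a hypothetical stand-in for the real async produce call, with a forked thread playing the broker; it is not the nri-kafka or hw-kafka API. In the real producer the delivery callback only runs when some thread calls rd_kafka_poll, which this sketch does not model.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Hypothetical async send: returns at once and runs the callback later,
-- when the (simulated) broker ack arrives.
sendAsync :: String -> (Either String () -> IO ()) -> IO ()
sendAsync _payload onDelivery = do
  _ <- forkIO $ do
    threadDelay 5000 -- assumption: pretend the broker round trip takes 5ms
    onDelivery (Right ())
  pure ()

-- | Synchronous send built on the async one: register a callback that
-- fills an MVar, then block until that callback has fired.
sendSync :: String -> IO (Either String ())
sendSync payload = do
  done <- newEmptyMVar
  sendAsync payload (putMVar done)
  takeMVar done
```

The caller is blocked in `takeMVar` until the callback runs, so any delay in dispatching callbacks shows up directly as sendSync latency.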

The issue

In our codebase, pollEvents is what triggers rd_kafka_poll, and we used to have it like this:

pollEvents producer = do
  Producer.produceMessageBatch producer []
    |> map (\_ -> ())
  Control.Concurrent.threadDelay 100_000 {- 100ms -}
  pollEvents producer

That 100ms delay means sendSync might take up to 100ms to be woken up after an ack comes back from the broker.

The fix

We could have simply reduced the delay, but there's another function called flushProducer, which does something similar to what we did, but in a better way.

flushProducer hangs on a call to rd_kafka_poll for up to 100ms. If no events come, it times out, and you can call it again. If an event comes, it returns immediately.

The 100ms timeout might seem superfluous, but ceding control back to Haskell allows us to respond to async exceptions and terminate our producers cleanly.

hw-kafka's own example producer uses this function.
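The timing behaviour described above (return immediately on an event, otherwise give up after 100ms and loop) can be modelled with base alone. This is only an analogy under stated assumptions: the MVar stands in for librdkafka's event queue, and `waitForEvent` and `pollLoop` are hypothetical names, not hw-kafka functions. In plain Haskell a blocked `takeMVar` is already interruptible; the timeout matters in the real case because the wait happens inside a foreign call.

```haskell
import Control.Concurrent.MVar (MVar, takeMVar)
import System.Timeout (timeout)

-- | Wait for one event for at most 100ms (timeout takes microseconds).
-- Returns 'Just event' as soon as one is available, 'Nothing' on timeout.
waitForEvent :: MVar event -> IO (Maybe event)
waitForEvent events = timeout 100000 (takeMVar events)

-- | Event loop: handle each event as soon as it arrives. When nothing
-- arrives within 100ms, control comes back here before waiting again.
pollLoop :: MVar event -> (event -> IO ()) -> IO ()
pollLoop events onEvent = do
  next <- waitForEvent events
  case next of
    Just event -> onEvent event
    Nothing -> pure ()
  pollLoop events onEvent
```

Compared with a fixed sleep between polls, the wait here ends as soon as there is something to handle, so the 100ms only bounds how long the loop stays blocked when idle.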

Benchmarks

We created a simple benchmark script here to test sync sends with and without the fix.

$ KAFKA_BROKER_ADDRESSES=localhost:9092 BENCHMARK_MESSAGE_COUNT=200 BENCHMARK_WARMUP=20 cabal run sync-write-benchmark -fsync-write-benchmark

Without the fix

Benchmarking sync writes: warmup=20 count=200 topic=nri-kafka-sync-benchmark
Warming up...
Measuring 200 sends...
count=200  min=100.1ms  avg=101.1ms  p50=101.1ms  p95=101.4ms  p99=102.1ms  max=102.1ms

With the fix

Benchmarking sync writes: warmup=20 count=200 topic=nri-kafka-sync-benchmark
Warming up...
Measuring 200 sends...
count=200  min=5.6ms  avg=6.3ms  p50=6.2ms  p95=6.9ms  p99=8.6ms  max=8.8ms

This opens up the possibility of adopting synchronous Kafka writes more broadly.
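For readers who want to reproduce summary lines like the ones above from their own samples, here is a minimal sketch of one way to compute them. It assumes nearest-rank percentiles and integer nanosecond samples; the benchmark script may compute its statistics differently, and `Summary`, `percentile`, and `summarize` are hypothetical names.

```haskell
import Data.List (sort)
import Data.Word (Word64)

data Summary = Summary
  { sMin :: Word64
  , sAvg :: Word64
  , sP50 :: Word64
  , sP95 :: Word64
  , sP99 :: Word64
  , sMax :: Word64
  }
  deriving (Eq, Show)

-- | Nearest-rank percentile of an ascending, non-empty list.
percentile :: Int -> [Word64] -> Word64
percentile p sorted =
  let n = length sorted
      rank = (p * n + 99) `div` 100 -- ceiling of p * n / 100
   in sorted !! max 0 (rank - 1)

-- | Summarize latency samples; 'Nothing' for an empty list, which also
-- avoids dividing by zero when computing the average.
summarize :: [Word64] -> Maybe Summary
summarize [] = Nothing
summarize samples =
  let sorted = sort samples
      count = fromIntegral (length sorted)
   in Just
        Summary
          { sMin = minimum sorted
          , sAvg = sum sorted `div` count -- integer average, rounds down
          , sP50 = percentile 50 sorted
          , sP95 = percentile 95 sorted
          , sP99 = percentile 99 sorted
          , sMax = maximum sorted
          }
```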

New issue

We uncovered a new issue while digging into this: we're not handling delivery failures in sendSync.

We just wrote a known-issues.md documenting future work.


Copilot

Pull request overview

This PR reduces Kafka.sendSync latency by changing the producer event-polling loop to drain librdkafka delivery-report events promptly, and adds a small benchmark executable to measure per-message sync-write latency.

Changes:

- Replaces the pollEvents loop’s fixed 100ms threadDelay + empty-batch trick with Producer.flushProducer to wake sendSync waiters as soon as delivery reports arrive.
- Adds a sync-write-benchmark executable (behind a Cabal flag) to benchmark sequential sendSync latency.
- Updates Cabal metadata (package.yaml + generated .cabal) to include the new flag/executable.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `nri-kafka/src/Kafka.hs` | Switches the background event loop to use `flushProducer` and documents the rationale/shutdown behavior. |
| `nri-kafka/scripts/sync-write-benchmark/Main.hs` | New benchmark program for measuring `sendSync` per-message latency. |
| `nri-kafka/package.yaml` | Adds a flag-gated `sync-write-benchmark` executable definition. |
| `nri-kafka/nri-kafka.cabal` | Regenerated Cabal file reflecting the new flag/executable. |


+formatStats :: [Data.Word.Word64] -> String
+formatStats samples =
+  let sorted = Data.List.sort samples
+      n = Prelude.length sorted
+      total = Prelude.sum sorted
+      avg = nsToMs (total `Prelude.div` fromIntegral n)



omnibs added 3 commits May 7, 2026 13:37

create a benchmark app for slow sendSync

678539b

document known issue not fixed here

6a152de

Copilot AI review requested due to automatic review settings May 7, 2026 17:00

omnibs added 2 commits May 7, 2026 21:12

try a more aggressive disk cleaner action

1bccbab

we're running out of space

bump macos while we're at it

e187e7f

omnibs wants to merge 5 commits into trunk from improve-sendsync-performance
