Sample Project
Here are two unit tests that show the issue at hand:
https://gist.github.com/chamons/f78672c4bb659aeb1e8499a6925e5f3b
Replace broker-1 with the address of your local Pulsar cluster, or use this docker compose.
Details
A topic's load can move to another broker because of:
- Pulsar broker restarts
- Load shedding due to high CPU
- Calling admin/v2/persistent/public/default/{topic}/unload or the pulsar-admin command
After such a move, sending notifications to that topic via a MultiTopicProducer fails with:
Producer(Connection(Io(Custom { kind: TimedOut, error: " connection c0010f8b-9ae8-4b31-abba-91d83e283695 timedout sending message to the Pulsar server" }))
However, if you close the producer and then try again, it works just fine.
This is rather non-obvious. Worse, it makes the simplified Pulsar::send API unusable in these cases, as there is no way to manually close it. Once you write to a topic using Pulsar::send and that topic is later load shed, Pulsar::send on that connection cannot write to it again.
We work around this in our code base by only using MultiTopicProducer and closing it after a subset of errors.
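The workaround can be sketched roughly as follows. This is a simplified, synchronous model and not the pulsar crate's actual API: the `TopicProducer` trait, the `SendError` enum, `send_with_reset`, and `FakeProducer` are all hypothetical names introduced for illustration. The real client is async and has its own error type. The sketch only shows the shape of the logic: on a timeout, close the producer for that topic, then retry once.

```rust
use std::collections::HashSet;

/// Hypothetical error type standing in for the client's real error enum.
#[derive(Debug, Clone, PartialEq)]
enum SendError {
    TimedOut,
    Other(String),
}

/// Hypothetical minimal interface for a producer that caches per-topic state.
trait TopicProducer {
    fn send(&mut self, topic: &str, payload: &[u8]) -> Result<(), SendError>;
    /// Drop the cached state for `topic` so the next send starts fresh.
    fn close(&mut self, topic: &str);
}

/// Send once; if the send times out, close the topic's producer and retry once.
/// Any other error is returned to the caller unchanged.
fn send_with_reset<P: TopicProducer>(
    producer: &mut P,
    topic: &str,
    payload: &[u8],
) -> Result<(), SendError> {
    match producer.send(topic, payload) {
        Err(SendError::TimedOut) => {
            producer.close(topic);
            producer.send(topic, payload)
        }
        other => other,
    }
}

/// Test double (assumption): a topic marked stale times out until it is closed,
/// which imitates a producer still pointing at the old broker after an unload.
#[derive(Default)]
struct FakeProducer {
    stale: HashSet<String>,
    sent: Vec<(String, Vec<u8>)>,
    closes: usize,
}

impl TopicProducer for FakeProducer {
    fn send(&mut self, topic: &str, payload: &[u8]) -> Result<(), SendError> {
        if self.stale.contains(topic) {
            return Err(SendError::TimedOut);
        }
        self.sent.push((topic.to_string(), payload.to_vec()));
        Ok(())
    }

    fn close(&mut self, topic: &str) {
        self.stale.remove(topic);
        self.closes += 1;
    }
}
```

Retrying only once keeps the sketch from looping forever if the broker is genuinely unreachable. Which errors should trigger the close is a judgment call, and the timeout shown above is just the one observed here.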