Description
`NodeToControllerChannelManager` caches the `activeController` address in an `AtomicReference`, which is updated when:
- `activeController` has not been set,
- the `networkClient` disconnects from the controller,
- a node replies with `Errors.NOT_CONTROLLER`, or
- the controller changes from ZK mode to KRaft mode.
When running multiple Kafka clusters in a dynamic environment, there is a chance that a controller's IP gets reassigned to another cluster's broker when the controller is bounced. In this scenario, requests from the node to the controller may fail with an `AuthenticationException` and are then retried indefinitely. None of the update conditions above is triggered, so the new controller's information is never set and the node gets stuck.
A potential fix would be to disconnect the network client and invoke `updateControllerAddress(null)`, as we do in the `Errors.NOT_CONTROLLER` case.
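To illustrate the idea, here is a minimal, self-contained sketch of the caching behavior and the proposed change. It is a simplified model, not the actual `NodeToControllerChannelManager` code: the class `ControllerAddressCache`, the `Outcome` enum, the `resetOnAuthFailure` flag, and the example addresses are all hypothetical names introduced for illustration only.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified model of the cached controller address (not Kafka's real code). */
class ControllerAddressCache {

    /** Outcomes a request to the cached controller can have in this model. */
    enum Outcome { SUCCESS, DISCONNECTED, NOT_CONTROLLER, AUTHENTICATION_FAILED }

    // Cached controller address; null means "unknown, rediscover on next use".
    private final AtomicReference<String> activeController = new AtomicReference<>(null);

    // Assumption: toggles the proposed fix so both behaviors can be compared.
    private final boolean resetOnAuthFailure;

    ControllerAddressCache(boolean resetOnAuthFailure) {
        this.resetOnAuthFailure = resetOnAuthFailure;
    }

    /**
     * Returns the cached address. Only when nothing is cached is the freshly
     * discovered address stored and returned.
     */
    String resolve(String freshlyDiscovered) {
        String cached = activeController.get();
        if (cached == null) {
            activeController.set(freshlyDiscovered);
            return freshlyDiscovered;
        }
        return cached;
    }

    /** Clears the cache for outcomes that indicate the cached address is stale. */
    void onOutcome(Outcome outcome) {
        switch (outcome) {
            case DISCONNECTED:
            case NOT_CONTROLLER:
                activeController.set(null);
                break;
            case AUTHENTICATION_FAILED:
                // Proposed fix: treat this like NOT_CONTROLLER and force rediscovery.
                if (resetOnAuthFailure) {
                    activeController.set(null);
                }
                break;
            default:
                break;
        }
    }
}
```

With `resetOnAuthFailure` set to `false`, an `AUTHENTICATION_FAILED` outcome leaves the stale address in place, so every retry goes to the same wrong host. With it set to `true`, the next `resolve` call picks up the newly discovered controller. The real fix would also need to disconnect the network client, which this sketch does not model.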