Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cherrypick KAFKA-15780: Wait for consistent KRaft metadata when creating or deleting topics #14713

Merged
merged 1 commit into from
Nov 13, 2023

Conversation

jolshan
Copy link
Member

@jolshan jolshan commented Nov 8, 2023

TestUtils.createTopicWithAdmin calls waitForAllPartitionsMetadata which waits for partition(s) to be present in each brokers' metadata cache. This is a sufficient check in ZK mode because the controller sends an LISR request before sending an UpdateMetadataRequest which means that the partition in the ReplicaManager will be updated before the metadata cache.

In KRaft mode, the metadata cache is updated first, so the check may return before partitions and other metadata listeners are fully initialized.

Testing:
Insert a Thread.sleep(100) in BrokerMetadataPublisher.onMetadataUpdate after

  // Publish the new metadata image to the metadata cache.
  metadataCache.setImage(newImage)

and run EdgeCaseRequestTest.testProduceRequestWithNullClientId and the test will fail locally nearly deterministically. After the change(s), the test no longer fails.

Reviewers: Justine Olshan jolshan@confluent.io

Conflicts:
core/src/test/scala/unit/kafka/server/GroupCoordinatorBaseRequestTest.scala
core/src/test/scala/integration/kafka/admin/TopicCommandIntegrationTest.scala

I removed the file that didn't exist and added the controllerServer arguments to TopicCommandIntegrationTest. Will rerun.

…ting topics (apache#14695)

TestUtils.createTopicWithAdmin calls waitForAllPartitionsMetadata which waits for partition(s) to be present in each brokers' metadata cache. This is a sufficient check in ZK mode because the controller sends an LISR request before sending an UpdateMetadataRequest which means that the partition in the ReplicaManager will be updated before the metadata cache.

In KRaft mode, the metadata cache is updated first, so the check may return before partitions and other metadata listeners are fully initialized.

Testing:
Insert a Thread.sleep(100) in BrokerMetadataPublisher.onMetadataUpdate after

      // Publish the new metadata image to the metadata cache.
      metadataCache.setImage(newImage)
and run EdgeCaseRequestTest.testProduceRequestWithNullClientId and the test will fail locally nearly deterministically. After the change(s), the test no longer fails.

Reviewers: Justine Olshan <jolshan@confluent.io>

Conflicts:
	core/src/test/scala/unit/kafka/server/GroupCoordinatorBaseRequestTest.scala
	core/src/test/scala/integration/kafka/admin/TopicCommandIntegrationTest.scala
@jolshan
Copy link
Member Author

jolshan commented Nov 8, 2023

@divijvaidya This one was missing the change in some test files, so we should probably review TopicCommandIntegrationTest. I suspect 3.5 will have more conflicts.

@jolshan jolshan changed the title KAFKA-15780: Wait for consistent KRaft metadata when creating or deleting topics cherrypick KAFKA-15780: Wait for consistent KRaft metadata when creating or deleting topics Nov 8, 2023
@splett2
Copy link
Contributor

splett2 commented Nov 8, 2023

updates to TopicCommandIntegrationTest lgtm

@jolshan jolshan requested a review from divijvaidya November 10, 2023 01:02
@jolshan jolshan merged commit a1d1834 into apache:3.6 Nov 13, 2023
mjsax pushed a commit to confluentinc/kafka that referenced this pull request Nov 22, 2023
…ting topics (apache#14695) (apache#14713)

TestUtils.createTopicWithAdmin calls waitForAllPartitionsMetadata which waits for partition(s) to be present in each brokers' metadata cache. This is a sufficient check in ZK mode because the controller sends an LISR request before sending an UpdateMetadataRequest which means that the partition in the ReplicaManager will be updated before the metadata cache.

In KRaft mode, the metadata cache is updated first, so the check may return before partitions and other metadata listeners are fully initialized.

Testing:
Insert a Thread.sleep(100) in BrokerMetadataPublisher.onMetadataUpdate after

      // Publish the new metadata image to the metadata cache.
      metadataCache.setImage(newImage)
and run EdgeCaseRequestTest.testProduceRequestWithNullClientId and the test will fail locally nearly deterministically. After the change(s), the test no longer fails.

Conflicts:
	core/src/test/scala/unit/kafka/server/GroupCoordinatorBaseRequestTest.scala
	core/src/test/scala/integration/kafka/admin/TopicCommandIntegrationTest.scala

Reviewers: Justine Olshan <jolshan@confluent.io>, Divij Vaidya <diviv@amazon.com>, David Mao <dmao@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants