Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-16583: fix PartitionRegistration#toRecord directory check under metadata version 3_7_IV2 #15751

Closed
wants to merge 1 commit into from

Conversation

superhx
Copy link

@superhx superhx commented Apr 18, 2024

fix https://issues.apache.org/jira/browse/KAFKA-16583

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

… metadata version 3_7_IV2

Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
Copy link
Contributor

@funky-eyes funky-eyes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@showuon showuon self-assigned this May 24, 2024
@showuon
Copy link
Contributor

showuon commented May 24, 2024

Thanks for fixing this issue. I'll take a look this week or next week.

Copy link
Contributor

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@superhx , thanks for the fix! Could you please add some tests for this change?

@showuon
Copy link
Contributor

showuon commented May 27, 2024

@soarez , could you have another look at this PR since you're the expert in KRaft JBOD?
It seems after partition reassignment, the directory is set as UNASSIGNED, and that causes the broker startup failure. Is the change the correct fix to this issue?

Copy link
Member

@soarez soarez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding the bug and filing a PR.

I don't think this approach is correct. Before the metadata supports directory assignment only Directory.MIGRATING should be used which is also the default in the metadata records.

The issue JIRA states:

When reassigning partition, PartitionRegistration#merge will set the new replicas with UNASSIGNED directory

Before IBP_3_7_IV2 this shouldn't happen. I think that's the root cause of the issue here.

@superhx
Copy link
Author

superhx commented May 27, 2024

PartitionRegistration

Pass the metadataVersion to PartitionRegistration#merge may be a more appropriate fix. But I found that PartitionRegistration#merge is used in many places and it is not very convenient to obtain metadataVersion.

@soarez
Copy link
Member

soarez commented May 28, 2024

Can you please confirm:

  1. In the JIRA you describe the cluster as formed of 1 Controller and 2 Brokers (BrokerA and BrokerB), is my understanding correct that all of these were running 3.4.0 and that BrokerB was the first and only server to be upgraded to 3.7.0?
  2. Did you configure brokers with inter.broker.protocol.version at any point or let it default?

This is the stack trace in the JIRA:

[2024-04-18 14:46:54,144] ERROR Encountered metadata loading fault: Unhandled error initializing new publishers (org.apache.kafka.server.fault.LoggingFaultHandler)
org.apache.kafka.image.writer.UnwritableMetadataException: Metadata has been lost because the following could not be represented in metadata version 3.4-IV0: the directory assignment state of one or more replicas
	at org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
	at org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
	at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
	at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
	at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
	at org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
	at org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
	at java.base/java.lang.Thread.run(Thread.java:840)

It shows that:

  • The failure happens after the broker first catches up with cluster metadata, in MetadataLoader#initializeNewPublishers, converting the metadata image into a delta used to initialize publishers.
  • UnwritableMetadataException is thrown from PartitionRegistration#toRecord because PartitionRegistration#directories contain some directory ID that’s not MIGRATING and MetadataVersion (MV) is still below IBP_3_7_IV2. This operation converts PartitionRegistration – the in-memory representation of partition metadata – into PartitionRecord – the serializable metadata format.

Neither controllers nor brokers pre 3.7 produce records with directory IDs, the only possible value in previous records is MIGRATING. So UNASSIGNED was set while loading metadata. i.e. PartitionRegistration must have produced UNASSIGNED during either:

  • class initialization – we can rule out PartitionRegistration.Builder#build() as that’s only used in the controller, and if I understood the issue description correctly, your controller was still running 3.4.0. â?Œ
  • loading PartitionRecord – directory IDs are taken as-is in the PartitionRegistration(PartitionRecord record) constructor and default to MIGRATING. The records wouldn’t previously have any IDs, so we rule this out too. â?Œ
  • loading PartitionChangeRecord – in merge(PartitionChangeRecord record) the loaded records wouldn’t have any directory IDs, but we call DirectoryId.createDirectoriesFrom which produces UNASSIGNED. As far as I can tell, this is the likely culprit. ✅

PartitionRegistration loads records without dir IDs and doesn't check the MV before using UNASSIGNED.
There are at least two approaches to fix this:

  1. Refer to the current MV as the record is interpreted so we can choose the correct default. Referring to the current MV should be required while interpreting potentially older records that omit fields. But as @superhx pointed out, it’s a bit tricky to to that from #merge.
  2. Do not produce UNASSIGNED and default to MIGRATING regardless of MV. The onus of setting UNASSIGNED falls to the creation of PartitionChangeRecord. These records are produced by the Controller, via PartitionChangeBuilder which selects the directory using ClusterControlManager#defaultDir, which will pick UNASSIGNED as necessary. So this this seems fine.

From these two approaches, I prefer the second one, it’s not only a simpler change but also removes redundant logic.

@superhx
Copy link
Author

superhx commented May 29, 2024

  1. In the JIRA you describe the cluster as formed of 1 Controller and 2 Brokers (BrokerA and BrokerB), is my understanding correct that all of these were running 3.4.0 and that BrokerB was the first and only server to be upgraded to 3.7.0?

yes

  1. Did you configure brokers with inter.broker.protocol.version at any point or let it default?

default value

@superhx
Copy link
Author

superhx commented May 29, 2024

Do not produce UNASSIGNED and default to MIGRATING regardless of MV. The onus of setting UNASSIGNED falls to the creation of PartitionChangeRecord. These records are produced by the Controller, via PartitionChangeBuilder which selects the directory using ClusterControlManager#defaultDir, which will pick UNASSIGNED as necessary. So this this seems fine.

The complexity of this repair seems to have exceeded my abilities. Perhaps it would be more suitable for you to fix it. @soarez

@soarez
Copy link
Member

soarez commented May 29, 2024

I've filed PR #16118

@superhx superhx closed this Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants