[KAFKA-13073] Simulation test fails due to inconsistency in MockLog's implementation - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: controller, replication
Labels:
- kip-500

Description

We are getting the following error on trunk

RaftEventSimulationTest > canRecoverAfterAllNodesKilled STANDARD_OUT
    timestamp = 2021-07-12T16:26:55.663, RaftEventSimulationTest:canRecoverAfterAllNodesKilled =
      java.lang.RuntimeException:
        Uncaught exception during poll of node 1

                                  |-------------------jqwik-------------------
    tries = 25                    | # of calls to property
    checks = 25                   | # of not rejected calls
    generation = RANDOMIZED       | parameters are randomly generated
    after-failure = PREVIOUS_SEED | use the previous seed
    when-fixed-seed = ALLOW       | fixing the random seed is allowed
    edge-cases#mode = MIXIN       | edge cases are mixed in
    edge-cases#total = 108        | # of all combined edge cases
    edge-cases#tried = 4          | # of edge cases tried in current run
    seed = 8079861963960994566    | random seed to reproduce generated values

    Sample
    ------
      arg0: 4002
      arg1: 2
      arg2: 4

I think there are a couple of issues here:

1. The ListenerContext for KafkaRaftClient uses the value returned by ReplicatedLog::startOffset() to determine the log start and when to load a snapshot, while the MockLog implementation uses logStartOffset, which could be a different value.
2. MockLog doesn't implement ReplicatedLog::maybeClean, so the log start offset is always 0.
3. The snapshot id validation in MockLog's and KafkaMetadataLog's createNewSnapshot throws an exception when the snapshot id is less than the log start offset.

Solutions:

To fix the error quoted above we only need to fix issue 3, but I think we should fix all of the issues enumerated in this Jira.

For 1. we should change the MockLog implementation so that it uses startOffset both externally and internally.
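
The idea behind the fix for issue 1 can be sketched with a toy log in which every internal bounds check goes through the same startOffset() accessor that callers see, so the two views cannot diverge. This is only an illustration: ToyLog, advanceStartOffset and read are hypothetical names, not the real MockLog or ReplicatedLog API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are Kafka's real classes. */
final class StartOffsetSketch {

    /** Toy in-memory log with a single source of truth for its start offset. */
    static final class ToyLog {
        private final List<String> records = new ArrayList<>();
        private long startOffset = 0L;

        /** The only place the start offset is read, externally or internally. */
        long startOffset() {
            return startOffset;
        }

        long endOffset() {
            return startOffset() + records.size();
        }

        void append(String record) {
            records.add(record);
        }

        /** Drops every record below newStartOffset (assumed: a snapshot already covers them). */
        void advanceStartOffset(long newStartOffset) {
            if (newStartOffset < startOffset() || newStartOffset > endOffset()) {
                throw new IllegalArgumentException("Invalid start offset " + newStartOffset);
            }
            int dropped = (int) (newStartOffset - startOffset());
            records.subList(0, dropped).clear();
            startOffset = newStartOffset;
        }

        /** Validates against startOffset(), the same value that callers observe. */
        String read(long offset) {
            if (offset < startOffset() || offset >= endOffset()) {
                throw new IndexOutOfBoundsException("Offset " + offset + " is out of range");
            }
            return records.get((int) (offset - startOffset()));
        }
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        log.append("a");
        log.append("b");
        log.append("c");
        log.advanceStartOffset(2L);
        System.out.println("startOffset = " + log.startOffset());
        System.out.println("read(2) = " + log.read(2L));
    }
}
```

With a single accessor, a listener that asks the log for its start offset and the log's own validation always agree on where the log begins.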

For 2. I will file another issue to track this implementation.

For 3. I think this validation is too strict. I think it is safe to simply ignore any attempt by the state machine to create a snapshot with an id less than the log start offset. We should return {{Optional.empty()}} when the snapshot id is less than the log start offset. This tells the user that it doesn't need to generate a snapshot for that offset.
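
A minimal sketch of the relaxed validation proposed for issue 3, assuming a simplified log that only tracks its start offset. SketchLog, SketchSnapshotId and SketchSnapshotWriter are hypothetical stand-ins for illustration; the real createNewSnapshot in MockLog and KafkaMetadataLog has a different signature and performs other checks.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these types are Kafka's real classes. */
final class SnapshotValidationSketch {

    /** Simplified stand-in for a snapshot id: an end offset plus an epoch. */
    static final class SketchSnapshotId {
        final long offset;
        final int epoch;

        SketchSnapshotId(long offset, int epoch) {
            this.offset = offset;
            this.epoch = epoch;
        }
    }

    /** Simplified stand-in for a handle used to write a snapshot. */
    static final class SketchSnapshotWriter {
        final SketchSnapshotId id;

        SketchSnapshotWriter(SketchSnapshotId id) {
            this.id = id;
        }
    }

    /** Toy log that only knows its start offset. */
    static final class SketchLog {
        private final long startOffset;

        SketchLog(long startOffset) {
            this.startOffset = startOffset;
        }

        /**
         * Returns an empty Optional instead of throwing when the requested
         * snapshot id is below the log start offset: that range is already
         * gone from the log, so the caller has nothing to generate.
         */
        Optional<SketchSnapshotWriter> createNewSnapshot(SketchSnapshotId id) {
            if (id.offset < startOffset) {
                return Optional.empty();
            }
            return Optional.of(new SketchSnapshotWriter(id));
        }
    }

    public static void main(String[] args) {
        SketchLog log = new SketchLog(5L);
        // Below the start offset: ignored rather than rejected with an exception.
        System.out.println(log.createNewSnapshot(new SketchSnapshotId(3L, 1)).isPresent());
        // At or above the start offset: a writer is handed back.
        System.out.println(log.createNewSnapshot(new SketchSnapshotId(7L, 1)).isPresent());
    }
}
```

A caller would then treat the empty result as "skip this snapshot" rather than as an error, for example with ifPresent on the returned Optional.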

Issue Links

relates to

KAFKA-13074 Implement mayClean for MockLog

Open

links to

GitHub Pull Request #11032

People

Assignee: Jose Armando Garcia Sancio

Reporter: Jose Armando Garcia Sancio

Votes: 0

Watchers: 1

Dates

Created: 13/Jul/21 03:30

Updated: 14/Jul/21 21:55

Resolved: 14/Jul/21 21:55