MirrorMaker 2.0 (MM2) - Amazon MSK Migration Guide

MirrorMaker 2.0 (MM2)

Note

This whitepaper is only meant to be used as a reference for deploying MirrorMaker 2.0 (MM2) on Kafka Connect to migrate data between multiple Amazon MSK clusters. AWS does not offer any support for it.

MirrorMaker 2.0 (MM2) is a multi-cluster data replication engine based on the Kafka Connect framework. MM2 is a combination of an Apache Kafka source connector and a sink connector. You can use a single MM2 cluster to migrate data between multiple clusters. MM2 automatically detects new topics and partitions, while also ensuring the topic configurations are synced between clusters.

MM2 is available as part of the Apache Kafka 2.4.0 release and supersedes all capabilities of MM1. AWS recommends using MM2.

MM2 remote topics

Replication policies help consumers perform migrations between self-managed Apache Kafka clusters and Amazon MSK clusters with no downtime. MM2 comes with a default replication policy that can be customized by providing a custom replication policy class.

Remote topics in MM2 are replicated topics on the target cluster. These topics reference the source cluster using a naming convention as shown in the following figure.

Diagram showing Remote topics

Remote topics

For the default replication policy, topicA is the source topic. Source is a source cluster alias used in the configuration properties file and the remote topic name is source.topicA. This distinguishes between a replicated remote topic and a topic created on the destination or target cluster.

MM2 ensures ordering in partitions when it replicates the topic from source cluster to the remote topic in the destination cluster. The following diagram shows an example of remote topics created on source and destination cluster when bi-directional replication is configured in MM2.

MM2 internal topics

MM2 uses the following internal topics for replication purposes:

Heartbeat topic: Emitted from the source cluster and replicated to demonstrate connectivity through connectors. This can be used by downstream consumers to verify that the connector is running and the corresponding source cluster is available. Messages in this topic contain information on the source cluster, target cluster, and timestamp when the heartbeat was created.

Checkpoint topic: Emitted in the target cluster by the connector and contains consumer offsets for each consumer group in the source cluster. The connector will periodically query the source cluster for all committed offsets of consumer groups (except for replicated topics) and emit a message to this topic. Information in this message includes the consumer group id, topic, partition, upstream offset, downstream offset, metadata, and timestamp. Consumers use the checkpoint topic via MirrorClient or RemoteClusterUtils class to get the replicated offsets in the target Apache Kafka cluster.

Offset sync topics: Encodes cluster-to-cluster offset mapping for each replicated topic-partition. Messages in this topic contain topic, partition, upstream offset, and downstream offset.

MM2 Connectors

MM2 has the following connectors for enabling complex flows between multiple Apache Kafka clusters and across data centers via existing Kafka Connect clusters.

  • MirrorSourceConnector replicates a set of topics from a single source cluster into the primary cluster.

  • MirrorCheckpointConnector emits consumer offset checkpoints and syncs the offset with __consumer_offsets (as of Apache Kafka 2.7.0).

  • MirrorHeartbeatConnector emits heartbeats.

MM2 Deployment Methods

MirrorMaker 2.0 can be deployed in several modes:

Deploying a dedicated MM2 cluster

The connect-mirror-maker.sh script sets up the MirrorSourceConnector, MirrorCheckpointConnector, and MirrorHeartbeatConnector connectors based on the provided MM2 properties file. See the following example file.

#Apache Kafka clusters clusters = #comma separated list of Apache Kafka cluster aliases source.bootstrap.servers = apache-kafka-source:9092 target.bootstrap.servers = apache-kafka-target:9092 source -> target.enabled = true #Source and target cluster configuration source.config.storage.replication.factor = 1 target.config.storage.replication.factor = 1 source.offset.storage.replication.factor = 1 target.offset.storage.replication.factor = 1 #Mirror Maker configuration offset-sync.topic.replication.factor = 1 heartbeat.topic.replication.factor = 1 checkpoint.topic.replication.factor = 1 topics = .* groups = .* tasks.max = 1 replication.factor = 1 refresh.topics.enabled = true sync.topic.configs.enabled = true #Enable heartbeats and checkpoints source->target.emit.heartbeats.enabled = true source->target.emit.checkpoints.enabled = true

Ensure MM2 has successfully connected to the source and target clusters using the kafka-topics.sh script to list and compare the topics created on both clusters. See the following example.

#Source Topics ./kafka-topics.sh --bootstrap-server <source bootstrap string> --list __consumer_offsets heartbeats mm2-configs.target.internal mm2-offset-syncs.target.internal mm2-offsets.target.internal mm2-status.target.internal #Target Topics ./kafka-topics.sh --bootstrap-server <target bootstrap string> --list __consumer_offsets heartbeats mm2-configs.source.internal mm2-offsets.source.internal mm2-status.source.internal source.checkpoints.internal source.heartbeats

Deploying MM2 on a Kafka Connect cluster

Customers that already have an existing Kafka Connect cluster running on EC2 instances or containers can run MM2 on the same cluster by starting the MirrorSourceConnector, MirrorCheckpointConnector, and MirrorHeartbeatConnector connectors.

The AWS provided self-service Amazon MSK workshop details all of the necessary steps required to set up MM2 on a Kafka Connect cluster. See the following figure for an overview.

Diagram showing MM2 on Kafka Connect

MM2 on Kafka Connect

In legacy mode

After legacy MirrorMaker is deprecated, the existing ./bin/kafka-mirror-maker.sh scripts will be updated to run MM2 in legacy mode:

$ ./bin/kafka-mirror-maker.sh --consumer consumer.properties --producer producer.properties