What is the difference between a partition and a replica of a topic in a Kafka cluster?



When you publish a message to a topic, you call the send(KeyedMessage message) method of the producer API, which means your message carries a key and a value. When you create a topic, you specify the number of partitions you want it to have. When you call send for that topic, the message is written to exactly one partition, chosen (by default) from the hash of the key. Each partition may have replicas, which means the partition and its replicas store the same data. The catch is that producers and consumers only talk to the partition's leader replica; the other copies exist purely for redundancy.
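For reference, here is a minimal sketch of the same idea using the newer Java producer API (ProducerRecord has replaced KeyedMessage); the broker address, topic name, and key below are placeholders, not something from the original answer:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // All records with the same key hash to the same partition of "my-topic",
                // so they land in one partition and stay ordered within it.
                producer.send(new ProducerRecord<>("my-topic", "user-42", "some value"));
            }
        }
    }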

Refer to the documentation: http://kafka.apache.org/documentation.html#producerapi and a basic training deck: http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign


Topics are partitioned across multiple nodes so a topic can grow beyond the limits of a single node. Partitions are replicated for fault tolerance. Replication and leader takeover are among the biggest differences between Kafka and other brokers/Flume. From the Apache Kafka site:

Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
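To see this leader/follower layout for an existing topic, a small sketch using the Java AdminClient could look like the following; the broker address and topic name are assumptions:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.TopicDescription;

    public class ShowLeaders {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                TopicDescription desc = admin.describeTopics(Collections.singleton("my-topic"))
                                             .all().get().get("my-topic");
                // Each partition reports one leader broker plus the brokers holding its replicas
                // and the in-sync replica set (isr).
                desc.partitions().forEach(p ->
                    System.out.printf("partition %d: leader=%s replicas=%s isr=%s%n",
                            p.partition(), p.leader(), p.replicas(), p.isr()));
            }
        }
    }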


  • partition: each topic can be split into partitions for load balancing (you can write to different partitions at the same time) and scalability (the topic can grow beyond what a single broker could hold); within a partition, records are ordered;

  • replica: mainly for fault tolerance and durability; see the sketch after this list.
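As a rough illustration of how the two settings are chosen together, the following sketch uses the Java AdminClient to create a topic with 3 partitions (for parallelism) and replication factor 2 (for fault tolerance); the topic name, broker address, and numbers are placeholders:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions spread the load; replication factor 2 keeps a second copy
                // of every partition on another broker.
                NewTopic topic = new NewTopic("my-topic", 3, (short) 2);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }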

Quotes:

The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.

There is a quite intuitive tutorial that explains some fundamental concepts in Kafka: https://www.tutorialspoint.com/apache_kafka/apache_kafka_fundamentals.htm

Furthermore, there is a workflow description to guide you through the confusing jungle: https://www.tutorialspoint.com/apache_kafka/apache_kafka_workflow.htm