Apache Kafka using Keys for Partition

29/12/2020

Apache Kafka is a data streaming platform that moves data from many sources to many targets. The sources are called producers. The data they produce is needed, for various purposes, by an entirely separate group called consumers. Kafka is the layer that sits between producers and consumers and aggregates the data into a usable pipeline. Kafka is itself a distributed platform: the Kafka layer is made up of several servers, each running Kafka, and these servers or nodes are known as Kafka brokers.

That overview is a bit abstract, so let’s ground it in a real-world scenario. Imagine you need to monitor several web servers, each running its own website and each generating new logs every second of the day. On top of that, there are a number of email servers that you need to monitor as well.

You may need to store that data for record keeping and billing purposes, which is a batch job that doesn’t require immediate attention. You might also want to run analytics on the data to make decisions in real time, which requires accurate and immediate input of data. Suddenly you find yourself needing to organize the data in a sensible way for all of these needs. Kafka acts as that layer of abstraction: multiple sources can publish different streams of data to it, and a given consumer can subscribe to the streams it finds relevant. Kafka keeps the records in each stream in a well-defined order, although that guarantee has limits once a stream is split into partitions. We need to understand these internals of Kafka before we get to the topic of partitioning and keys.

Kafka Topics, Brokers and Partitions

Kafka topics are like tables in a database. Each topic consists of data of a particular type from a particular source. For example, your cluster’s health can be a topic consisting of CPU and memory utilization information. Similarly, incoming traffic across the cluster can be another topic.

Kafka is designed to be horizontally scalable. That is to say, a single Kafka cluster consists of multiple Kafka brokers running across multiple nodes, each of which can handle streams of data in parallel with the others. Even if a few of the nodes fail, your data pipeline can continue to function. A particular topic can then be split into a number of partitions. This partitioning is one of the crucial factors behind the horizontal scalability of Kafka.
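To make the idea of a topic split into partitions more concrete, here is a minimal in-memory sketch. It is only a toy model, not Kafka's implementation: the class names `Topic` and `Partition` and the topic name `cluster-health` are made up for illustration. Each partition is modeled as an append-only log in which a record's offset is simply its position.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of a partition: an append-only log of records."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        # The offset is the position of the record within this partition only.
        return len(self.records) - 1


@dataclass
class Topic:
    """Toy model of a topic: a name plus a fixed number of partitions."""
    name: str
    num_partitions: int
    partitions: list = field(default_factory=list)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]


topic = Topic("cluster-health", num_partitions=3)
offset_a = topic.partitions[0].append("cpu=41%")
offset_b = topic.partitions[0].append("cpu=43%")
offset_c = topic.partitions[2].append("mem=68%")
print(offset_a, offset_b, offset_c)  # 0 1 0
```

Note that offsets restart at 0 in every partition, which is a first hint that partitions are independent of one another.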

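Multiple producers (the data sources for a given topic) can write to that topic simultaneously because their writes are spread across its partitions. Unless we provide a key, the producer client picks the partition for each message itself (round-robin or in sticky batches, depending on the client version), so from the application’s point of view the assignment is effectively arbitrary.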
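The difference between keyless and keyed partition selection can be sketched as follows. This is a simplified simulation under stated assumptions, not the real client logic: `make_partitioner` is a hypothetical helper, keyless messages are simply rotated round-robin, and `zlib.crc32` is a stand-in for the hash function Kafka actually applies to keys.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that maps an optional key to a partition index."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose_partition(key=None):
        if key is None:
            # No key: spread messages evenly across the partitions.
            return next(round_robin)
        # Key present: the same key always hashes to the same partition.
        # crc32 is a stand-in here; it is NOT the hash Kafka itself uses.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose_partition


choose_partition = make_partitioner(num_partitions=3)
keyless = [choose_partition() for _ in range(4)]
keyed = [choose_partition("u0") for _ in range(4)]
print(keyless)          # [0, 1, 2, 0]
print(len(set(keyed)))  # 1
```

Keyless messages wander across all partitions, while every message keyed with `u0` resolves to one and the same partition.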

Partitioning and Ordering

Just to recap: producers write data to a given topic, and that topic is actually split into multiple partitions. Each partition lives independently of the others, even within the same topic. This can lead to a lot of confusion when the ordering of data matters. Maybe you need your data in chronological order, but Kafka only preserves order within a single partition, so having multiple partitions for your data stream doesn’t guarantee ordering across the topic as a whole.
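A small simulation illustrates the problem. It assumes two partitions, keyless messages rotated between them, and a hypothetical consumer that happens to drain partition 0 before partition 1; a real consumer may interleave partitions in other ways, but the effect is the same.

```python
# Six events produced in chronological order.
events = [f"event-{i}" for i in range(6)]

# Keyless messages alternate between two partitions (assumed round-robin).
partitions = [[], []]
for i, event in enumerate(events):
    partitions[i % 2].append(event)

# A consumer that reads all of partition 0, then all of partition 1.
consumed = partitions[0] + partitions[1]

print(consumed)
# ['event-0', 'event-2', 'event-4', 'event-1', 'event-3', 'event-5']
print(consumed == events)  # False
```

Each partition is still internally ordered, but the topic-wide chronological order has been lost.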

You can use only a single partition per topic, but that defeats the whole purpose of Kafka’s distributed architecture. So we need some other solution.

Keys for Partitions

As mentioned before, messages (the actual chunks of data) from a producer are spread across partitions arbitrarily by default. Besides sending the message itself, a producer can attach a key to it.

All messages that carry the same key go to the same partition, as long as the topic’s partition count stays the same. So, for example, a user’s activity can be tracked chronologically if that user’s data is tagged with a key, so that it always ends up in one partition. Let’s call this partition p0 and the user u0.

Partition p0 will always pick up the u0-related messages because that key ties them together. But that doesn’t mean p0 is reserved for u0 alone. It can also receive messages from u1 and u2, if their keys happen to map to it. Similarly, other partitions receive data from other users.

The point is that a given user’s data isn’t spread across different partitions, which ensures chronological ordering for that user. However, the overall topic of user data can still leverage the distributed architecture of Apache Kafka.
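The per-user ordering guarantee can be checked with a short simulation. As before, this is a sketch under assumptions rather than real Kafka code: the event data is invented, `partition_for` and `actions_of` are hypothetical helpers, and `zlib.crc32` stands in for Kafka's own key hash.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key):
    # Stand-in hash (crc32); Kafka's default partitioner uses a different one.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# (user, action) pairs in the order they were produced; the user is the key.
events = [
    ("u0", "login"),
    ("u1", "login"),
    ("u0", "view_cart"),
    ("u2", "login"),
    ("u1", "logout"),
    ("u0", "checkout"),
]

partitions = defaultdict(list)
for user, action in events:
    partitions[partition_for(user)].append((user, action))


def actions_of(user):
    """Read one user's actions back from the single partition that holds them."""
    log = partitions[partition_for(user)]
    return [action for owner, action in log if owner == user]


print(actions_of("u0"))  # ['login', 'view_cart', 'checkout']
```

Whichever partition u0 lands in, and whichever other users share it, u0's actions come back in the order they were produced.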

Conclusion

Distributed systems like Kafka solve some older problems, such as lack of scalability or having a single point of failure, but they come with a set of problems unique to their own design. Anticipating these problems is an essential job of any system architect. Not only that, sometimes you really have to do a cost-benefit analysis to determine whether the new problems are a worthy trade-off for getting rid of the older ones. Ordering and synchronization are just the tip of the iceberg.

Hopefully, articles like these and the official documentation can help you along the way.
