Kafka is a powerful distributed publish/subscribe system that helps you build complex asynchronous applications. It is an open-source project developed by the Apache Software Foundation, written in Java and Scala: a distributed event store and stream-processing platform made of servers and clients that communicate over a high-performance TCP network protocol. GumGum was an early adopter of this technology and nowadays runs hundreds of brokers across multiple clusters.

Kafka cluster operations are a topic of their own (scaling clusters in and out, recovering from a dead broker, reassigning partitions across the cluster…), but if you want to build performant client applications on top of Kafka, you need to pay close attention to your consumers:

- Are they able to keep up with the incoming traffic (consumer lag)? Kafka consumer lag is an indicator of how much lag there is between Kafka producers and consumers. In other words, how far behind is your consumer compared to the latest message produced in the topic it is reading from?
- Is the consumer group in a stable state (consumer rebalancing)? Rebalance/Rebalancing: the procedure that is followed by a number of distributed processes that use Kafka clients and/or the Kafka coordinator to form a common group and distribute a set of resources among the members of the group (source: Incremental Cooperative Rebalancing: Support and Policies).

In a microservice world, whether you run on VMs, ECS or Kubernetes, you may want to adjust the number of running instances / tasks / pods based on the consumer lag, which makes Kafka lag reporting a critical piece of your infrastructure. With a cloud provider like AWS, most autoscaling actions are triggered by a CloudWatch alarm. This means that the computed lag for a consumer must be posted to the CloudWatch API as a custom metric before it can drive the size of an autoscaling group or an ECS service.

Figure: Autoscaling based on consumer lag. Consumer lag (orange) / application instance count (green).

Running such applications at scale requires solid monitoring, especially when Kafka is at the heart of the infrastructure. In a previous blog post, we talked about how we rolled out Prometheus as a new monitoring solution for our infrastructure to help us collect real-time metrics about our systems. It was more than obvious to us that Kafka lag should be made available to Prometheus / Grafana using a custom exporter.

It was time for GumGum to improve its rusty lag reporting solution to fit today's needs:

- Consumer lag must be reported to CloudWatch in order to trigger ECS autoscaling.
- Consumer lag must be reported to Prometheus so that engineers can use a single monitoring UI (Grafana) to inspect application performance.

Rebuild or not rebuild a consumer lag reporting service: that is the question

GumGum is a heavy consumer of open source projects. Whenever possible, we try not to reinvent the wheel and instead build on existing solutions, especially to reduce the overhead of maintaining in-house tools that are not business specific.

While looking for a better lag reporting solution years ago, we came across Burrow from LinkedIn, a monitoring companion for Kafka with several interesting features:

- NO THRESHOLDS! Groups are evaluated over a sliding window.
- Automatically monitors all consumers using Kafka-committed offsets.
- Configurable support for Zookeeper-committed offsets.
- Configurable support for Storm-committed offsets.
- HTTP endpoint for consumer group status, as well as broker and consumer information.
- Configurable emailer for sending alerts for specific groups.
- Configurable HTTP client for sending alerts to another system for all groups.

As listed above, Burrow does all the heavy lifting around consumer lag monitoring and exposes this data over a REST API for multiple clusters. One built-in feature caught our attention: the ability to forward consumer lag information to an HTTP client.

We are thrilled to announce the release of Karrot, a Kafka lag reporter that processes events from Burrow!

Every time Burrow detects an update on a consumer group, an event is passed down to Karrot, which ingests the data and distributes it to multiple destinations (CloudWatch, Prometheus). Burrow's HTTP notifier renders each event with a template, /etc/burrow/templates/events.tmpl, referenced in the notifier configuration below.

If you are running on Kubernetes, you can easily get started with our open source Karrot helm charts (lucky you, we also wrote the first open source Burrow helm chart).

Create a values.yaml file (reduced here to its minimal content) that points at the Kafka and Zookeeper clusters you want to monitor, then deploy it on Minikube:

```yaml
# Content of values.yaml
burrow:
  enabled: true
  burrow:
    config:
      cluster:
        mycluster:
          servers:
            - :9092
      consumer:
        mycluster:
          servers:
            - :9092
      notifier:
        karrot:
          className: http
          templateClose: events.tmpl
          templateOpen: events.
```
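The consumer lag described in this post (how far a consumer's committed offset trails the latest produced message) can be sketched per partition as follows. This is only a minimal illustration of the definition, not how Burrow or Karrot compute it: the function names and the sample offsets are hypothetical, and partitions without a committed offset are simply skipped as an assumption of this sketch.

```python
from typing import Dict


def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Lag for one partition: messages produced but not yet consumed."""
    # Clamp at zero: a commit can briefly look ahead of a stale end-offset reading.
    return max(log_end_offset - committed_offset, 0)


def group_lag(
    log_end_offsets: Dict[int, int],
    committed_offsets: Dict[int, int],
) -> Dict[int, int]:
    """Per-partition lag for a consumer group on a single topic.

    Both arguments map partition id -> offset. Partitions the group has
    never committed to are left out (an assumption of this sketch).
    """
    lags: Dict[int, int] = {}
    for partition, end_offset in log_end_offsets.items():
        if partition not in committed_offsets:
            continue
        lags[partition] = partition_lag(end_offset, committed_offsets[partition])
    return lags


def total_lag(lags: Dict[int, int]) -> int:
    """Sum of per-partition lags, the kind of single number an alarm can watch."""
    return sum(lags.values())
```

For example, `group_lag({0: 100, 1: 250}, {0: 90, 1: 250})` gives a lag of 10 on partition 0 and 0 on partition 1. A single aggregate such as `total_lag` is the sort of value that could be published as a custom metric to drive autoscaling.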