Mark As Completed Discussion

This is a summary of the Kafka white paper, which is available at this link.

1. Abstract

  • Kafka, a distributed messaging system that is developed for collecting and delivering high volumes of log data with low latency.
  • Kafka has superior performance when compared to two popular messaging systems.

2. Introduction

  • A large amount of log data is generated at any sizable internet company which includes user activity, operational metrics.

    1. User activity
      • It contains events corresponding to logins, page views, clicks, comments, and search queries.
    2. Operational metrics
      • It contains service call stack, call latency, errors, and system metrics
    3. system metrics include CPU, memory, network, or disk utilization on each machine.
  • Activity data

  1. search relevance
  2. recommendations
  3. ad targeting and reporting
  4. security
  • Kafka provides an API similar to a messaging system and allows applications to consume log events in real-time.