Kafka - Interview Questions
Dive deep into Kafka's architecture.
Kafka unifies the two traditional messaging models, queuing and publish-subscribe, by publishing records to topics. Each topic is backed by a partitioned log: a structured commit log that keeps records in order and appends new ones in real time. These partitions are distributed and replicated across multiple servers, which provides high scalability, fault tolerance, and parallelism.
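The partitioned-log idea can be sketched with a toy in-memory model in Python. This is not the Kafka client; the `PartitionedLog` class and its methods are illustrative names invented for this sketch, showing how keyed records land in a fixed partition and receive monotonically increasing offsets there:

```python
import hashlib

class PartitionedLog:
    """Toy model of a Kafka topic: a fixed set of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key hash to the same partition, so their
        # relative order is preserved within that partition.
        p = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

topic = PartitionedLog(num_partitions=3)
p1, o1 = topic.append("user-42", "login")
p2, o2 = topic.append("user-42", "logout")
# Same key -> same partition, with strictly increasing offsets.
assert p1 == p2 and o2 == o1 + 1
```

Ordering is guaranteed only within a partition, not across the whole topic, which is why choosing a good record key matters.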

Within a consumer group, each partition is assigned to exactly one consumer, which allows multiple subscribers while preserving the order of records in each partition. By combining these messaging models, Kafka offers the benefits of both. Kafka also acts as a highly scalable, fault-tolerant storage system by writing and replicating all data to disk. By default, Kafka retains data for seven days; users can change this retention limit by time or by size. Kafka has four APIs:

Producer API: used to publish a stream of records to one or more Kafka topics.

Consumer API: used to subscribe to topics and process their streams of records.

Streams API: enables applications to act as stream processors, consuming an input stream from one or more topics and transforming it into an output stream written to one or more output topics.

Connector API: allows users to build and run reusable producers or consumers that connect Kafka topics to existing applications or data systems.
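The consumer-group behavior described above, where each partition goes to exactly one member of a group, can be sketched as a toy assignment function in Python. `assign_partitions` is an illustrative name for this sketch, not part of any Kafka client, and real Kafka supports several pluggable assignment strategies:

```python
def assign_partitions(partitions, consumers):
    """Round-robin partitions across group members: every partition is owned
    by exactly one consumer, so per-partition ordering is preserved while the
    group as a whole reads the topic in parallel."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

groups = assign_partitions(partitions=[0, 1, 2, 3, 4, 5],
                           consumers=["c1", "c2", "c3"])
# Each consumer owns a disjoint subset of the topic's partitions.
assert groups == {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}
```

Because no partition is shared between two members of the same group, adding consumers (up to the partition count) scales reads without breaking per-partition ordering.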