Table of Contents
- Data Pipeline Architecture
- Prerequisite
- Making the Environment Ready
- Kafka Producer
- Create the Logstash Pipeline
- Access data in Kibana
The world today is heavily dependent on data, and everyone is generating large amounts of it. Reading that data, processing it, and finally acting on it has become a real challenge. In this post, we will create a data pipeline using Kafka, Elasticsearch, Logstash, and Kibana that serves three responsibilities:
- Read data from a Kafka topic
- Process it using Logstash
- Store the data in Elasticsearch and then visualize it using Kibana
Data Pipeline Architecture
There are five important components in this pipeline:
- Kafka Server: This is where the data is published first.
- Producer: The producer publishes data to the Kafka topic. In the real world, this can be any entity that produces data to a Kafka topic; in our example, we will generate fake user registration data.
- Elasticsearch: This acts as the database that stores the user registration data. The question that arises is which entity will push the data into Elasticsearch.
- Logstash: Logstash plays the role of the middleman: it reads data from the Kafka topic and inserts it into Elasticsearch.
- Kibana: Kibana provides the graphical user interface that presents the data in a readable, visual format.
Prerequisite
- We will be using Ubuntu 18.04 to build this data pipeline, but you can use your own development environment as well. Install Ubuntu Desktop 18.04
- Python 3.x must be installed
- Docker must be installed Install Docker on Ubuntu 18.04
- Docker Compose must be installed Install Docker Compose on Ubuntu 18.04
Making the Environment Ready
Kafka & Zookeeper & Kafka Manager
- As we need a Kafka broker where the producer can publish the registered user data, the first step is to install Kafka.
- We will be installing Kafka using docker-compose. Install Kafka-Zookeeper-KafkaManager Using Docker Compose
- Once you start the services defined in the docker-compose file, the most important points are the ports listed below (a quick connectivity check follows the list)
- Kafka is listening on PORT 9093
- Zookeeper is listening on PORT 2181
- Kafka Manager is listening on PORT 9000
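If you want to confirm that the broker is actually reachable before moving on, a throwaway script using the kafka-python package (installed later in this post) can list the broker's topics. This is only a minimal sketch; it assumes the broker is reachable at 192.168.0.10:9093, the address used later in this post, so adjust it to your own host IP.

# Quick connectivity check for the Kafka broker started by docker-compose
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers=["192.168.0.10:9093"])
print(consumer.topics())  # prints the set of topics currently known to the broker
consumer.close()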
Elasticsearch & Kibana
- We will be installing Elasticsearch and Kibana using docker-compose. Install Elasticsearch and Kibana using Docker Compose
- After the installation, Elasticsearch will be running at PORT 9200
- After the installation, Kibana will be running at PORT 5601 (a quick sanity check for both services follows this list)
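To make sure both services came up correctly, you can hit their HTTP endpoints. The sketch below is optional and uses the requests package (an extra dependency not needed by the rest of the pipeline); it assumes the containers are reachable on 192.168.0.10, the host IP used throughout this post.

# Sanity check that Elasticsearch and Kibana answer on their ports
import requests

es = requests.get("http://192.168.0.10:9200")
print("Elasticsearch:", es.status_code, es.json().get("cluster_name"))

kibana = requests.get("http://192.168.0.10:5601/api/status")
print("Kibana:", kibana.status_code)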
Logstash
- We are using Ubuntu 18.04, so you can set up Logstash using this post. Install Logstash on Ubuntu 18.04
- The configuration folder for Logstash will be located at
cd /etc/logstash/conf.d/
Kafka Producer
What we will be doing with the Kafka producer:
- We will create a Kafka producer that generates registered user data and publishes it to the Kafka topic
- We will use the Python Faker module to generate the fake registered user data
Install the dependencies of the Kafka producer:
pip install Faker
pip install kafka-python
Writing the Producer in Python
The final producer will be something like this:
from faker import Faker
from kafka import KafkaProducer
import json
import time

fake = Faker()

def get_registered_user():
    # Generate one fake registered-user record
    return {
        "name": fake.name(),
        "address": fake.address(),
        "created_at": fake.year()
    }

def json_serializer(data):
    # Serialize the dict to UTF-8 encoded JSON bytes for Kafka
    return json.dumps(data).encode("utf-8")

producer = KafkaProducer(bootstrap_servers=['192.168.0.10:9093'],
                         value_serializer=json_serializer)

if __name__ == "__main__":
    while True:
        registered_user = get_registered_user()
        print(registered_user)
        # Publish one record to the registered_user topic every 4 seconds
        producer.send("registered_user", registered_user)
        time.sleep(4)
- The data is pushed to the registered_user topic
- The Kafka broker is running on port 9093
Learn the details about this producer in this post
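Before wiring up Logstash, it can help to confirm that the messages are really landing on the topic. The throwaway consumer below is a minimal sketch using the same kafka-python package; run it alongside the producer and it should print the fake user records.

# Throwaway consumer that prints whatever arrives on the registered_user topic
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "registered_user",
    bootstrap_servers=["192.168.0.10:9093"],
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    print(message.value)  # the dict produced by get_registered_user()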
Create the Logstash Pipeline
- We will read data from the Kafka topic registered_user
- Then we will push the data to the Elasticsearch index named registered_user
- Navigate to the configuration folder of Logstash
cd /etc/logstash/conf.d/
Create a new file:
touch kafka-elastic-pipeline.conf
Put the below contents in the conf file:
input {
    kafka {
        bootstrap_servers => "192.168.0.10:9093"
        topics => ["registered_user"]
    }
}

output {
    elasticsearch {
        hosts => ["192.168.0.10:9200"]
        index => "registered_user"
        workers => 1
    }
}
- The pipeline defines an input section that reads the data from the Kafka broker
- The Kafka server address is given by the bootstrap_servers setting
- We read the data from the registered_user topic
input {
    kafka {
        bootstrap_servers => "192.168.0.10:9093"
        topics => ["registered_user"]
    }
}
- The pipeline defines an output section that pushes the data to the Elasticsearch server
- Elasticsearch is running at 192.168.0.10:9200
- We push the data to the registered_user index of Elasticsearch
Save the file, and your pipeline will become active.
You can check the Logstash logs in the /var/log/logstash folder.
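If you want to verify the result outside of Kibana, you can query the index directly over HTTP. The sketch below uses the requests package and assumes Elasticsearch is reachable at 192.168.0.10:9200, as configured in the pipeline above.

# Confirm that Logstash is writing documents into the registered_user index
import requests

count = requests.get("http://192.168.0.10:9200/registered_user/_count").json()
print("documents indexed:", count["count"])

# Peek at a couple of documents
hits = requests.get(
    "http://192.168.0.10:9200/registered_user/_search",
    params={"size": 2},
).json()
for hit in hits["hits"]["hits"]:
    print(hit["_source"])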
Access data in Kibana
- Open Kibana, which is running at http://192.168.0.10:5601
- Go to the Management menu
- Go to Index Management and search for the registered_user index
- Then create an index pattern for it
- Finally, go to the Discover section
- You can see the registered user records there