AI chat applications are everywhere now. From ChatGPT to coding assistants and support bots, conversational AI has become one of the most exciting areas for developers. But here’s the good part you do not need a massive ML setup or a paid API to build your own AI chat application. In this tutorial, you will learn how to build your own ChatGPT clone in just 30 minutes using Python, Flask, HTML, CSS, JavaScript, and Ollama. By the end of this guide, you will have a working AI chat app with

  • A clean chat interface
  • Conversation-style message bubbles
  • Local AI model integration
  • A real backend API
  • A ChatGPT-style experience running on your own machine

This is a beginner-friendly project and a great starting point if you want to build your own AI products.

Prerequisite

Architecture

Project Structure

Bash
chatgpt-clone/

├── app.py
├── requirements.txt
├── static/
   └── style.css
└── templates/
    └── index.html

This structure keeps the project clean and easy to understand.

Create the virtual Environment

Goto to the folder where you want to create the project

Bash
mkdir chatgpt-clone
cd chatgpt-clone

Now create and activate the virtual environment

Bash
virtualenv -p /usr/bin/python3 venv
souve venv/bin/activate

Install Python Dependencies

Create a file named requirements.txt and add

Bash
flask
requests

Now install the dependencies

Bash
pip install -r requirements.txt

Ollama Server Details

If you follow the prerequisite then you will have a local version of Llama3:8b model running on your local machine. The host you will be using will be

Bash
http://localhost:11434

If you are running Ollama on different server then use the IP address of that server

11434 is the default port for Ollama server

Create the Flask Backend

Create a file named app.py which will be your main file for backend related code

Python
from flask import Flask, render_template, request, jsonify
import requests

app = Flask(__name__)

OLLAMA_API_URL = "http://192.168.0.114:11434/api/generate"

@app.route("/")
def home():
    return render_template("index.html")

@app.route("/chat", methods=["POST"])
def chat():
    user_message = request.json.get("message", "").strip()

    if not user_message:
        return jsonify({"reply": "Please enter a message."}), 400

    payload = {
        "model": "llama3:8b",
        "prompt": user_message,
        "stream": False
    }

    try:
        response = requests.post(OLLAMA_API_URL, json=payload)
        response.raise_for_status()
        ai_reply = response.json().get("response", "No response from model.")
        return jsonify({"reply": ai_reply})
    except requests.exceptions.RequestException as e:
        return jsonify({"reply": f"Error connecting to Ollama: {str(e)}"}), 500

if __name__ == "__main__":
    app.run(debug=True)

This backend does some main things

  1. Serves the chat page using the root route
    • http://localhost:5000
  2. Receives messages from the frontend
    • http://localhost:5000/chat
  3. Sends the user prompt to Ollama and returns the AI response
  4. Connects to the local Ollama server using the OLLAMA_API_URL config

This is the core connection between your web app and the AI model.

Create the Frontend UI

chatgpt clone ui on the browser

Create a file named templates/index.html

Bash
mkdir templates
cd templates
touch index.html

Put the below contents in the file

HTML
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>ChatGPT Clone</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
</head>
<body>
    <div class="chat-container">
        <h1>My ChatGPT Clone</h1>
        <div id="chat-box" class="chat-box"></div>

        <div class="input-area">
            <input type="text" id="user-input" placeholder="Type your message here..." />
            <button onclick="sendMessage()">Send</button>
        </div>
    </div>

    <script>
        async function sendMessage() {
            const input = document.getElementById("user-input");
            const chatBox = document.getElementById("chat-box");
            const message = input.value.trim();

            if (!message) return;

            chatBox.innerHTML += `<div class="message user-message"><strong>You:</strong> ${message}</div>`;
            input.value = "";

            chatBox.innerHTML += `<div class="message bot-message"><strong>AI:</strong> Thinking...</div>`;
            chatBox.scrollTop = chatBox.scrollHeight;

            const botPlaceholders = document.querySelectorAll(".bot-message");
            const currentBotMessage = botPlaceholders[botPlaceholders.length - 1];

            try {
                const response = await fetch("/chat", {
                    method: "POST",
                    headers: {
                        "Content-Type": "application/json"
                    },
                    body: JSON.stringify({ message: message })
                });

                const data = await response.json();
                currentBotMessage.innerHTML = `<strong>AI:</strong> ${data.reply}`;
            } catch (error) {
                currentBotMessage.innerHTML = `<strong>AI:</strong> Something went wrong.`;
            }

            chatBox.scrollTop = chatBox.scrollHeight;
        }
    </script>
</body>
</html>

The Fronted does these main things

  • Create a input box where user can type the message and then hit the send button
  • Message is sent to the backend server and when response comes then it is shown on th screen

Add Styling for the UI

Let’s create a new file static/style.css where we will put our styles

Bash
mkdir static
cd static
touch style.css

Put below contents in the style.css

CSS
body {
    font-family: Arial, sans-serif;
    background: #f4f7fb;
    margin: 0;
    padding: 0;
}

.chat-container {
    width: 90%;
    max-width: 700px;
    margin: 40px auto;
    background: #ffffff;
    border-radius: 16px;
    box-shadow: 0 4px 20px rgba(0, 0, 0, 0.08);
    padding: 20px;
}

h1 {
    text-align: center;
    margin-bottom: 20px;
    color: #222;
}

.chat-box {
    height: 450px;
    overflow-y: auto;
    border: 1px solid #e5e7eb;
    border-radius: 12px;
    padding: 15px;
    background: #fafafa;
    margin-bottom: 15px;
}

.message {
    padding: 12px 14px;
    margin-bottom: 12px;
    border-radius: 12px;
    line-height: 1.5;
    max-width: 85%;
}

.user-message {
    background: #dbeafe;
    margin-left: auto;
    text-align: right;
}

.bot-message {
    background: #e5e7eb;
    margin-right: auto;
    text-align: left;
}

.input-area {
    display: flex;
    gap: 10px;
}

.input-area input {
    flex: 1;
    padding: 12px;
    border: 1px solid #d1d5db;
    border-radius: 10px;
    font-size: 16px;
}

.input-area button {
    padding: 12px 20px;
    border: none;
    background: #111827;
    color: white;
    border-radius: 10px;
    cursor: pointer;
    font-size: 16px;
}

.input-area button:hover {
    opacity: 0.9;
}

Run your app and send message

Start the chatgpt clone server in the debug mode

Bash
python app.py
chatgpt clone server in debug mode

Let’s try asking a message that **Explain recursion to me**

You can see that the we got a message from the Local AI server and everything looks fine.
But there are three important limitations

  1. Code highlighting
  2. Markdown rendering
  3. Streaming Response

Streaming Response in ChatGPT Clone

Streaming responses is the right upgrade. It makes your ChatGPT clone feel way more real and gives user a better experience

What we are going to add are the below improvements

  1. Backend calls Ollama with "stream": true
  2. Flask streams chunks back to the browser
  3. Frontend reads the stream and updates the assistant message live

we will update the /chat API such that it can support streaming response

Python
from flask import Flask, render_template, request, Response, stream_with_context
import requests
import json

app = Flask(__name__)

OLLAMA_API_URL = "http://192.168.0.114:11434/api/generate"


@app.route("/")
def home():
    return render_template("index.html")


@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json()
    user_message = data.get("message", "").strip()

    if not user_message:
        return Response("Please enter a message.", status=400, mimetype="text/plain")

    payload = {
        "model": "llama3:8b",
        "prompt": user_message,
        "stream": True
    }

    def generate():
        try:
            with requests.post(OLLAMA_API_URL, json=payload, stream=True, timeout=120) as response:
                response.raise_for_status()

                for line in response.iter_lines():
                    if not line:
                        continue

                    try:
                        chunk = json.loads(line.decode("utf-8"))
                        token = chunk.get("response", "")
                        if token:
                            yield token

                        if chunk.get("done", False):
                            break
                    except json.JSONDecodeError:
                        continue

        except requests.exceptions.RequestException as e:
            yield f"\n[Error connecting to Ollama: {str(e)}]"

    return Response(
        stream_with_context(generate()),
        mimetype="text/plain"
    )


if __name__ == "__main__":
    app.run(debug=True, threaded=True)

The important changes that we did for streaming response are

  • Enabled streaming from Ollama by setting "stream": True in the payload, so the model sends the response in chunks instead of one final response.
  • Added a generator function (generate()) to read the streamed response line by line and forward each token immediately.
  • Used Flask’s streaming Response with stream_with_context(generate()) so the backend can send data to the frontend continuously in real time.
  • Handled completion and errors properly by checking the done flag from Ollama and catching request exceptions if the Ollama server is unavailable.

We will update the index.html file to handle the streaming message. Update the sendMessage method

JavaScript
<script>
    async function sendMessage() {
        const input = document.getElementById("user-input");
        const chatBox = document.getElementById("chat-box");
        const message = input.value.trim();

        if (!message) return;

        chatBox.innerHTML += `<div class="message user-message"><strong>You:</strong> ${escapeHtml(message)}</div>`;
        input.value = "";

        const botMessage = document.createElement("div");
        botMessage.className = "message bot-message";
        botMessage.innerHTML = `<strong>AI:</strong> <span class="bot-text">Thinking...</span>`;
        chatBox.appendChild(botMessage);
        chatBox.scrollTop = chatBox.scrollHeight;

        const botText = botMessage.querySelector(".bot-text");
        botText.textContent = "";

        try {
            const response = await fetch("/chat", {
                method: "POST",
                headers: {
                    "Content-Type": "application/json"
                },
                body: JSON.stringify({ message })
            });

            if (!response.ok) {
                botText.textContent = "Something went wrong.";
                return;
            }

            const reader = response.body.getReader();
            const decoder = new TextDecoder();

            while (true) {
                const { value, done } = await reader.read();

                if (done) break;

                const chunk = decoder.decode(value, { stream: true });
                botText.textContent += chunk;
                chatBox.scrollTop = chatBox.scrollHeight;
            }
        } catch (error) {
            botText.textContent = "Something went wrong while streaming the response.";
        }
    }

    function escapeHtml(text) {
        const div = document.createElement("div");
        div.textContent = text;
        return div.innerHTML;
    }
</script>

The Important changes that we did in the frontend are

  1. Replaced the normal wait-for-full-response logic with stream reading using response.body.getReader(), so the browser can receive the AI reply chunk by chunk.
  2. Created the assistant message box immediately with a temporary “Thinking…” state, then kept updating that same message as new text arrived.
  3. Used TextDecoder() to decode streamed chunks from the backend and append them live to the bot message, which creates the typing effect.
  4. Kept the chat UX smooth by auto-scrolling as each new chunk arrived, so the latest streamed response always stays visible.

Markdown Support and Syntax Highlighting in ChatGPT Clone

Right now your frontend probably does something like stream plain text and append text into textContent. The next big upgrade is the markdown support and syntax highlighting. we will read the response and try to add

  • Headings
  • Bullet points
  • Bold text
  • code blocks

will look much more like a real ChatGPT app.

To support markdown properly, we should:

  1. Keep collecting the streamed response in a variable
  2. Parse that full/partial text as markdown
  3. Render it into the bot message HTML
  4. Optionally highlight code blocks

we will be using the two frontend libraries **marked** and **highlight.js**

Inside your <head> add the below lines

HTML
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/highlight.js/styles/github-dark.min.css">
<script src="https://cdn.jsdelivr.net/npm/highlight.js/lib/highlight.min.js"></script>

Step 2: Update your bot message rendering logic

we will update the sendMessage method with below rendering logic

JavaScript
<script>
    async function sendMessage() {
        const input = document.getElementById("user-input");
        const chatBox = document.getElementById("chat-box");
        const message = input.value.trim();

        if (!message) return;

        chatBox.innerHTML += `
            <div class="message user-message">
                <strong>You:</strong>
                <div>${escapeHtml(message)}</div>
            </div>
        `;

        input.value = "";

        const botMessage = document.createElement("div");
        botMessage.className = "message bot-message";
        botMessage.innerHTML = `
            <strong>AI:</strong>
            <div class="bot-text">Thinking...</div>
        `;
        chatBox.appendChild(botMessage);
        chatBox.scrollTop = chatBox.scrollHeight;

        const botText = botMessage.querySelector(".bot-text");
        let fullResponse = "";
        let hasStreamedContent = false;

        try {
            const response = await fetch("/chat", {
                method: "POST",
                headers: {
                    "Content-Type": "application/json"
                },
                body: JSON.stringify({ message })
            });

            if (!response.ok) {
                botText.textContent = "Something went wrong.";
                return;
            }

            const reader = response.body.getReader();
            const decoder = new TextDecoder();

            botText.innerHTML = "";

            while (true) {
                const { value, done } = await reader.read();
                if (done) break;

                const chunk = decoder.decode(value, { stream: true });
                if (!chunk) continue;

                fullResponse += chunk;
                hasStreamedContent = true;

                try {
                    botText.innerHTML = marked.parse(fullResponse);
                } catch (renderError) {
                    console.error("Intermediate markdown render error:", renderError);
                }

                chatBox.scrollTop = chatBox.scrollHeight;
            }

            const remaining = decoder.decode();
            if (remaining) {
                fullResponse += remaining;
            }

            botText.innerHTML = marked.parse(fullResponse);

            botText.querySelectorAll("pre code").forEach((block) => {
                hljs.highlightElement(block);
            });

            chatBox.scrollTop = chatBox.scrollHeight;

        } catch (error) {
            console.error("Streaming error:", error);

            if (!hasStreamedContent) {
                botText.textContent = "Something went wrong while streaming the response.";
            }
        }
    }

    function escapeHtml(text) {
        const div = document.createElement("div");
        div.textContent = text;
        return div.innerHTML;
    }
</script>

Step 3: Improve CSS for markdown content

we will also need to improve our UI to make our ChatGPT clone look more professional. In your style.css add the below lines

CSS
.bot-text {
    line-height: 1.6;
    word-wrap: break-word;
}

.bot-text p {
    margin: 0 0 12px;
}

.bot-text ul,
.bot-text ol {
    padding-left: 20px;
    margin: 10px 0;
}

.bot-text li {
    margin-bottom: 6px;
}

.bot-text pre {
    background: #111827;
    color: #f9fafb;
    padding: 12px;
    border-radius: 10px;
    overflow-x: auto;
    margin: 12px 0;
}

.bot-text code {
    font-family: Consolas, Monaco, monospace;
    font-size: 14px;
}

.bot-text p code,
.bot-text li code {
    background: #f3f4f6;
    color: #111827;
    padding: 2px 6px;
    border-radius: 6px;
}

.bot-text h1,
.bot-text h2,
.bot-text h3 {
    margin: 16px 0 10px;
    color: #111827;
}

.bot-text blockquote {
    border-left: 4px solid #d1d5db;
    margin: 12px 0;
    padding-left: 12px;
    color: #4b5563;
}

You will see the final rendering something like this

Markdown syntax highlighting in the chagpt clone

Next Improvements

This ChatGPT clone is a great starting point, but there are several ways you can take it to the next level and make it feel more like a production ready AI application.

Add Conversation Memory

Right now, the chatbot responds only to the current message. A useful next step is to add conversation memory so the assistant can remember previous messages and answer in context.

Support Multiple Chat Sessions

A real ChatGPT-style application usually allows users to create multiple conversations. You can improve this project by adding separate chat sessions and switching between them.

FAQ

What is a ChatGPT clone?

A ChatGPT clone is an application that mimics the basic functionality of ChatGPT. It usually includes a chat interface, a backend API, and an AI model that generates responses based on user input.

Can I build a ChatGPT clone without using the OpenAI API?

Yes. In this project, we use Ollama and a local LLM such as Llama 3, so you do not need the OpenAI API to build and run the chatbot.

Is this ChatGPT clone running locally?

Yes. The chatbot runs locally on your machine. The frontend, Flask backend, and Ollama model all work on your local system.

Why did we use Flask for this project?

Flask is lightweight, beginner friendly, and easy to set up. It is a good choice for learning how a basic AI web app works.

References

Leave a Reply

Your email address will not be published. Required fields are marked *