Welcome to part 2 of our series on building a full-stack LLM application using OpenAI, Flask, React, and Pinecone. You can find part 1 here (this post assumes you’ve already completed part 1 and have a functioning Flask backend).
In this post, we'll build the front-end chat interface for our application using React.
What we're building
In this series, we're building a simple web application that lets a user input a URL and ask questions about the content of that webpage. We already built the Flask (Python) backend in part 1, so here we're focusing on the front-end.
Components of the frontend application:
- URL input: The user can enter any URL into the UI of the web application. The backend will embed the contents of that webpage and index them in a vector database.
- Chat interface: The user sees a ChatGPT-style interface to ask questions about the content of the website and get answers from OpenAI’s GPT-4. The frontend also sends the entire chat history to the backend to give GPT-4 context on the current conversation.
- Streaming responses: Instead of making the user wait for GPT-4 to generate a full response, we’ll stream its response token-by-token to the front-end as it’s being generated.
Step 1: Setting up App.js
Creating the React app
To make things simple, we'll use the Create React App package. Run the
following in your terminal, one command at a time:
# Enter the root project directory (where your Python backend is)
cd ./YourApp

# Install the Create React App package globally
npm install -g create-react-app

# Create the React app in a folder named 'client'
npx create-react-app client
After a few seconds, your app will be created!
Install react-markdown
To keep the chat interface simple for the user, we'll render GPT-4's responses in Markdown (we've instructed GPT-4 in our prompt to return its responses in Markdown format). We'll use the react-markdown library to handle this. Type the following into your terminal:
# Enter the client directory
cd client

# Install react-markdown
npm install --save react-markdown
Modify folder structure and copy in CSS
We're going to modify the React app's folder structure slightly to keep things organized. Make sure your structure looks like this (create empty files for any that don't exist yet):
/YourApp
  /app                    # from part 1
  /client
    /public               # keep this as is
    /src
      /components         # new folder
        ChatInterface.js  # create empty files
        ChatMessage.js
        UrlInput.js
      /styles
        styles.css        # copy this file from GitHub
      App.js
      index.js
    .gitignore
    package.json
  .env
  .gitignore
  requirements.txt
  run.py
Copy in the contents of the styles.css file from the GitHub source code.
Step 2: Creating the App.js logic
To keep things simple, our App.js file will be the main “controller” for the
application. Since we have two screens (the URL input screen and the chat
interface), we'll implement logic inside App.js to handle showing the right
component to the user:
// YourApp/client/src/App.js
import React, { useState, useEffect } from 'react';
import UrlInput from './components/UrlInput';
import ChatInterface from './components/ChatInterface';

function App() {
  const [showChat, setShowChat] = useState(false); // Add state to control UI transition

  const handleUrlSubmitted = () => {
    setShowChat(true); // Transition to the ChatInterface
  };

  // Delete the Pinecone index when the user leaves the page
  useEffect(() => {
    return () => {
      fetch('http://localhost:5000/delete-index', {
        method: 'POST',
      })
        .then((response) => {
          if (!response.ok) {
            console.error('Error deleting index:', response.statusText);
          }
        })
        .catch((error) => {
          console.error('Error:', error);
        });
    };
  }, []);

  return (
    <div className="App">
      {!showChat ? (
        <UrlInput onSubmit={handleUrlSubmitted} />
      ) : (
        <ChatInterface />
      )}
    </div>
  );
}

export default App;
First, we'll use a simple showChat state variable to determine which
component to show the user. When the user submits the URL on the UrlInput.js
component, the app will automatically switch to showing the ChatInterface.js
component.
Deleting the Pinecone index upon leaving the page
You'll notice a useEffect hook that runs whenever the App.js component
unmounts (when the user leaves or refreshes the page). Since we're using
Pinecone's free tier, which only allows one index, we'll delete the index and
recreate it on every new page visit. This isn't how you'd handle things in
production, but we're aiming for simplicity here!
Let’s set up the delete-index route on the Flask server to enable this
functionality:
# YourApp/api/routes.py
# (Keep existing code and add the following)

@api_blueprint.route('/delete-index', methods=['POST'])
def delete_index():
    pinecone_service.delete_index(PINECONE_INDEX_NAME)
    return jsonify({"message": f"Index {PINECONE_INDEX_NAME} deleted successfully"})
Now let’s finish the implementation of the delete_index function within
pinecone_service.py:
# YourApp/services/pinecone_service.py
# (Keep existing code and add the following)

def delete_index(index_name):
    if index_name in pinecone.list_indexes():
        pinecone.delete_index(name=index_name)
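If you want to sanity-check the new route, you can hit it from a second terminal while the Flask server is running (this assumes the server is listening on port 5000, the same address the frontend fetch calls use):

# Should return the "deleted successfully" message as JSON
curl -X POST http://localhost:5000/delete-index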
Step 3: Handling the user's URL input
We’re finally ready for the meat of the application: taking a URL as input
from the user and getting our app ready to answer the user’s questions about
the content of that webpage!
Copy the following into your UrlInput.js file:
// src/components/UrlInput.js
import React, { useState } from 'react';

function UrlInput({ onSubmit }) {
  const [url, setUrl] = useState('');
  const [loading, setLoading] = useState(false);
  const [responseMessage, setResponseMessage] = useState('');

  const handleSubmit = async (e) => {
    e.preventDefault();
    setLoading(true);

    try {
      const response = await fetch('http://localhost:5000/embed-and-store', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ url }),
      });

      if (response.ok) {
        const data = await response.json();
        setResponseMessage(data.message);
        onSubmit();
      } else {
        setResponseMessage('Error: Something went wrong.');
      }
    } catch (error) {
      console.error('Error:', error);
      setResponseMessage('Error: Something went wrong.');
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="urlinput">
      <form onSubmit={handleSubmit}>
        <input
          type="text"
          placeholder="Enter a URL"
          value={url}
          onChange={(e) => setUrl(e.target.value)}
          required
        />
        <button type="submit" disabled={loading}>
          {loading ? 'Building Index...' : 'Submit'}
        </button>
      </form>
      {responseMessage && <p>{responseMessage}</p>}
    </div>
  );
}

export default UrlInput;
The core of the code is in the handleSubmit method. Here, we’re calling the
embed-and-store endpoint on our Flask server and providing it the URL from
the user. Note that in production we’ll want to handle validating the URL
before we send it to the server!
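As a rough sketch of what that client-side check could look like, here's a minimal example using the browser's built-in URL constructor. isValidUrl is a hypothetical helper (not part of the tutorial code) that you could call at the top of handleSubmit before making the fetch request:

// Hypothetical helper — the URL constructor throws on malformed input,
// so a try/catch works as a basic validity check
function isValidUrl(value) {
  try {
    const parsed = new URL(value);
    // Only accept http(s) pages, since that's what the backend scraper expects
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false;
  }
}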
Once we hit the endpoint, we’ll need to wait several seconds for the backend
to scrape the URL’s contents, embed them using OpenAI’s Embeddings API, and
store them in a Pinecone index. The user sees a loading state until the
backend finishes, at which point the component calls the onSubmit function
provided by App.js to display the ChatInterface.js component to the user.
Step 4: Create the chat interface UI
Now our chatbot is ready to take questions from the user! Copy the following
into ChatInterface.js:
// src/components/ChatInterface.js
import React, { useState, useEffect, useRef } from 'react';
import ChatMessage from './ChatMessage';

function ChatInterface() {
  const [messages, setMessages] = useState([]);
  const [inputText, setInputText] = useState('');
  const messagesEndRef = useRef(null);

  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
  };

  useEffect(scrollToBottom, [messages]);

  const handleSendMessage = async (event) => {
    event.preventDefault();
    if (!inputText.trim()) return; // Prevent sending empty messages

    const userMessage = { text: inputText, isBot: false };
    const body = {
      chatHistory: [...messages, userMessage],
      question: inputText,
    }

    setMessages([...messages, userMessage]);
    setInputText('');

    const response = await fetch('http://localhost:5000/handle-query', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });

    const data = await response.json();
    const botMessage = { text: data.answer, isBot: true };
    setMessages(currentMessages => [...currentMessages, botMessage]);
  };

  return (
    <div className="chat-container">
      <header className="chat-header">URL Question & Answer</header>
      {
        messages.length === 0 &&
        <div className="chat-message bot-message">
          <p className="initial-message">Hi there! I'm a bot trained to answer questions about the URL you entered. Try asking me a question below!</p>
        </div>
      }
      <div className="chat-messages">
        {messages.map((message, index) => (
          <ChatMessage key={index} message={message} />
        ))}
        <div ref={messagesEndRef} />
      </div>
      <form className="chat-input" onSubmit={handleSendMessage}>
        <input
          type="text"
          placeholder="Type a question and press enter ..."
          value={inputText}
          onChange={(e) => setInputText(e.target.value)}
        />
      </form>
    </div>
  );
}

export default ChatInterface;
Notice that we have a messages state variable to store the history of our
chat messages (both the user's and the bot's). This is necessary so we can
(1) display the entire chat history to the user (mimicking the ChatGPT
interface), and (2) provide the entire chat history to the backend to use as
part of its payload to GPT-4 (we'll cover this in the next section).
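For illustration, here's roughly what the request body sent to the backend looks like after one full exchange plus a new question (the message text below is made up):

// Illustrative handle-query payload — the conversation text is invented
const body = {
  chatHistory: [
    { text: 'What is this page about?', isBot: false },
    { text: "It's the documentation for the Fetch API.", isBot: true },
    { text: 'Does it support streaming responses?', isBot: false },
  ],
  question: 'Does it support streaming responses?',
};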
Our handleSendMessage function fires once the user inputs a question. It
hits the handle-query endpoint which does its magic (see part 1) and adds
the bot’s response to the messages array once it’s fully generated. This is
all rendered to the front-end.
Make sure to paste the following into the ChatMessage.js file:
// src/components/ChatMessage.js
import React from 'react';
import ReactMarkdown from 'react-markdown';

function ChatMessage({ message }) {
  // Only parse markdown for bot messages
  const content = message.isBot ? (
    <ReactMarkdown>{message.text}</ReactMarkdown>
  ) : (
    message.text
  );

  return (
    <div className={`chat-message ${message.isBot ? 'bot-message' : 'user-message'}`}>
      {content}
    </div>
  );
}

export default ChatMessage;
Modifying the backend to handle the entire chat history
In part 1, we made a simple call to OpenAI’s ChatCompletions endpoint with
only the current prompt to get an answer. This meant that GPT-4 didn’t have
context on the entire user conversation; to give it this context, we’ll be
sending the entire chat history — provided by the frontend — to the
ChatCompletions endpoint:
# YourApp/services/openai_service.py

def get_llm_answer(prompt, chat_history):
    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    # Pass in the entire chat history
    for message in chat_history:
        if message['isBot']:
            messages.append({"role": "system", "content": message["text"]})
        else:
            messages.append({"role": "user", "content": message["text"]})

    # Replace last message with the full prompt
    messages[-1]["content"] = prompt

    url = 'https://api.openai.com/v1/chat/completions'
    headers = {
        'content-type': 'application/json; charset=utf-8',
        'Authorization': f"Bearer {OPENAI_API_KEY}"
    }
    data = {
        'model': CHATGPT_MODEL,
        'messages': messages,
        'temperature': 1,
        'max_tokens': 1000
    }
    response = requests.post(url, headers=headers, data=json.dumps(data))
    response_json = response.json()
    completion = response_json["choices"][0]["message"]["content"]
    return completion
Here, we loop through the entire chat history to populate the messages list,
then send it to the model so it has context on both the conversation and the
retrieved chunks of text.
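To make that last step concrete, here's roughly what messages looks like for a short conversation. The history text below is invented, and the final entry's content has been swapped for the full prompt; the exact prompt format comes from build_prompt in part 1:

# Illustrative only — the conversation text and prompt format are placeholders
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is this page about?"},
    {"role": "system", "content": "It's the documentation for the Fetch API."},
    # The user's latest question was appended here, then its content was
    # replaced with the full prompt (question + retrieved context chunks)
    {"role": "user", "content": "<prompt built by build_prompt from the question and context chunks>"},
]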
Step 5: Streaming the response from GPT-4
One wrinkle in our current setup is that the frontend waits for GPT-4 to
generate its entire answer before it displays this to the user. Instead of
making the user wait — and to mimic the ChatGPT interface — we’ll implement
streaming so the answer is displayed to the user as each token is generated.
Modifying the backend to handle streaming
First, let’s add a new function inside helper_functions.py in the backend.
We're moving all of the messages-building code out of the openai_service.py
file:
# YourApp/utils/helper_functions.py
# (keep current code in the file and add the following)

def construct_messages_list(chat_history, prompt):
    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    # Populate the messages array with the current chat history
    for message in chat_history:
        if message['isBot']:
            messages.append({"role": "system", "content": message["text"]})
        else:
            messages.append({"role": "user", "content": message["text"]})

    # Replace last message with the full prompt
    messages[-1]["content"] = prompt

    return messages
Now we’ll replace the get_llm_answer function inside openai_service.py with
the following. This function handles creating the headers and data to send as
part of the payload to OpenAI’s ChatCompletions endpoint:
# YourApp/services/openai_service.py
# (keep current code and replace get_llm_answer with the following;
#  make sure construct_messages_list is imported from helper_functions)

def construct_llm_payload(question, context_chunks, chat_history):
    # Build the prompt with the context chunks and user's query
    prompt = build_prompt(question, context_chunks)
    print("\n==== PROMPT ====\n")
    print(prompt)

    # Construct messages array to send to OpenAI
    messages = construct_messages_list(chat_history, prompt)

    # Construct headers including the API key
    headers = {
        'content-type': 'application/json; charset=utf-8',
        'Authorization': f"Bearer {OPENAI_API_KEY}"
    }

    # Construct data payload
    data = {
        'model': CHATGPT_MODEL,
        'messages': messages,
        'temperature': 1,
        'max_tokens': 1000,
        'stream': True
    }

    return headers, data
Finally, let’s replace the handle_query route logic in routes.py to handle
streaming the response back to the front-end:
# YourApp/api/routes.py (entire file)
from . import api_blueprint
import os
from flask import request, jsonify, Response, stream_with_context, json
import requests
import sseclient
from app.services import openai_service, pinecone_service, scraping_service
from app.utils.helper_functions import chunk_text, build_prompt, construct_messages_list

PINECONE_INDEX_NAME = 'index237'

@api_blueprint.route('/handle-query', methods=['POST'])
def handle_query():
    question = request.json['question']
    chat_history = request.json['chatHistory']

    # Get the most similar chunks from Pinecone
    context_chunks = pinecone_service.get_most_similar_chunks_for_query(question, PINECONE_INDEX_NAME)

    # Build the payload to send to OpenAI
    headers, data = openai_service.construct_llm_payload(question, context_chunks, chat_history)

    # Send to OpenAI's LLM to generate a completion
    def generate():
        url = 'https://api.openai.com/v1/chat/completions'
        response = requests.post(url, headers=headers, data=json.dumps(data), stream=True)
        client = sseclient.SSEClient(response)
        for event in client.events():
            if event.data != '[DONE]':
                try:
                    text = json.loads(event.data)['choices'][0]['delta']['content']
                    yield(text)
                except:
                    yield('')

    # Return the streamed response from the LLM to the frontend
    return Response(stream_with_context(generate()))

@api_blueprint.route('/embed-and-store', methods=['POST'])
def embed_and_store():
    url = request.json['url']
    url_text = scraping_service.scrape_website(url)
    chunks = chunk_text(url_text)
    pinecone_service.embed_chunks_and_upload_to_pinecone(chunks, PINECONE_INDEX_NAME)
    response_json = {
        "message": "Chunks embedded and stored successfully"
    }
    return jsonify(response_json)

@api_blueprint.route('/delete-index', methods=['POST'])
def delete_index():
    pinecone_service.delete_index(PINECONE_INDEX_NAME)
    return jsonify({"message": f"Index {PINECONE_INDEX_NAME} deleted successfully"})
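For context on what generate() is parsing: each server-sent event from OpenAI's streaming API carries a small JSON chunk in its data field, and the stream ends with a data value of [DONE]. Here's a minimal sketch of the extraction, using an abbreviated, made-up chunk:

# Minimal sketch — sample_event_data is an abbreviated, invented example of event.data
import json

sample_event_data = '{"choices": [{"delta": {"content": "Hello"}, "index": 0}]}'

# The same lookup generate() performs for each streamed event
token = json.loads(sample_event_data)['choices'][0]['delta']['content']
print(token)  # -> Hello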
Make sure that your requirements.txt file now looks like this, then re-run pip
install -r requirements.txt in your terminal. This will install the
sseclient-py library we need to handle the streaming.
# YourApp/requirements.txt
flask
requests
beautifulsoup4
openai
pinecone-client
flask-cors
python-dotenv
gunicorn
sseclient-py  # new package
Modify the frontend to handle streaming
Now our backend will stream the response from GPT-4 to the frontend one
token at a time! Let’s make sure to handle this in the ChatInterface.js
file:
// src/components/ChatInterface.js
// (Keep the file contents and just replace the handleSendMessage function)

const handleSendMessage = async (event) => {
  event.preventDefault();
  if (!inputText.trim()) return;

  const userMessage = { text: inputText, isBot: false };
  const body = {
    chatHistory: [...messages, userMessage],
    question: inputText,
  }

  // Add a new empty bot message to the UI
  const botMessage = { text: '', isBot: true };
  setMessages([...messages, userMessage, botMessage]);
  setInputText('');

  // Send the user's message to the server and wait for a response.
  // This response will be streamed to this component.
  const response = await fetch('http://localhost:5000/handle-query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });

  if (!response.body) return;

  // Set up the infrastructure to stream the response data
  let decoder = new TextDecoderStream();
  const reader = response.body.pipeThrough(decoder).getReader()
  let accumulatedAnswer = ""

  while (true) {
    var { value, done } = await reader.read();
    if (done) break;
    accumulatedAnswer += value;
    setMessages(currentHistory => {
      const updatedHistory = [...currentHistory]
      const lastChatIndex = updatedHistory.length - 1
      updatedHistory[lastChatIndex] = {
        ...updatedHistory[lastChatIndex],
        text: accumulatedAnswer
      }
      return updatedHistory
    })
  }
};
Here we're using the TextDecoderStream pattern to read the response from the
backend one token at a time. As each token arrives, we update the last
message in the messages array with the accumulated text of the answer, so the
response appears to the user as it's being generated!
Conclusion and what's next
We've built a full-stack web application with OpenAI, Pinecone, Flask, and
React!
To try out the full app, run python run.py in your root folder and, in a
separate terminal window, cd into your client folder and run npm start. This
should open up the web UI in a new browser tab where you can try the app out
yourself!
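Concretely, assuming the folder structure from earlier:

# Terminal 1: start the Flask backend from the project root
python run.py

# Terminal 2: start the React dev server
cd client
npm start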
You can find the source code for this tutorial at this branch on GitHub. In
the next post, we'll set our app up for production by adding tracking and
evaluation, so we can make sure we're iterating on our prompts properly based
on user feedback!