Build an LLM application with OpenAI and React

Welcome to part 2 of our series on building a full-stack LLM application using OpenAI, Flask, React, and Pinecone. You can find part 1 here (this post assumes you’ve already completed part 1 and have a functioning Flask backend).

In this post, we'll build the front-end chat interface for our application using React.

What we're building

In this series, we're building a simple web application that lets a user enter a URL and ask questions about the content of that webpage. We built the Flask (Python) backend in part 1, so here we'll focus on the frontend.


Components of the frontend application:

  • URL input: The user can enter any URL into the UI of the web application. The backend will embed the contents of that webpage and index them in a vector database.
  • Chat interface: The user sees a ChatGPT-style interface to ask questions about the content of the website and get answers from OpenAI’s GPT-4. The frontend also sends the entire chat history to the backend to give GPT-4 context on the current conversation.
  • Streaming responses: Instead of making the user wait for GPT-4 to generate a full response, we’ll stream its response token-by-token to the front-end as it’s being generated.

Step 1: Setting up App.js

Creating the React app

To make things simple, we'll use the Create React App package. Run the following in your terminal, one command at a time:
 
    # Enter the root project directory (where your Python backend is)
    cd ./YourApp

    # Install the Create React App package globally
    npm install -g create-react-app
    
    # Create the React app in a folder named 'client'
    npx create-react-app client
    

After a few seconds, your app will be created!


Install react-markdown

To make the chat interface easier to read, we'll render GPT-4's responses as Markdown (our prompt instructs GPT-4 to return its responses in Markdown format). We'll use the react-markdown library to handle this. Type the following into your terminal:

    # Enter the client directory
    cd client

    # Install react-markdown
    npm install --save react-markdown
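
If you haven't used it before, react-markdown takes a Markdown string as its children and renders it as React elements. As a quick illustrative sketch (we'll wire it up properly in ChatMessage.js later):

    // Illustrative only: render a Markdown string as React elements
    import React from 'react';
    import ReactMarkdown from 'react-markdown';

    function Preview() {
      return <ReactMarkdown>{'**Bold text** and a [link](https://example.com)'}</ReactMarkdown>;
    }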
    

Modify folder structure and copy in CSS

We're going to modify the React app's folder structure slightly to keep it organized. Make sure your structure looks like this (create empty files for any that don't exist yet):

/YourApp
    /app  # from part 1
    /client
        /public # keep this as is
        /src
            /components # new folder
                ChatInterface.js # create empty files
                ChatMessage.js
                UrlInput.js
            /styles
                styles.css  # copy this file from GitHub
            App.js
            index.js
        .gitignore
        package.json
    .env
    .gitignore
    requirements.txt
    run.py

Copy in the contents of the styles.css file from the GitHub source code.

Step 2: Creating the App.js logic

To keep things simple, our App.js file will be the main “controller” for the application. Since we have two screens (the URL input screen and the chat interface), we'll implement logic inside App.js to handle showing the right component to the user:

    // YourApp/client/src/App.js

    import React, { useState, useEffect } from 'react';
    import UrlInput from './components/UrlInput';
    import ChatInterface from './components/ChatInterface';
    
    function App() {
      const [showChat, setShowChat] = useState(false); // Add state to control UI transition
      const handleUrlSubmitted = () => {
        setShowChat(true); // Transition to the ChatInterface
      };
    
      // Delete the Pinecone index when the user leaves the page
      useEffect(() => {
        return () => {
          fetch('http://localhost:5000/delete-index', {
            method: 'POST',
          })
            .then((response) => {
              if (!response.ok) {
                console.error('Error deleting index:', response.statusText);
              }
            })
            .catch((error) => {
              console.error('Error:', error);
            });
        };
      }, []);
    
      return (
        <div className="App">
          {!showChat ? (
            <UrlInput onSubmit={handleUrlSubmitted} />
          ) : (
            <ChatInterface />
          )}
        </div>
      );
    }
    
    export default App;
    

We use a simple showChat state variable to determine which component to show the user. When the user submits a URL in the UrlInput.js component, the app switches to showing the ChatInterface.js component.

Deleting the Pinecone index upon leaving the page

You'll notice a useEffect hook whose cleanup function runs when the App.js component unmounts (i.e., when the user leaves or refreshes the page). Since we're on Pinecone's free tier, which only allows one index, we delete the index and recreate it on every new page visit. This isn't how you'd handle it in production, but we're aiming for simplicity here!
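
One caveat worth knowing: a request fired from a cleanup function can be cancelled by the browser before it finishes when the tab closes or refreshes. A minimal variation (not part of the tutorial code) is to pass fetch's keepalive option, which asks the browser to let the request complete even after the page unloads:

    // Sketch only: the same delete-index call, but with keepalive so the
    // browser may finish the request after the page has gone away
    fetch('http://localhost:5000/delete-index', {
      method: 'POST',
      keepalive: true,
    }).catch((error) => console.error('Error:', error));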

Let’s set up the delete-index route on the Flask server to enable this functionality:

    # YourApp/app/api/routes.py

    # (Keep existing code and add the following)
    @api_blueprint.route('/delete-index', methods=['POST'])
    def delete_index():
        pinecone_service.delete_index(PINECONE_INDEX_NAME)
        return jsonify({"message": f"Index {PINECONE_INDEX_NAME} deleted successfully"})
    

Now let’s finish the implementation of the delete_index function within pinecone_service.py:

    # YourApp/app/services/pinecone_service.py

    # (Keep existing code and add the following)
    def delete_index(index_name):
        if index_name in pinecone.list_indexes():
            pinecone.delete_index(name=index_name)
    

Step 3: Handling the user's URL input

We’re finally ready for the meat of the application: taking a URL from the user and preparing the app to answer questions about that webpage's content!

Copy the following into your UrlInput.js file:

    // src/components/UrlInput.js

    import React, { useState } from 'react';

    function UrlInput({ onSubmit }) {
      const [url, setUrl] = useState('');
      const [loading, setLoading] = useState(false);
      const [responseMessage, setResponseMessage] = useState('');
    
      const handleSubmit = async (e) => {
        e.preventDefault();
        setLoading(true);
        try {
          const response = await fetch('http://localhost:5000/embed-and-store', {
            method: 'POST',
            headers: {
              'Content-Type': 'application/json',
            },
            body: JSON.stringify({ url }),
          });
          if (response.ok) {
            const data = await response.json();
            setResponseMessage(data.message);
            onSubmit();
          } else {
            setResponseMessage('Error: Something went wrong.');
          }
        } catch (error) {
          console.error('Error:', error);
          setResponseMessage('Error: Something went wrong.');
        } finally {
          setLoading(false);
        }
      };
    
      return (
        <div className="urlinput">
          <form onSubmit={handleSubmit}>
            <input
              type="text"
              placeholder="Enter a URL"
              value={url}
              onChange={(e) => setUrl(e.target.value)}
              required
            />
            <button type="submit" disabled={loading}>
              {loading ? 'Building Index...' : 'Submit'}
            </button>
          </form>
          {responseMessage && <p>{responseMessage}</p>}
        </div>
      );
    }
    
    export default UrlInput;
    

The core of the code is in the handleSubmit method. Here, we call the embed-and-store endpoint on our Flask server and pass it the URL from the user. Note that in production we'd want to validate the URL before sending it to the server!
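
As a minimal sketch of that kind of validation (not part of the tutorial code), you could check the input with the browser's built-in URL constructor before calling the endpoint:

    // Hypothetical helper, not in the tutorial code: returns true only for
    // well-formed http(s) URLs so we can skip the request otherwise
    function isValidHttpUrl(value) {
      try {
        const parsed = new URL(value);
        return parsed.protocol === 'http:' || parsed.protocol === 'https:';
      } catch {
        return false;
      }
    }

Calling this at the top of handleSubmit would let you show an error message immediately instead of sending a malformed URL to the server.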

Once we hit the endpoint, we’ll need to wait several seconds for the backend to scrape the URL's contents, embed them using OpenAI’s Embeddings API, and store them in a Pinecone index. The user sees a loading state until the backend finishes, at which point the component calls the onSubmit function provided by App.js to display the ChatInterface.js component to the user.

Step 4: Create the chat interface UI

Now our chatbot is ready to take questions from the user! Copy the following into ChatInterface.js:

    // src/components/ChatInterface.js

    import React, { useState, useEffect, useRef } from 'react';
    import ChatMessage from './ChatMessage';
    
    function ChatInterface() {
      const [messages, setMessages] = useState([]);
      const [inputText, setInputText] = useState('');
      const messagesEndRef = useRef(null);
      const scrollToBottom = () => {
        messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
      };
    
      useEffect(scrollToBottom, [messages]);
    
      const handleSendMessage = async (event) => {
        event.preventDefault();
        if (!inputText.trim()) return; // Prevent sending empty messages
        const userMessage = { text: inputText, isBot: false };
        const body = {
          chatHistory: [...messages, userMessage],
          question: inputText,
        }    
        setMessages([...messages, userMessage]);
        setInputText('');
        const response = await fetch('http://localhost:5000/handle-query', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(body),
        });
        const data = await response.json();
        const botMessage = { text: data.answer, isBot: true };
        setMessages(currentMessages => [...currentMessages, botMessage]);
      };
    
      return (
        <div className="chat-container">
          <header className="chat-header">URL Question & Answer</header>
          {
            messages.length === 0 
              && 
            <div className="chat-message bot-message">
              <p className="initial-message">Hi there! I'm a bot trained to answer questions about the URL you entered. Try asking me a question below!</p>
            </div>
          }
          <div className="chat-messages">
            {messages.map((message, index) => (
              <ChatMessage key={index} message={message} />
            ))}
            <div ref={messagesEndRef} />
          </div>
          <form className="chat-input" onSubmit={handleSendMessage}>
            <input
              type="text"
              placeholder="Type a question and press enter ..."
              value={inputText}
              onChange={(e) => setInputText(e.target.value)}
            />
          </form>
        </div>
      );
    }
    
    export default ChatInterface;
    

Notice that we have a messages state variable that stores the history of our chat messages (both the user's and the bot's). We need this to (1) display the entire chat history to the user (mimicking the ChatGPT interface), and (2) provide the entire chat history to the backend as part of its payload to GPT-4 (we'll cover this in the next section).
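
For reference, each entry in messages (and therefore in the chatHistory field we send to the backend) is a simple { text, isBot } object. After one exchange, the state looks roughly like this (the text values are illustrative):

    // Illustrative example of the messages state after one question and answer
    const exampleMessages = [
      { text: 'What is this page about?', isBot: false },
      { text: 'The page explains how to ...', isBot: true },
    ];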

Our handleSendMessage function fires when the user submits a question. It hits the handle-query endpoint, which does its magic (see part 1), and adds the bot’s response to the messages array once it’s fully generated. The whole history is then rendered in the UI.

Make sure to paste the following into the ChatMessage.js file:

    // src/components/ChatMessage.js

    import React from 'react';
    import ReactMarkdown from 'react-markdown';
    
    function ChatMessage({ message }) {
      // Only parse markdown for bot messages
      const content = message.isBot ? (
        <ReactMarkdown children={message.text} />
      ) : (
        message.text
      );
    
      return (
        <div className={`chat-message ${message.isBot ? 'bot-message' : 'user-message'}`}>
          {content}
        </div>
      );
    }
    
    export default ChatMessage;
    

Modifying the backend to handle the entire chat history

In part 1, we made a simple call to OpenAI’s ChatCompletions endpoint with only the current prompt to get an answer. This meant that GPT-4 didn’t have context on the entire user conversation; to give it this context, we’ll be sending the entire chat history — provided by the frontend — to the ChatCompletions endpoint:

    # YourApp/app/services/openai_service.py

    def get_llm_answer(prompt, chat_history):
        messages = [{"role": "system", "content": "You are a helpful assistant."}]

        # Pass in the entire chat history; previous bot replies go in as assistant messages
        for message in chat_history:
            if message['isBot']:
                messages.append({"role": "assistant", "content": message["text"]})
            else:
                messages.append({"role": "user", "content": message["text"]})

        # Replace the last message (the user's raw question) with the full prompt
        messages[-1]["content"] = prompt

        url = 'https://api.openai.com/v1/chat/completions'
        headers = {
            'content-type': 'application/json; charset=utf-8',
            'Authorization': f"Bearer {OPENAI_API_KEY}"
        }
        data = {
            'model': CHATGPT_MODEL,
            'messages': messages,
            'temperature': 1,
            'max_tokens': 1000
        }
        response = requests.post(url, headers=headers, data=json.dumps(data))
        response_json = response.json()
        completion = response_json["choices"][0]["message"]["content"]
        return completion
  

Here, we loop through the entire chat history to populate the messages array, then send it to the model so it has context on both the conversation and the retrieved chunks of text.
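
For illustration, after the loop runs (and the last entry is swapped for the full prompt), the messages list sent to OpenAI looks roughly like this:

    [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is this page about?"},
      {"role": "assistant", "content": "The page explains ..."},
      {"role": "user", "content": "<the full prompt built from the latest question and the retrieved chunks>"}
    ]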

Step 5: Streaming the response from GPT-4


One wrinkle in our current setup is that the frontend waits for GPT-4 to generate its entire answer before it displays this to the user. Instead of making the user wait — and to mimic the ChatGPT interface — we’ll implement streaming so the answer is displayed to the user as each token is generated.
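
Under the hood, OpenAI streams the completion as server-sent events. Each event carries a small JSON chunk with the newest token in choices[0].delta.content, and the stream ends with a [DONE] sentinel. Roughly (trimmed for readability), the raw stream looks like this:

    data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hello"}, "index": 0, "finish_reason": null}]}
    data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": " there"}, "index": 0, "finish_reason": null}]}
    data: [DONE]

We'll parse these events on the backend and forward just the text to the frontend.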

Modifying the backend to handle streaming

First, let’s add a new function inside helper_functions.py in the backend. We’re moving all of the messages-building code out of the openai_service.py file:

    # YourApp/app/utils/helper_functions.py

    # (keep current code in the file and add the following)
    def construct_messages_list(chat_history, prompt):
        messages = [{"role": "system", "content": "You are a helpful assistant."}]

        # Populate the messages array with the current chat history
        for message in chat_history:
            if message['isBot']:
                messages.append({"role": "assistant", "content": message["text"]})
            else:
                messages.append({"role": "user", "content": message["text"]})

        # Replace the last message (the user's raw question) with the full prompt
        messages[-1]["content"] = prompt

        return messages
    

Now we’ll replace the get_llm_answer function inside openai_service.py with the following. This function builds the headers and data we’ll send as the payload to OpenAI’s ChatCompletions endpoint:

    # YourApp/app/services/openai_service.py

    # (keep current code and replace get_llm_answer with the following)
    def construct_llm_payload(question, context_chunks, chat_history):
      
      # Build the prompt with the context chunks and user's query
      prompt = build_prompt(question, context_chunks)
      print("\n==== PROMPT ====\n")
      print(prompt)
    
      # Construct messages array to send to OpenAI
      messages = construct_messages_list(chat_history, prompt)
    
      # Construct headers including the API key
      headers = {
          'content-type': 'application/json; charset=utf-8',
          'Authorization': f"Bearer {OPENAI_API_KEY}"            
      }  
    
      # Construct data payload
      data = {
          'model': CHATGPT_MODEL,
          'messages': messages,
          'temperature': 1, 
          'max_tokens': 1000,
          'stream': True
      }
    
      return headers, data
    

Finally, let’s replace the handle_query route logic in routes.py to handle streaming the response back to the front-end:

    # YourApp/app/api/routes.py (entire file)

    from . import api_blueprint
    import os
    from flask import request, jsonify, Response, stream_with_context, json
    import requests
    import sseclient
    from app.services import openai_service, pinecone_service, scraping_service
    from app.utils.helper_functions import chunk_text, build_prompt, construct_messages_list
    
    PINECONE_INDEX_NAME = 'index237'
    
    @api_blueprint.route('/handle-query', methods=['POST'])
    def handle_query():
        question = request.json['question']
        chat_history = request.json['chatHistory']
        
        # Get the most similar chunks from Pinecone
        context_chunks = pinecone_service.get_most_similar_chunks_for_query(question, PINECONE_INDEX_NAME)
        
        # Build the payload to send to OpenAI
        headers, data = openai_service.construct_llm_payload(question, context_chunks, chat_history)
    
        # Send to OpenAI's LLM to generate a completion
        def generate():
            url = 'https://api.openai.com/v1/chat/completions'
            response = requests.post(url, headers=headers, data=json.dumps(data), stream=True)
            client = sseclient.SSEClient(response)
            for event in client.events():
                if event.data != '[DONE]':
                    try:
                        text = json.loads(event.data)['choices'][0]['delta']['content']
                        yield(text)
                    except:
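                        # Some chunks carry no 'content' (e.g. the initial
                        # role-only chunk or the final chunk before [DONE]),
                        # so we just yield an empty string for those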
                        yield('')
        
        # Return the streamed response from the LLM to the frontend
        return Response(stream_with_context(generate()))
    
    @api_blueprint.route('/embed-and-store', methods=['POST'])
    def embed_and_store():
        url = request.json['url']
        url_text = scraping_service.scrape_website(url)
        chunks = chunk_text(url_text)
        pinecone_service.embed_chunks_and_upload_to_pinecone(chunks, PINECONE_INDEX_NAME)
        response_json = {
            "message": "Chunks embedded and stored successfully"
        }
        return jsonify(response_json)
    
    @api_blueprint.route('/delete-index', methods=['POST'])
    def delete_index():
        pinecone_service.delete_index(PINECONE_INDEX_NAME)
        return jsonify({"message": f"Index {PINECONE_INDEX_NAME} deleted successfully"})
    

Make sure your requirements.txt file now looks like this, then re-run pip install -r requirements.txt in your terminal. This installs the sseclient-py package (imported as sseclient) that we need to parse the streamed response.

    # YourApp/requirements.txt

    flask
    requests
    beautifulsoup4
    openai
    pinecone-client
    flask-cors
    python-dotenv
    gunicorn
    sseclient-py  # new package
    

Modify the frontend to handle streaming

Now our backend will stream the response from GPT-4 to the frontend one token at a time! Let’s make sure to handle this in the ChatInterface.js file:

    // src/components/ChatInterface.js

    // (Keep the file contents and just replace the handleSendMessage function)

    const handleSendMessage = async (event) => {
        event.preventDefault();
    
        if (!inputText.trim()) return; 
    
        const userMessage = { text: inputText, isBot: false };
        const body = {
          chatHistory: [...messages, userMessage],
          question: inputText,
        }    
    
        // Add a new empty bot message to the UI
        const botMessage = { text: '', isBot: true };
        setMessages([...messages, userMessage, botMessage]);
        setInputText('');
    
        // Send the user's message to the server and wait for a response.
        // This response will be streamed to this component.
        const response = await fetch('http://localhost:5000/handle-query', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(body),
        });
        if (!response.body) return;
    
        // Set up the infrastructure to stream the response data
        let decoder = new TextDecoderStream();
        const reader = response.body.pipeThrough(decoder).getReader()    
        let accumulatedAnswer = ""
    
        while (true) {
          var { value, done } = await reader.read();
          if (done) break;
          accumulatedAnswer += value;
          setMessages(currentHistory => {
            const updatedHistory = [...currentHistory]
            const lastChatIndex = updatedHistory.length - 1
            updatedHistory[lastChatIndex] = {
              ...updatedHistory[lastChatIndex],
              text: accumulatedAnswer
            }
            return updatedHistory
          })
        }
      };
    

Here we pipe the response body through a TextDecoderStream and read it chunk by chunk. As each chunk arrives, we update the last entry in the messages array with the accumulated answer, so the response appears for the user as it's being generated!

Conclusion and what's next


We've built a full-stack web application with OpenAI, Pinecone, Flask, and React!

To try out the full app, run python run.py in your root folder and, in a separate terminal window, cd into your client folder and run npm start. This should open the web UI in a new browser tab where you can try the app out yourself!

You can find the source code for this tutorial at this branch on GitHub. In the next post, we’ll get the app ready for production by adding tracking and evaluation so we can iterate on our prompts based on user feedback!
