How to Train ChatGPT for Business and Personal Use

Artificial Intelligence (AI) has revolutionized numerous sectors, and the field of communication is no exception. Among the various AI models available, ChatGPT, developed by OpenAI, stands out due to its impressive language understanding capabilities. This guide explains how to train ChatGPT effectively with custom data on macOS.

Building your own AI-powered chatbot has never been easier. With OpenAI’s ChatGPT, you can train a language model using custom data tailored to your specific needs. By the end of this guide, you will have a working knowledge of how to set up, prepare your data, and fine-tune your chatbot.

Method 1: Use Online Tools to Train ChatGPT

Using CustomGPT.ai for Training ChatGPT

CustomGPT.ai is an online platform that simplifies the process of training ChatGPT with your own data. It eliminates the need for coding or database work, making it accessible for users without a technical background. Here are the steps to use CustomGPT.ai:

  1. Visit CustomGPT.ai and create a new project. You can name it according to your preference.
  2. Input your website's sitemap URL. The platform will then queue up all the pages from your sitemap for crawling.
  3. Wait for the system to crawl all your pages. This process may take up to an hour depending on the number of pages on your website.
  4. Once the crawling process is complete, you can start creating your customized chatbot. The chatbot will have read all the pages, understood the content, and will be ready to interact with anyone who uses it.

Using ChatGPT School for Training ChatGPT

ChatGPT School is another platform that allows you to train ChatGPT with your own data. It's particularly useful for educational content, such as online courses. Here's how to use it:

  1. Visit ChatGPT School and create a new project.
  2. Like CustomGPT.ai, you'll need to input your website's sitemap URL. The platform will then queue up all the pages from your sitemap for crawling.
  3. Wait for the system to crawl all your pages. This process may take up to an hour depending on the number of pages on your website.
  4. Once the crawling process is complete, you can start creating your customized chatbot. The chatbot will have read all the pages, understood the content, and will be ready to interact with anyone who uses it.

Method 2: Build Your Own Customized LLM to Train ChatGPT

Prepare the Environment to Train ChatGPT

Step 1: Install Python

Python 3 is required. Before installing it, check whether Python 3 is already on your machine by running the following command in your terminal:

python3 --version

If you see the version listed after executing the command, it means that you already have Python3 installed and you can skip this step. If you see a "command not found" error, then proceed with the installation.

Head over to the following link and download the Python installer: https://www.python.org/downloads/

Once the installation is complete, run the above command again and it should output the version of Python.

Step 2: Upgrade Pip

Python comes with pip pre-packaged, but in case you are using an old installation, it's always a good idea to update pip to the latest version. Pip is a package manager for Python, similar to composer for PHP. You can upgrade it using a very simple command:

python3 -m pip install -U pip

If pip is already up to date, it will print a message such as "Requirement already satisfied: pip in [location-here]". Otherwise, it will install the latest version. You can then verify that pip is installed properly by executing the following command:

pip3 --version

It will tell you the version and location of the package.

Install Libraries for ChatGPT Training

Before diving into the actual training process, you’ll need to install some libraries. Open the Terminal application on your Mac and run the following commands one by one:

The first command installs the OpenAI library:

pip3 install openai

Next, install GPT Index, which is also called LlamaIndex. It lets the LLM connect to external data, which in our case is the knowledge base.

For more details about how LlamaIndex works and how to use it, you can read our related articles on LlamaIndex.

pip3 install gpt_index

Once done, run the following command:

pip3 install PyPDF2

It's a Python-based PDF parsing library and is needed if you are going to feed PDF files to the model.

Finally, you have to run:

pip3 install gradio

Gradio creates a simple web UI for interacting with the chatbot.
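
Before moving on, you may want to confirm that all four libraries are visible to Python. The following is a minimal sketch that uses only the standard library. The `find_missing` helper and the `REQUIRED_MODULES` list are illustrative names that do not come from any of the libraries, and the list assumes each package is imported under the same name used in the pip commands above.

```python
import importlib.util

# Import names of the packages installed above (assumed to match the pip names).
REQUIRED_MODULES = ["openai", "gpt_index", "PyPDF2", "gradio"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported missing, run the matching pip3 install command again before continuing.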

Get OpenAI Key for ChatGPT Training

Before diving into the script, let's get the API key from OpenAI. Head over to the OpenAI API page. If you haven't logged in already, it will ask you to log in. You can then click on "Create new secret key" to generate a key for our script.

Remember that once the key is generated, you won't be able to see it again. You must copy and save the key in some secure location to be able to access it later.
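
One way to keep the key out of your source files is to store it in an environment variable (for example, by running export OPENAI_API_KEY=... in the terminal) and read it from there. The sketch below shows that idea under the assumption that you go this route. The `load_api_key` helper is a hypothetical name used for illustration and is not part of the OpenAI library.

```python
import os


def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored in `env`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it in your shell first.")
    return key


if __name__ == "__main__":
    try:
        key = load_api_key()
        # Show only the last four characters so the full key never lands in logs.
        print("Key loaded, ending in", key[-4:])
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than an authentication error that surfaces later in the run.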

Prepare Data for ChatGPT Training

Create a new directory named 'docs' anywhere you like and put PDF, TXT, or CSV files inside it. You can add multiple files if you like but remember that the more data you add, the more tokens will be used. Free accounts are given $18 worth of tokens to use.
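
Since the amount of data drives token usage, it can help to take stock of the 'docs' directory before running anything. The sketch below lists the PDF, TXT, and CSV files and adds up their sizes. The `summarize_docs` helper is a hypothetical name, and file size is only a rough stand-in for token count, not a measurement of it.

```python
from pathlib import Path

# File types this guide suggests placing in the 'docs' directory.
SUPPORTED = {".pdf", ".txt", ".csv"}


def summarize_docs(directory):
    """Return (supported_files, other_files, total_bytes) for a folder."""
    supported, other, total_bytes = [], [], 0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() in SUPPORTED:
            supported.append(path.name)
            total_bytes += path.stat().st_size
        else:
            other.append(path.name)
    return supported, other, total_bytes


if __name__ == "__main__":
    docs = Path("docs")
    if docs.is_dir():
        files, other, size = summarize_docs(docs)
        print(f"{len(files)} supported file(s), {size} bytes in total")
        if other:
            print("Other files found:", ", ".join(other))
    else:
        print("No 'docs' directory found here.")
```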

Create a Script to Train ChatGPT

Now that everything is in place, our next step is to create a Python script to train the chatbot with custom data. It will read the files inside the 'docs' directory that we created above and generate a JSON file.

You can use any text editor to create this file. macOS comes with TextEdit, which works fine, but Visual Studio Code is even better if you have it.

Create a new file and copy the following code:

# Note: these imports match early gpt_index releases; later LlamaIndex
# versions renamed several of these classes.
from gpt_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
import gradio as gr
import os

# Paste your OpenAI API key between the quotes.
os.environ["OPENAI_API_KEY"] = ''

def construct_index(directory_path):
    # Prompt and chunking limits, measured in tokens.
    max_input_size = 4096
    num_outputs = 512
    max_chunk_overlap = 20
    chunk_size_limit = 600

    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # The language model used to answer questions.
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.7, model_name="text-davinci-003", max_tokens=num_outputs))

    # Load every file in the directory and build a vector index from it.
    documents = SimpleDirectoryReader(directory_path).load_data()
    index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    # Save the index so the chatbot can reload it for each question.
    index.save_to_disk('index.json')

    return index

def chatbot(input_text):
    # Reload the saved index and answer the question from it.
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    response = index.query(input_text, response_mode="compact")
    return response.response

iface = gr.Interface(fn=chatbot,
                     inputs=gr.Textbox(lines=7, label="Enter your text"),
                     outputs="text",
                     title="My AI Chatbot")

# Build the index from the 'docs' directory, then start the web UI.
index = construct_index("docs")
iface.launch(share=True)

Once copied, you need to add your OpenAI key to the code before saving it. Notice the OPENAI_API_KEY variable in the code? Paste the OpenAI key that you generated earlier between the single quotes, like this:

os.environ["OPENAI_API_KEY"] = 'your-key-goes-here'

Then save the file as app.py in the same location where you have your 'docs' directory.
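
The chunk_size_limit and max_chunk_overlap settings in the script control how documents are cut into pieces before indexing. To give a feel for what overlapping chunks look like, here is a simplified sketch. It splits on words, whereas the library works in tokens, so treat `split_into_chunks` as a hypothetical illustration and not as the library's actual algorithm.

```python
def split_into_chunks(text, chunk_size=600, overlap=20):
    """Split text into word-based chunks that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in split_into_chunks(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of indexing a few words twice.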

Run the Script

Now that we have everything in place, we can finally run the script and see the magic.

Open Terminal and navigate to the directory that contains app.py and the 'docs' directory:

cd /path/to/your/directory

Next, execute the Python file:

python3 app.py

This will start training your custom chatbot, which in this script means building an index from the files in 'docs'. It might take some time depending on how much data you have fed it. Once done, it will output a link where you can test the responses using a simple UI.

The script outputs a local URL such as http://127.0.0.1:7860. Because it launches with share=True, Gradio also prints a temporary public link.

You can open the local URL in any browser and start testing your custom-trained chatbot. Note that the port number above might be different for you.

You can ask questions on the left side and it will respond in the right column. Remember that questions will cost you tokens so the more questions you ask, the more tokens will be used from your OpenAI account. Training also uses tokens based on how much data you feed it.
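
Each question costs tokens because the most relevant pieces of your data are looked up and sent to the model along with the question. The sketch below illustrates that lookup with a toy example. It ranks chunks by cosine similarity of plain word counts, which is an assumption made to keep the example self-contained; the real index compares embeddings, not word counts. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_relevant(question, chunks, top_k=1):
    """Return the top_k chunks ranked by similarity to the question."""
    q = word_counts(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(q, word_counts(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    chunks = [
        "Our store opens at 9 am and closes at 6 pm on weekdays.",
        "Refunds are accepted within 30 days of purchase.",
        "Shipping is free for orders above 50 dollars.",
    ]
    print(most_relevant("When are refunds accepted?", chunks))
```

Only the selected chunks travel with the question, which is why a large knowledge base does not make every single question proportionally more expensive.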

To train on more or different data, stop the script with CTRL + C, change the files in the 'docs' directory, and then run the Python file again.
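
Because building the index uses tokens every time, you might prefer to rebuild only when the data has actually changed. The sketch below compares file modification times to decide that. The `needs_rebuild` helper is a hypothetical addition and not part of the original script, and it assumes the index is saved as index.json next to the 'docs' directory. It does not notice deleted files.

```python
from pathlib import Path


def needs_rebuild(docs_dir="docs", index_file="index.json"):
    """Return True if the index is missing or older than any file in docs_dir."""
    index_path = Path(index_file)
    if not index_path.exists():
        return True
    index_time = index_path.stat().st_mtime
    # Rebuild if any document was modified after the index was written.
    return any(
        path.stat().st_mtime > index_time
        for path in Path(docs_dir).rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    if needs_rebuild():
        print("Data changed or index missing: rebuild the index.")
    else:
        print("Index is up to date.")
```

In app.py, one option would be to call construct_index("docs") only when needs_rebuild() returns True, since the chatbot function already loads the saved index from disk.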

Conclusion

Training ChatGPT with custom data allows you to create a chatbot tailored to your specific needs. Whether you're using Python libraries on macOS, leveraging online platforms like CustomGPT.ai and ChatGPT School, or joining a community like the ChatGPT AI Automation Group, there are numerous ways to customize and enhance your chatbot's capabilities. By following the detailed steps and examples provided in this guide, you'll be well on your way to creating a powerful AI-powered chatbot.

Frequently Asked Questions

Can I train my own ChatGPT model?

Yes, you can train your own ChatGPT model. This guide provides detailed steps on how to do this using Python libraries on macOS. You can also use online platforms like CustomGPT.ai and ChatGPT School to simplify the process.

Can you train ChatGPT with PDFs?

Yes, ChatGPT can be trained with PDFs. You can use Python libraries like PyPDF2 to parse the PDF files and feed the data to the model.

What data was used to train the ChatGPT?

ChatGPT was trained on a diverse range of internet text. However, OpenAI has not publicly disclosed the specifics of the individual datasets used. You can train your own ChatGPT model with custom data to tailor it to your specific needs.

Can you train a chatbot?

Yes, you can train a chatbot. This guide provides detailed steps on how to train a chatbot using ChatGPT and custom data. The process involves setting up your environment, preparing your data, and running a Python script to train the chatbot.