LineBot+gpt-3.5_fine tuning

Process Directory Structure Diagram

LineBot+gpt-3.5_fine tuning/
├── Reference to OpenAI GPT-3.5 Official Documentation
├── Collecting Training Data
│   ├── Introducing Claude
│   └── Preparing the Training Dataset
├── Creating the Dataset
│   ├── Visiting OpenAI Fine-tuning Page
│   └── Preparing Data in JSON Format
├── Writing Fine Tuning Code
│   ├── Setting Up the Programming Environment
│   ├── Code Writing
│   └── Reference to YouTube Tutorial Videos
├── Developing LineBot Application
│   ├── Writing and Adjusting Code
│   └── Problems Encountered and Solutions
├── Deploying the Application
│   ├── Deploying LineBot on Render
│   └── Cronjob Setup
├── Comparison Before and After Fine-Tuning
└── Debugging Approach
    ├── Debugging for Specific Issues
    └── Solutions and Thought Process

1. Collecting Training Data

Generating Q&A Set

Introducing an artificial intelligence named Claude, developed by Anthropic. Unlike ChatGPT 3.5, Claude possesses a file upload feature, enabling it to extract information from PDFs or generate summaries. Currently, Claude is available only in Europe and America. To use Claude, users need to change their IP to the Americas via the Opera browser and complete registration using an SMS service. Once registered, users can query Claude just like they would with ChatGPT. (Available to users in Taiwan from 2023/10/18, no need to switch VPN)

Next, prepare a training dataset to create a chatbot. Here, I randomly selected a topic with the aim of helping people resolve their uncertainties about which medical department to consult.

First, upload a PDF document, then ask Claude to organize 30 common questions from it.

Specific Format Conversion

Visit the OpenAI Fine-tuning page to determine the specific format required for text content conversion. The text format is organized as follows:

The example demonstrates that data should be arranged in JSON (JavaScript Object Notation) format, which is convenient for exchanging, storing, and reading simple data. If you are not familiar with JSON, Claude can handle the formatting issues. Just copy the example format and modify it to the desired ChatGPT role; Claude will automatically generate the corresponding content.

With this, the preparation of the training dataset is complete. Next, upload this file to your personal GitHub, concluding the first step.

2. Writing Fine Tuning Code

First, open an editor, this time opting for vscode. Then create a new file named fine_tune.py.