k050506koch committed on
Commit
2789c02
1 Parent(s): 59e5c8c

Enhance README

Files changed (1)
  1. README.md +12 -106
README.md CHANGED
@@ -1,115 +1,21 @@
  # GPT3
 
  Welcome to the GPT3 repository! This project is an attempt to recreate the architecture and approach from the original OpenAI GPT-3 paper. The repository includes scripts for training, fine-tuning, and inference of a GPT-3-like model using PyTorch and the Hugging Face Transformers library.
 
 - ### Note: I'm currently working on training these models; the 17M model is now on its way. When finished, all weights will be published on Hugging Face.
 -
 - ## Repository Structure
 - ### Files
 -
 - - **[gpt3_stable_17m.py](gpt3_stable_17m.py)**: Script for training the GPT-3 model, which has approximately 17,867,008 parameters.
 - - **[gpt3.py](gpt3.py)**: Script for training the GPT-3 model with cross-validation.
 - - **[inference.py](inference.py)**: Script for running inference with the trained GPT-3 model.
 - - **[gpt3-SFT.py](gpt3-SFT.py)**: Script for testing and fine-tuning the GPT-3 model with Supervised Fine-Tuning (SFT).
 -
 - ## Key Features
 -
 - - **Custom Model Architecture**: Implements custom GPT-3 model components such as [`CustomGPT2Attention`](gpt3-17m.py#L136), [`CustomGPT2MLP`](gpt3-17m.py#L143), [`CustomGPT2Block`](gpt3-17m.py#L150), and [`CustomGPT2LMHeadModel`](gpt3-17m.py#L235).
 - - **Training Loop**: Includes gradient accumulation, gradient clipping, and perplexity computation.
 - - **Inference**: Supports streaming text generation with top-k and top-p filtering.
 - - **Logging and Checkpointing**: Uses Weights & Biases for logging and saves model checkpoints periodically.
 -
 - ## Getting Started
 -
 - ### Prerequisites
 -
 - - Python 3.8+ (I used 3.12)
 - - PyTorch (Stable or Nightly)
 - - Transformers (Hugging Face)
 - - Datasets
 - - Weights & Biases (wandb)
 -
 - ### Installation
 -
 - 1. Clone the repository:
 - ```sh
 - git clone https://github.com/krll-corp/GPT3.git
 - cd GPT3
 - ```
 -
 - 2. Install the required packages:
 - ```sh
 - pip install -U transformers datasets evaluate torch wandb
 - ```
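 -
 - If everything installed cleanly, a quick import check such as the following should succeed (a minimal sanity check, not part of the repository):
 - ```sh
 - python -c "import torch, transformers, datasets, wandb; print(torch.__version__)"
 - ```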
 -
 - ### Training
 -
 - To train the model, run the following command:
 -
 - ```sh
 - python gpt3-17m.py
 -
 - # on macOS or Linux
 - python3 gpt3-17m.py
 - ```
 -
 - ### Inference
 -
 - To generate text using the trained model, run:
 -
 - ```sh
 - python inference.py
 -
 - # on macOS or Linux
 - python3 inference.py
 - ```
 -
 - ### Fine-Tuning
 -
 - If you have trained a foundation model, you can fine-tune it with:
 - ```sh
 - python gpt3-SFT.py
 -
 - # on macOS or Linux
 - python3 gpt3-SFT.py
 - ```
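 -
 - Conceptually, SFT just continues training the saved foundation checkpoint on supervised prompt-response text. A minimal sketch of that idea (the checkpoint name, toy data, and learning rate below are illustrative assumptions, not the repository's exact setup):
 - ```python
 - # Hedged sketch of supervised fine-tuning: keep optimizing the causal LM
 - # loss, but on curated prompt-response pairs instead of raw web text.
 - import torch
 - from transformers import AutoModelForCausalLM, AutoTokenizer
 -
 - tokenizer = AutoTokenizer.from_pretrained("gpt2")            # stand-in for your tokenizer
 - model = AutoModelForCausalLM.from_pretrained("gpt2")         # stand-in for your checkpoint
 - optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)   # assumed learning rate
 -
 - pairs = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]  # toy data
 - model.train()
 - for prompt, answer in pairs:
 -     ids = tokenizer(f"{prompt} {answer}", return_tensors="pt").input_ids
 -     loss = model(input_ids=ids, labels=ids).loss  # causal LM loss on the pair
 -     loss.backward()
 -     optimizer.step()
 -     optimizer.zero_grad()
 - ```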
 -
 - ## Usage
 -
 - ### Training Script
 -
 - The training script initializes the model, optimizer, and learning rate scheduler. It then enters a training loop where it performs forward and backward passes, applies gradient clipping, and updates the model parameters. The aim of the script is to train a foundation model that can then be fine-tuned for chat, question answering, and other tasks.
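 -
 - As a rough illustration of that loop (toy model size and assumed hyperparameters; not the repository's exact code):
 - ```python
 - # Sketch of the training loop: gradient accumulation, gradient clipping,
 - # an LR scheduler step, and perplexity computed from the LM loss.
 - import math
 - import torch
 - from transformers import GPT2Config, GPT2LMHeadModel
 -
 - model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64))  # toy config
 - optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)           # assumed LR
 - scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
 - accumulation_steps, max_grad_norm = 4, 1.0                           # assumed values
 -
 - # Random token ids stand in for the real dataloader.
 - dataloader = [torch.randint(0, model.config.vocab_size, (2, 32)) for _ in range(8)]
 -
 - model.train()
 - optimizer.zero_grad()
 - for step, input_ids in enumerate(dataloader):
 -     outputs = model(input_ids=input_ids, labels=input_ids)  # forward pass, causal LM loss
 -     (outputs.loss / accumulation_steps).backward()          # scale loss for accumulation
 -     if (step + 1) % accumulation_steps == 0:
 -         torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
 -         optimizer.step()
 -         scheduler.step()
 -         optimizer.zero_grad()
 -         print(f"perplexity: {math.exp(outputs.loss.item()):.2f}")
 - ```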
85
-
86
- ### Inference Script
87
-
88
- The inference script loads a pre-trained model and tokenizer, moves the model to the appropriate device, and generates text based on user input using the [`generate_text_stream`](inference.py#L246) function.
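 -
 - The core idea behind such a function looks roughly like this (a hedged sketch with GPT-2 as a stand-in model; the actual `generate_text_stream` may differ):
 - ```python
 - # Sketch of streaming generation with top-k and top-p (nucleus) filtering:
 - # sample one token at a time and yield it as soon as it is decoded.
 - import torch
 - from transformers import GPT2LMHeadModel, GPT2Tokenizer
 -
 - tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
 - model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
 -
 - def stream(prompt, max_new_tokens=40, top_k=50, top_p=0.9):
 -     ids = tokenizer(prompt, return_tensors="pt").input_ids
 -     for _ in range(max_new_tokens):
 -         with torch.no_grad():
 -             logits = model(ids).logits[0, -1]
 -         vals, idx = torch.topk(logits, top_k)        # top-k filter (sorted descending)
 -         probs = torch.softmax(vals, dim=-1)
 -         keep = torch.cumsum(probs, dim=-1) <= top_p  # top-p: drop the low-probability tail
 -         keep[0] = True                               # always keep the most likely token
 -         probs = torch.where(keep, probs, torch.zeros_like(probs))
 -         next_id = idx[torch.multinomial(probs / probs.sum(), 1)]
 -         ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
 -         yield tokenizer.decode(next_id)
 -
 - for piece in stream("Hello, world"):
 -     print(piece, end="", flush=True)
 - ```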
89
-
90
- ## Custom Components
91
-
92
- Everything was taken from official GPT-2 implementation
93
-
94
- ### CustomGPT2Attention
95
-
96
- A custom implementation of the GPT-3 attention mechanism with biases included.
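 -
 - Schematically (a hypothetical standalone sketch, not the repository's exact class):
 - ```python
 - # Causal self-attention with bias terms on every projection.
 - import torch.nn as nn
 - import torch.nn.functional as F
 -
 - class BiasedCausalSelfAttention(nn.Module):
 -     def __init__(self, n_embd=64, n_head=2):
 -         super().__init__()
 -         self.n_head = n_head
 -         self.qkv = nn.Linear(n_embd, 3 * n_embd, bias=True)  # biases included
 -         self.proj = nn.Linear(n_embd, n_embd, bias=True)
 -
 -     def forward(self, x):
 -         B, T, C = x.shape
 -         q, k, v = self.qkv(x).split(C, dim=-1)
 -         q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
 -                    for t in (q, k, v))
 -         y = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal mask
 -         return self.proj(y.transpose(1, 2).reshape(B, T, C))
 - ```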
97
-
98
- ### CustomGPT2MLP
99
-
100
- A custom implementation of the GPT-3 MLP with biases and standard GeLU activation.
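 -
 - In sketch form (again hypothetical, with the usual 4x hidden expansion):
 - ```python
 - # Feed-forward block: two biased linear layers around a GeLU nonlinearity.
 - import torch.nn as nn
 -
 - class GELUMLP(nn.Module):
 -     def __init__(self, n_embd=64):
 -         super().__init__()
 -         self.fc = nn.Linear(n_embd, 4 * n_embd, bias=True)    # expand 4x
 -         self.proj = nn.Linear(4 * n_embd, n_embd, bias=True)  # project back
 -         self.act = nn.GELU()                                  # standard GeLU
 -
 -     def forward(self, x):
 -         return self.proj(self.act(self.fc(x)))
 - ```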
101
-
102
- ### CustomGPT2Block
103
-
104
- A custom implementation of the GPT-3 block with optional pre-layer normalization.
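 -
 - Pre-layer normalization means each sublayer sees a normalized input while the residual path stays untouched. Reusing the two sketches above (hypothetical, not the repository's exact class):
 - ```python
 - # Transformer block with pre-layer normalization.
 - import torch.nn as nn
 -
 - class PreLNBlock(nn.Module):
 -     def __init__(self, n_embd=64, n_head=2):
 -         super().__init__()
 -         self.ln1 = nn.LayerNorm(n_embd)
 -         self.attn = BiasedCausalSelfAttention(n_embd, n_head)  # sketch above
 -         self.ln2 = nn.LayerNorm(n_embd)
 -         self.mlp = GELUMLP(n_embd)                             # sketch above
 -
 -     def forward(self, x):
 -         x = x + self.attn(self.ln1(x))  # normalize before, add residual after
 -         x = x + self.mlp(self.ln2(x))
 -         return x
 - ```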
105
-
106
- ### CustomGPT2LMHeadModel
107
-
108
- A custom implementation of the GPT-3 language model head with additional keyword arguments support.
109
 
110
  ## Contributing
111
 
112
- Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
113
 
114
  ## License
 
@@ -121,4 +27,4 @@ Thanks OpenAI, HuggingFace and Pytorch for making this project possible!
 
  - [OpenAI GPT-3 Paper](https://arxiv.org/abs/2005.14165)
  - [Hugging Face Transformers](https://github.com/huggingface/transformers)
 - - [PyTorch](https://pytorch.org/)
 
 + ---
 + license: mit
 + datasets:
 + - HuggingFaceFW/fineweb
 + language:
 + - en
 + pipeline_tag: text-generation
 + ---
  # GPT3
 
  Welcome to the GPT3 repository! This project is an attempt to recreate the architecture and approach from the original OpenAI GPT-3 paper. The repository includes scripts for training, fine-tuning, and inference of a GPT-3-like model using PyTorch and the Hugging Face Transformers library.
 + This repository hosts the weights of development checkpoints of my models. You can always download a checkpoint folder, paste its path into inference.py, and chat with the model.
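 +
 + For example (a hypothetical usage sketch; the folder path is a placeholder for whichever checkpoint you download):
 + ```python
 + # Load a downloaded checkpoint folder and generate a short completion.
 + from transformers import AutoModelForCausalLM, AutoTokenizer
 +
 + ckpt = "path/to/downloaded-checkpoint"  # placeholder path
 + tokenizer = AutoTokenizer.from_pretrained(ckpt)
 + model = AutoModelForCausalLM.from_pretrained(ckpt)
 +
 + inputs = tokenizer("Hello, I am", return_tensors="pt")
 + out = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50, top_p=0.9)
 + print(tokenizer.decode(out[0], skip_special_tokens=True))
 + ```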
 
 + # **You can find all code on [GitHub](https://github.com/krll-corp/GPT3)**
 
  ## Contributing
 
 + Contributions are welcome! I'm just a student interested in AI, so my code may contain mistakes or logical issues. Please open an issue or submit a pull request for any improvements or bug fixes; I'd be happy to see them.
 
  ## License
 
 
  - [OpenAI GPT-3 Paper](https://arxiv.org/abs/2005.14165)
  - [Hugging Face Transformers](https://github.com/huggingface/transformers)
 + - [PyTorch](https://pytorch.org/)