# gpt-2

This is a C++ example running GPT-2 inference using the [ggml](https://github.com/ggerganov/ggml) library.

The program runs on the CPU - no video card is required.

The example supports the following models:

| Model | Description | Disk Size |
| --- | --- | --- |
| 117M | Small model | 240 MB |
| 345M | Medium model | 680 MB |
| 774M | Large model | 1.5 GB |
| 1558M | XL model | 3.0 GB |

Sample performance on MacBook M1 Pro:

| Model | Size | Time / Token |
| --- | --- | --- |
| GPT-2 | 117M | 5 ms |
| GPT-2 | 345M | 12 ms |
| GPT-2 | 774M | 23 ms |
| GPT-2 | 1558M | 42 ms |

Sample output:

```
$ ./bin/gpt-2 -h
usage: ./bin/gpt-2 [options]

options:
  -h, --help            show this help message and exit
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 8)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: random)
  -n N, --n_predict N   number of tokens to predict (default: 200)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --temp N              temperature (default: 1.0)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: models/gpt-2-117M/ggml-model.bin)

$ ./bin/gpt-2
gpt2_model_load: loading model from 'models/gpt-2-117M/ggml-model.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx = 1024
gpt2_model_load: n_embd = 768
gpt2_model_load: n_head = 12
gpt2_model_load: n_layer = 12
gpt2_model_load: f16 = 1
gpt2_model_load: ggml ctx size = 311.12 MB
gpt2_model_load: memory size = 72.00 MB, n_mem = 12288
gpt2_model_load: model size = 239.08 MB
main: number of tokens in prompt = 1

So this is going to be the end of the line for us.

If the Dolphins continue to do their business, it's possible that the team could make a bid to bring in new defensive coordinator Scott Linehan.

Linehan's job is a little daunting, but he's a great coach and an excellent coach. I don't believe we're going to make the playoffs.

We're going to have to work hard to keep our heads down and get ready to go.<|endoftext|>

main: mem per token = 2048612 bytes
main: load time = 106.32 ms
main: sample time = 7.10 ms
main: predict time = 506.40 ms / 5.06 ms per token
main: total time = 629.84 ms
```

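The `memory size = 72.00 MB, n_mem = 12288` line in the log above can be reproduced with a little arithmetic. The sketch below is one reading of those numbers, not something the log states: it assumes the figure counts one key vector and one value vector of `n_embd` 4-byte floats for every layer and every context position. Under that assumption the result matches the log exactly.

```shell
# Values copied from the gpt2_model_load lines in the sample output.
n_ctx=1024
n_embd=768
n_layer=12

# Assumption (not stated in the log): keys and values are stored as 4-byte floats.
bytes_per_float=4

# One slot per layer and per context position.
n_mem=$(( n_layer * n_ctx ))

# Factor 2: one key vector and one value vector per slot.
mem_bytes=$(( 2 * n_mem * n_embd * bytes_per_float ))
mem_mb=$(( mem_bytes / 1024 / 1024 ))

echo "n_mem = ${n_mem}"
echo "memory size = ${mem_mb} MB"
```

Because the size scales linearly with `n_ctx`, `n_embd` and `n_layer`, the same calculation gives a rough estimate for the larger models once their hyperparameters are known.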
## Downloading and converting the original models

You can download the original model files using the [download-model.sh](download-model.sh) Bash script. The models are
in TensorFlow format, so to use them with ggml you need to convert them to the ggml format. This is done
via the [convert-ckpt-to-ggml.py](convert-ckpt-to-ggml.py) Python script.

Here is the entire process for the GPT-2 117M model (download from the official site + conversion):

```
cd ggml/build
../examples/gpt-2/download-model.sh 117M

Downloading model 117M ...
models/gpt-2-117M/checkpoint 100%[=============================>] 77 --.-KB/s in 0s
models/gpt-2-117M/encoder.json 100%[=============================>] 1018K 1.20MB/s in 0.8s
models/gpt-2-117M/hparams.json 100%[=============================>] 90 --.-KB/s in 0s
models/gpt-2-117M/model.ckpt.data-00000-of-00001 100%[=============================>] 474.70M 1.21MB/s in 8m 39s
models/gpt-2-117M/model.ckpt.index 100%[=============================>] 5.09K --.-KB/s in 0s
models/gpt-2-117M/model.ckpt.meta 100%[=============================>] 460.11K 806KB/s in 0.6s
models/gpt-2-117M/vocab.bpe 100%[=============================>] 445.62K 799KB/s in 0.6s
Done! Model '117M' saved in 'models/gpt-2-117M/'

Run the convert-ckpt-to-ggml.py script to convert the model to ggml format.

python /Users/john/ggml/examples/gpt-2/convert-ckpt-to-ggml.py models/gpt-2-117M/
```

This conversion requires Python and TensorFlow to be installed on your computer. If you want to avoid
this, you can download the already converted ggml models as described below.

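Before starting the conversion, it can help to check that TensorFlow is actually importable. The snippet below is a minimal sketch: `have_tensorflow` and the `PYTHON` override variable are hypothetical names introduced here, and the check only confirms that `import tensorflow` succeeds, not that the installed version works with the conversion script.

```shell
# Succeeds only if the chosen interpreter can import TensorFlow.
# PYTHON is a hypothetical override; it defaults to plain `python`.
have_tensorflow() {
    "${PYTHON:-python}" -c 'import tensorflow' >/dev/null 2>&1
}

if have_tensorflow; then
    echo "TensorFlow found: the conversion script should be able to run"
else
    echo "TensorFlow not importable: install it, or use a pre-converted ggml model"
fi
```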
## Downloading the ggml model directly

For convenience, I will be hosting the converted ggml model files to make it easier to run the examples. This
way, you can directly download a single binary file and start using it. No Python or TensorFlow is required.

Here is how to get the 117M ggml model:

```
cd ggml/build
../examples/gpt-2/download-ggml-model.sh 117M

Downloading ggml model 117M ...
models/gpt-2-117M/ggml-model.bin 100%[===============================>] 239.58M 8.52MB/s in 28s
Done! Model '117M' saved in 'models/gpt-2-117M/ggml-model.bin'
You can now use it like this:

$ ./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
```

At some point, I might decide to stop hosting these models. In that case, simply revert to the manual process above.
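The two routes described above can be combined into a single helper that prefers the hosted file and falls back to the manual process. This is only a sketch under a few assumptions: `get_model` and `EXAMPLE_DIR` are hypothetical names, the script is run from `ggml/build` as in the examples, and the download scripts are assumed to exit with a non-zero status when a download fails.

```shell
# Location of the example scripts, relative to ggml/build (overridable).
EXAMPLE_DIR="${EXAMPLE_DIR:-../examples/gpt-2}"

# get_model SIZE: make sure a ggml model for SIZE (e.g. 117M) is available.
get_model() {
    local size="$1"
    local dir="models/gpt-2-${size}"
    local bin="${dir}/ggml-model.bin"

    # Nothing to do if the converted model is already on disk.
    if [ -f "$bin" ]; then
        echo "found $bin"
        return 0
    fi

    # First choice: the hosted, already converted file.
    if bash "${EXAMPLE_DIR}/download-ggml-model.sh" "$size" && [ -f "$bin" ]; then
        return 0
    fi

    # Fallback: original checkpoint + conversion (needs Python and TensorFlow).
    bash "${EXAMPLE_DIR}/download-model.sh" "$size" &&
        python "${EXAMPLE_DIR}/convert-ckpt-to-ggml.py" "${dir}/"
}

# Only act when a model size is passed on the command line.
if [ "$#" -gt 0 ]; then
    get_model "$1"
fi
```

Saved as a script (the file name is up to you), it could be called with a model size such as `117M` as its only argument.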