# ggml

Tensor library for machine learning

## Features

- Written in C
- 16-bit float support
- Automatic differentiation (work in progress)
- ADAM and L-BFGS optimizers
- Optimized for Apple silicon via NEON intrinsics and Accelerate framework
- On x86 architectures, utilizes AVX intrinsics
- No third-party dependencies
- Zero memory allocations during runtime

*Note that this project is under development and not ready for production use.*

## Whisper inference (example)

With ggml you can efficiently run [Whisper](examples/whisper) inference on the CPU.

Memory requirements:

| Model | Disk | Mem |
| --- | --- | --- |
| tiny | 75 MB | ~280 MB |
| base | 142 MB | ~430 MB |
| small | 466 MB | ~1.0 GB |
| medium | 1.5 GB | ~2.6 GB |
| large | 2.9 GB | ~4.7 GB |

## GPT inference (example)

With ggml you can efficiently run [GPT-2](examples/gpt-2) and [GPT-J](examples/gpt-j) inference on the CPU.

Here is how to run the example programs:

```bash
# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j

# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"

# Run the GPT-J 6B model (requires 12GB disk space and 16GB CPU RAM)
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
```

The inference speeds that I get for the different models on my 32GB MacBook M1 Pro are as follows:

| Model | Size | Time / Token |
| --- | --- | --- |
| GPT-2 | 117M | 5 ms |
| GPT-2 | 345M | 12 ms |
| GPT-2 | 774M | 23 ms |
| GPT-2 | 1558M | 42 ms |
| --- | --- | --- |
| GPT-J | 6B | 125 ms |

For more information, check out the corresponding programs in the [examples](examples) folder.

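To read the timing table above as throughput, the per-token latencies can be converted to tokens per second. The following is a minimal sketch, assuming a shell with `awk` available; `tokens_per_sec` is a hypothetical helper written for this illustration and is not part of ggml. The latencies are copied from the table and reflect only the M1 Pro machine mentioned there.

```bash
#!/usr/bin/env bash
# Convert a per-token latency in milliseconds to tokens per second.
# NOTE: tokens_per_sec is a hypothetical helper, not something shipped with ggml.
tokens_per_sec() {
    awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000 / ms }'
}

# "name:latency_ms" pairs copied from the timing table above.
for entry in "GPT-2-117M:5" "GPT-2-345M:12" "GPT-2-774M:23" "GPT-2-1558M:42" "GPT-J-6B:125"; do
    name="${entry%%:*}"   # text before the colon
    ms="${entry##*:}"     # text after the colon
    printf '%-12s %6s tokens/s\n' "$name" "$(tokens_per_sec "$ms")"
done
```

For instance, the 125 ms per token measured for GPT-J 6B corresponds to 8 tokens per second, while the 5 ms figure for GPT-2 117M corresponds to 200 tokens per second.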