# ggml
Tensor library for machine learning
## Features
- Written in C
- 16-bit float support
- Automatic differentiation (work in progress)
- ADAM and L-BFGS optimizers
- Optimized for Arm64 architectures (M1) via NEON intrinsics
- On x86 architectures, utilizes AVX intrinsics
- No third-party dependencies
- Zero memory allocations during runtime
## Whisper inference (example)
With ggml you can efficiently run [Whisper](examples/whisper) inference on the CPU.
Memory requirements:
| Model | Memory |
| --- | --- |
| tiny.en | ~460 MB |
| base.en | ~620 MB |
| small.en | ~1.3 GB |
| medium.en | ~2.8 GB |
| large | ~4.9 GB |
## GPT inference (example)
With ggml you can efficiently run [GPT-2](examples/gpt-2) and [GPT-J](examples/gpt-j) inference on the CPU.
Here is how to run the example programs:
```bash
# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j
# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
# Run the GPT-J 6B model (requires 12GB disk space and 16GB CPU RAM)
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
```
The inference speeds that I get for the different models on my 32GB MacBook M1 Pro are as follows:
| Model | Size | Time / Token |
| --- | --- | --- |
| GPT-2 | 117M | 5 ms |
| GPT-2 | 345M | 12 ms |
| GPT-2 | 774M | 23 ms |
| GPT-2 | 1558M | 42 ms |
| --- | --- | --- |
| GPT-J | 6B | 125 ms |

For more information, check out the corresponding programs in the [examples](examples) folder.