ggml

Tensor library for machine learning

Features

Written in C
16-bit float support
Automatic differentiation (work in progress)
ADAM and L-BFGS optimizers
Optimized for Arm64 architectures (M1) via NEON intrinsics
Uses AVX intrinsics on x86 architectures
No third-party dependencies
Zero memory allocations during runtime
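
A common way to get "zero memory allocations during runtime" is to reserve one buffer up front and carve every later object out of it. The following is a minimal sketch of that idea, not ggml's actual implementation; `arena_t`, `arena_init`, and `arena_alloc` are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size arena: the caller reserves all memory once, up front. */
typedef struct {
    uint8_t *base;  /* start of the caller-provided buffer */
    size_t   size;  /* total capacity in bytes */
    size_t   used;  /* bytes handed out so far, including alignment padding */
} arena_t;

/* Wrap an existing buffer. Nothing is allocated here or later. */
static arena_t arena_init(void *buffer, size_t size) {
    arena_t a = { (uint8_t *)buffer, size, 0 };
    return a;
}

/* Hand out n bytes at an offset that is a multiple of 16 from the buffer
 * start, or return NULL (leaving the arena unchanged) when it does not fit. */
static void *arena_alloc(arena_t *a, size_t n) {
    size_t start = (a->used + 15u) & ~(size_t)15u;  /* round offset up to 16 */
    if (start > a->size || n > a->size - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}
```

With this pattern the program decides its memory budget once at startup, and running out of space shows up as a NULL return rather than as a call into the system allocator.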

Whisper inference (example)

With ggml you can efficiently run Whisper inference on the CPU.

Memory requirements:

Model	Memory
tiny.en	~460 MB
base.en	~620 MB
small.en	~1.3 GB
medium.en	~2.8 GB
large	~4.9 GB

GPT inference (example)

With ggml you can efficiently run GPT-2 and GPT-J inference on the CPU.

Here is how to run the example programs:

# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j

# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"

# Run the GPT-J 6B model (requires 12GB disk space and 16GB CPU RAM)
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
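
The 12GB disk figure for GPT-J is consistent with storing roughly 6 billion weights at 2 bytes each. That reading assumes the model file holds 16-bit floats and uses decimal gigabytes. A small sketch of the arithmetic, with the hypothetical helpers `weight_bytes` and `as_gb`:

```c
#include <stdint.h>

/* Bytes needed to store n_params weights at bytes_per_weight bytes each. */
static uint64_t weight_bytes(uint64_t n_params, uint64_t bytes_per_weight) {
    return n_params * bytes_per_weight;
}

/* Convert a byte count to decimal gigabytes (1 GB = 1e9 bytes). */
static double as_gb(uint64_t bytes) {
    return (double)bytes / 1e9;
}
```

For example, `as_gb(weight_bytes(6000000000ULL, 2))` evaluates to 12.0. The larger RAM figure presumably leaves room for working buffers on top of the weights.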

The inference speeds that I get for the different models on my 32GB MacBook M1 Pro are as follows:

Model	Size	Time / Token
GPT-2	117M	5 ms
GPT-2	345M	12 ms
GPT-2	774M	23 ms
GPT-2	1558M	42 ms
---	---	---
GPT-J	6B	125 ms
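
To read the table as throughput, divide 1000 by the time per token. The sketch below does that conversion and estimates the time for a full generation. The helper names are hypothetical, and the estimate ignores prompt processing and model loading.

```c
/* Convert a per-token latency in milliseconds to tokens per second. */
static double tokens_per_second(double ms_per_token) {
    return 1000.0 / ms_per_token;
}

/* Estimated seconds to generate n_tokens at a fixed per-token latency. */
static double generation_seconds(int n_tokens, double ms_per_token) {
    return n_tokens * ms_per_token / 1000.0;
}
```

With the numbers above, GPT-2 117M at 5 ms per token gives 200 tokens per second. GPT-J at 125 ms per token gives 8 tokens per second, so 200 tokens would take about 25 seconds.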

For more information, check out the corresponding programs in the examples folder.
