
ggml

Tensor library for machine learning

Features

Written in C
16-bit float support
Automatic differentiation (work in progress)
ADAM and L-BFGS optimizers
Optimized for Arm64 architectures (M1) via NEON intrinsics
Uses AVX intrinsics on x86 architectures
No third-party dependencies
Zero memory allocations during runtime
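The "zero memory allocations during runtime" point can be illustrated with a bump (arena) allocator: one buffer is reserved up front, and every tensor is carved out of it by advancing an offset, so nothing is malloc'ed while a model is evaluated. This is a generic sketch of that idea, not ggml's code; the names `arena`, `arena_init`, `arena_alloc`, `arena_reset`, `arena_free` and `arena_new_f32` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena: one preallocated buffer, allocations just bump an offset. */
typedef struct {
    uint8_t *base;  /* start of the preallocated buffer */
    size_t   size;  /* total capacity in bytes */
    size_t   used;  /* bytes handed out so far */
} arena;

/* The only heap allocation, done once before inference starts. */
static int arena_init(arena *a, size_t size) {
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

/* Returns n bytes aligned to `align` (a power of two), or NULL when full. */
static void *arena_alloc(arena *a, size_t n, size_t align) {
    uintptr_t start   = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t    offset  = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->size || n > a->size - offset) {
        return NULL;  /* out of space: there is deliberately no malloc fallback */
    }
    a->used = offset + n;
    return a->base + offset;
}

/* Makes the whole buffer available again for the next evaluation. */
static void arena_reset(arena *a) { a->used = 0; }

/* Releases the buffer once the model is no longer needed. */
static void arena_free(arena *a) {
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}

/* A float tensor of n elements; 16-byte alignment suits SIMD loads. */
static float *arena_new_f32(arena *a, size_t n) {
    return arena_alloc(a, n * sizeof(float), 16);
}
```

Resetting the offset between evaluations lets the same buffer be reused indefinitely, which is what keeps the steady-state loop free of allocations.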

Whisper inference (example)

With ggml you can efficiently run Whisper inference on the CPU.

Memory requirements:

Model	Memory
tiny.en	~460 MB
base.en	~620 MB
small.en	~1.3 GB
medium.en	~2.8 GB
large	~4.9 GB
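As a rough sanity check on the table, the weights alone need about (parameter count × 2 bytes) if they are stored as 16-bit floats; the rest of each figure is presumably working buffers. The sketch assumes every weight is fp16 and uses the parameter counts published for the Whisper model sizes (39M, 74M, 244M, 769M, 1550M), assuming the `.en` variants match their multilingual counterparts. The helper names are hypothetical.

```c
#include <stdio.h>

/* Memory for n_params weights of elem_size bytes each, in MB (10^6 bytes). */
static double weights_mb(double n_params, double elem_size) {
    return n_params * elem_size / 1e6;
}

/* Prints the estimated fp16 weight footprint next to the table's total. */
static void print_whisper_estimates(void) {
    static const struct {
        const char *name;
        double params;    /* assumed parameter count */
        double table_mb;  /* approximate total from the table above */
    } models[] = {
        { "tiny.en",     39e6,  460 },
        { "base.en",     74e6,  620 },
        { "small.en",   244e6, 1300 },
        { "medium.en",  769e6, 2800 },
        { "large",     1550e6, 4900 },
    };
    for (size_t i = 0; i < sizeof models / sizeof models[0]; i++) {
        double w = weights_mb(models[i].params, 2.0);  /* 2 bytes per fp16 value */
        printf("%-10s weights ~%5.0f MB of ~%5.0f MB total\n",
               models[i].name, w, models[i].table_mb);
    }
}
```

For `large` this gives about 3100 MB of weights out of the ~4.9 GB listed.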

GPT inference (example)

With ggml you can efficiently run GPT-2 and GPT-J inference on the CPU.
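Generating one token at a time on the CPU is typically dominated by matrix-vector products against the weight matrices. The scalar sketch below shows the basic pattern when weights are kept as 16-bit floats and widened to 32-bit for accumulation. ggml's real kernels are vectorized with NEON/AVX and organized differently; `fp16_to_fp32` and `matvec_f16` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts an IEEE 754 half-precision bit pattern to a 32-bit float. */
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t expo = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (expo == 0) {
        if (mant == 0) {
            bits = sign;                        /* signed zero */
        } else {
            expo = 127 - 15 + 1;                /* subnormal: renormalize */
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                expo--;
            }
            mant &= 0x3FFu;
            bits = sign | (expo << 23) | (mant << 13);
        }
    } else if (expo == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((expo + 127 - 15) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits without aliasing issues */
    return f;
}

/* y = W x, with W stored row-major as rows x cols fp16 bit patterns. */
static void matvec_f16(const uint16_t *w, const float *x, float *y,
                       size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; r++) {
        const uint16_t *row = w + r * cols;
        float sum = 0.0f;                       /* accumulate in fp32 */
        for (size_t c = 0; c < cols; c++) {
            sum += fp16_to_fp32(row[c]) * x[c];
        }
        y[r] = sum;
    }
}
```

Storing weights at half width halves the memory traffic per token, while the fp32 accumulator limits rounding error in long dot products.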

Here is how to run the example programs:

# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j

# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"

# Run the GPT-J 6B model (requires 12GB disk space and 16GB CPU RAM)
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"

The inference speeds that I get for the different models on my 32GB MacBook M1 Pro are as follows:

Model	Size	Time / Token
GPT-2	117M	5 ms
GPT-2	345M	12 ms
GPT-2	774M	23 ms
GPT-2	1558M	42 ms
GPT-J	6B	125 ms
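Time per token converts directly to throughput: tokens per second = 1000 / (ms per token). For the figures above that is about 200 tokens/s for GPT-2 117M and 8 tokens/s for GPT-J 6B. A minimal helper, assuming a fixed per-token latency and ignoring model loading and prompt processing (the function names are hypothetical):

```c
/* Tokens generated per second at a given per-token latency in milliseconds. */
static double tokens_per_second(double ms_per_token) {
    return 1000.0 / ms_per_token;
}

/* Approximate wall time in seconds to generate n_tokens at a fixed latency. */
static double generation_seconds(int n_tokens, double ms_per_token) {
    return n_tokens * ms_per_token / 1000.0;
}
```

At 125 ms per token, generating 200 tokens with GPT-J takes roughly 25 seconds.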

For more information, check out the corresponding programs in the examples folder.