ggml

Tensor library for machine learning

Note that this project is under development and not ready for production use.
Some of the development is currently happening in the whisper.cpp repo.

Features

Written in C
16-bit float support
Automatic differentiation (work in progress)
ADAM and L-BFGS optimizers
Optimized for Apple silicon via NEON intrinsics and Accelerate framework
Uses AVX intrinsics on x86 architectures
No third-party dependencies
Zero memory allocations during runtime
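
To illustrate what "zero memory allocations during runtime" can look like in practice, here is a minimal sketch of a bump allocator that hands out memory from a single buffer provided up front. This is an assumption about the general technique, not ggml's actual API: the names arena_t, arena_init and arena_alloc are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena type: illustrative only, not part of ggml. */
typedef struct {
    uint8_t *base;  /* start of the caller-provided buffer */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes handed out so far */
} arena_t;

/* Wrap a caller-provided buffer. Nothing is allocated here or later. */
static arena_t arena_init(void *buffer, size_t size) {
    assert(buffer != NULL);
    arena_t a = { (uint8_t *) buffer, size, 0 };
    return a;
}

/* Hand out n bytes at an offset that is a multiple of 16 (so the result
 * is 16-byte aligned if the buffer itself is). Returns NULL when the
 * buffer is exhausted instead of falling back to malloc. */
static void *arena_alloc(arena_t *a, size_t n) {
    const size_t align = 16;
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || n > a->size - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}
```

A caller would reserve one buffer at startup (static or allocated once), pass it to arena_init, and carve all tensors out of it with arena_alloc. The trade-off is that the total size has to be estimated in advance and individual blocks cannot be freed.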

Roadmap

Example of GPT-2 inference: examples/gpt-2
Example of GPT-J inference: examples/gpt-j
Example of Whisper inference: examples/whisper
Support 4-bit integer quantization: https://github.com/ggerganov/ggml/pull/27
Example of FLAN-T5 inference: https://github.com/ggerganov/ggml/pull/12
Example of LLaMA inference
Example of RWKV inference
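
For readers unfamiliar with the 4-bit integer quantization item above, the following is a rough sketch of one common approach: symmetric block quantization with a per-block scale. It is not taken from the linked pull request, and the scheme used there may differ. The block length of 32, the type block_q4_t, and all function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 32  /* assumed block length, chosen for illustration */

/* Hypothetical block layout: one float scale plus 32 packed 4-bit values. */
typedef struct {
    float   scale;                   /* per-block scale factor */
    uint8_t packed[BLOCK_SIZE / 2];  /* two 4-bit values per byte */
} block_q4_t;

/* Round to nearest integer, halves away from zero (avoids needing libm). */
static int round_to_int(float v) {
    return (int) (v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Quantize BLOCK_SIZE finite floats to 4 bits each. */
static void quantize_block(const float *x, block_q4_t *out) {
    float amax = 0.0f;  /* largest absolute value in the block */
    for (int i = 0; i < BLOCK_SIZE; i++) {
        float v = x[i] < 0.0f ? -x[i] : x[i];
        if (v > amax) amax = v;
    }
    /* Map [-amax, amax] onto the integers [-7, 7]. */
    out->scale = amax / 7.0f;
    float inv = out->scale != 0.0f ? 1.0f / out->scale : 0.0f;
    for (int i = 0; i < BLOCK_SIZE; i += 2) {
        /* Add 8 so the stored value is unsigned (1..15). */
        int q0 = round_to_int(x[i]     * inv) + 8;
        int q1 = round_to_int(x[i + 1] * inv) + 8;
        assert(q0 >= 1 && q0 <= 15 && q1 >= 1 && q1 <= 15);
        out->packed[i / 2] = (uint8_t) (q0 | (q1 << 4));
    }
}

/* Recover approximate floats from a quantized block. */
static void dequantize_block(const block_q4_t *in, float *y) {
    for (int i = 0; i < BLOCK_SIZE; i += 2) {
        int q0 = (in->packed[i / 2] & 0x0F) - 8;
        int q1 = (in->packed[i / 2] >> 4) - 8;
        y[i]     = (float) q0 * in->scale;
        y[i + 1] = (float) q1 * in->scale;
    }
}
```

With this layout each block stores 32 weights in 16 bytes plus one 4-byte scale, which works out to roughly 5 bits per weight. The reconstruction error per value is at most about half the block's scale.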

Whisper inference (example)

With ggml you can efficiently run Whisper inference on the CPU.

Memory requirements:

Model	Disk	Memory
tiny	75 MB	~280 MB
base	142 MB	~430 MB
small	466 MB	~1.0 GB
medium	1.5 GB	~2.6 GB
large	2.9 GB	~4.7 GB

GPT inference (example)

With ggml you can efficiently run GPT-2 and GPT-J inference on the CPU.

Here is how to run the example programs:

# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j

# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"

# Run the GPT-J 6B model (requires 12GB disk space and 16GB CPU RAM)
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"

The inference speeds that I get for the different models on my 32GB MacBook M1 Pro are as follows:

Model	Size	Time / Token
GPT-2	117M	5 ms
GPT-2	345M	12 ms
GPT-2	774M	23 ms
GPT-2	1558M	42 ms
---	---	---
GPT-J	6B	125 ms
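
To put the per-token times above in perspective, they can be converted to throughput and total generation time. The helper names below are hypothetical and exist only for this small arithmetic sketch. It assumes the time per token stays constant over the whole generation, which is a simplification.

```c
#include <assert.h>

/* Tokens generated per second for a given latency in ms per token. */
static double tokens_per_second(double ms_per_token) {
    assert(ms_per_token > 0.0);
    return 1000.0 / ms_per_token;
}

/* Seconds needed to generate n_tokens at a constant ms per token. */
static double generation_seconds(int n_tokens, double ms_per_token) {
    return (double) n_tokens * ms_per_token / 1000.0;
}
```

For example, 5 ms per token (GPT-2 117M) corresponds to 200 tokens per second, while 125 ms per token (GPT-J 6B) corresponds to 8 tokens per second, so generating 200 tokens with GPT-J would take about 25 seconds.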

For more information, check out the corresponding programs in the examples folder.