ggml

Tensor library for machine learning

Features

  • Written in C
  • 16-bit float support
  • Automatic differentiation (work in progress)
  • ADAM and L-BFGS optimizers
  • Optimized for Apple silicon via NEON intrinsics and Accelerate framework
  • On x86 architectures, utilizes AVX intrinsics
  • No third-party dependencies
  • Zero memory allocations during runtime (see the example below)
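
To make the points above concrete, here is a minimal sketch of the API that computes f(x) = a*x^2 + b for scalar tensors. It is written against the function names used in this repository at the time of writing (ggml_init, ggml_new_tensor_1d, ggml_build_forward, ggml_graph_compute); if the API has changed since, treat it as illustrative rather than definitive. Note how all tensor memory comes from a single buffer reserved up front in ggml_init:

#include <stdio.h>

#include "ggml/ggml.h"

int main(void) {
    // reserve one memory pool up front - ggml does not allocate after this
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024, // 16 MB is plenty for a few scalars
        .mem_buffer = NULL,         // NULL = let ggml allocate the pool itself
    };

    struct ggml_context * ctx = ggml_init(params);

    // x is a variable (a parameter of the computation graph)
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    ggml_set_param(ctx, x);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);

    // build the expression f = a*x^2 + b (nothing is computed yet)
    struct ggml_tensor * x2 = ggml_mul(ctx, x, x);
    struct ggml_tensor * f  = ggml_add(ctx, ggml_mul(ctx, a, x2), b);

    struct ggml_cgraph gf = ggml_build_forward(f);

    // set the input values and evaluate the graph
    ggml_set_f32(x, 2.0f);
    ggml_set_f32(a, 3.0f);
    ggml_set_f32(b, 4.0f);

    ggml_graph_compute(ctx, &gf);

    printf("f = %f\n", ggml_get_f32_1d(f, 0)); // prints f = 16.000000

    ggml_free(ctx);

    return 0;
}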

Note that this project is under development and not ready for production use. Most of the development is currently happening in the whisper.cpp repo, so if you are interested in this project, make sure to follow what is happening there.

Whisper inference (example)

With ggml you can efficiently run Whisper inference on the CPU.

Memory requirements:

Model    Disk     Mem
tiny      75 MB   ~280 MB
base     142 MB   ~430 MB
small    466 MB   ~1.0 GB
medium   1.5 GB   ~2.6 GB
large    2.9 GB   ~4.7 GB
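
The Whisper example itself is maintained in the whisper.cpp repo. Assuming a checkout of that repo, a typical run looks like the following (commands as documented there at the time of writing; base.en and the bundled sample are just examples):

# in the whisper.cpp repo
bash ./models/download-ggml-model.sh base.en
make
./main -m models/ggml-base.en.bin -f samples/jfk.wav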

GPT inference (example)

With ggml you can efficiently run GPT-2 and GPT-J inference on the CPU.

Here is how to run the example programs:

# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j

# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"

# Run the GPT-J 6B model (requires 12GB disk space and 16GB CPU RAM)
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"

The inference speeds that I get for the different models on my MacBook M1 Pro (32 GB RAM) are as follows:

Model   Size    Time / Token
GPT-2    117M     5 ms
GPT-2    345M    12 ms
GPT-2    774M    23 ms
GPT-2   1558M    42 ms
GPT-J      6B   125 ms

For more information, check out the corresponding programs in the examples folder.