ggml
Tensor library for machine learning
Note that this project is under development and not ready for production use.
Some of the development is currently happening in the whisper.cpp repo.
Features
- Written in C
- 16-bit float support
- Automatic differentiation (work in progress)
- ADAM and L-BFGS optimizers
- Optimized for Apple Silicon via NEON intrinsics and the Accelerate framework
- Uses AVX intrinsics on x86 architectures
- No third-party dependencies
- Zero memory allocations during runtime
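The "zero memory allocations during runtime" point can be pictured as an arena: one buffer is reserved up front and every tensor is carved out of it by advancing an offset, so no `malloc` happens while inference runs. The following is a minimal C sketch of that general idea, not ggml's actual API; the names `arena`, `arena_init`, `arena_alloc` and `arena_reset` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size arena: all memory is provided once, up front. */
typedef struct {
    uint8_t *base;   /* start of the caller-provided buffer */
    size_t   size;   /* total capacity in bytes */
    size_t   offset; /* bytes handed out so far */
} arena;

static void arena_init(arena *a, void *buffer, size_t size) {
    a->base = (uint8_t *) buffer;
    a->size = size;
    a->offset = 0;
}

/* Returns `n` bytes aligned to `align` (a power of two), or NULL when the
 * arena is exhausted. Individual allocations are never freed. */
static void *arena_alloc(arena *a, size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t) (a->base + a->offset);
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t) (align - 1);
    size_t start = (size_t) (aligned - (uintptr_t) a->base);
    if (start > a->size || n > a->size - start) {
        return NULL; /* out of space: no fallback to the system allocator */
    }
    a->offset = start + n;
    return a->base + start;
}

/* Makes the whole buffer available again, e.g. before the next evaluation. */
static void arena_reset(arena *a) {
    a->offset = 0;
}
```

The trade-off of this design is that the buffer must be sized for the worst case before inference starts, since nothing is returned to the arena until it is reset.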
Roadmap
- Example of GPT-2 inference (examples/gpt-2)
- Example of GPT-J inference (examples/gpt-j)
- Example of Whisper inference (examples/whisper)
- Support 4-bit integer quantization (https://github.com/ggerganov/ggml/pull/27)
- Example of FLAN-T5 inference (https://github.com/ggerganov/ggml/pull/12)
- Example of LLaMA inference
- Example of RWKV inference
Whisper inference (example)
With ggml you can efficiently run Whisper inference on the CPU.
Memory requirements:
Model | Disk | Mem |
---|---|---|
tiny | 75 MB | ~280 MB |
base | 142 MB | ~430 MB |
small | 466 MB | ~1.0 GB |
medium | 1.5 GB | ~2.6 GB |
large | 2.9 GB | ~4.7 GB |
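The memory column can be used to pick a model for a given machine. The C sketch below hard-codes the approximate figures from the table and returns the largest model that fits a RAM budget; the helper and its names are hypothetical, and since the figures are rough estimates it is wise to leave some headroom in practice.

```c
#include <stddef.h>

/* Approximate memory per Whisper model, in MB, taken from the table above
 * (1 GB treated as 1000 MB for simplicity). Ordered smallest to largest. */
typedef struct {
    const char *name;
    int mem_mb;
} whisper_model_info;

static const whisper_model_info WHISPER_MODELS[] = {
    { "tiny",    280 },
    { "base",    430 },
    { "small",  1000 },
    { "medium", 2600 },
    { "large",  4700 },
};

/* Hypothetical helper: return the largest model whose approximate memory
 * requirement fits in `budget_mb`, or NULL if none fits. */
static const char *largest_whisper_model_for(int budget_mb) {
    const char *best = NULL;
    size_t n = sizeof WHISPER_MODELS / sizeof WHISPER_MODELS[0];
    for (size_t i = 0; i < n; i++) {
        if (WHISPER_MODELS[i].mem_mb <= budget_mb) {
            best = WHISPER_MODELS[i].name; /* later entries are larger */
        }
    }
    return best;
}
```

For example, a 2048 MB budget selects `small`, since `medium` needs about 2.6 GB.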
GPT inference (example)
With ggml you can efficiently run GPT-2 and GPT-J inference on the CPU.
Here is how to run the example programs:
# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j
# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
# Run the GPT-J 6B model (requires 12 GB of disk space and 16 GB of RAM)
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
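The 12 GB disk figure for GPT-J is consistent with simple arithmetic if the weights are stored as 16-bit floats: 6 billion parameters × 2 bytes ≈ 12 GB. The sketch below generalizes that estimate; the enum and function names are hypothetical, and real model files also contain the vocabulary and other metadata.

```c
/* Storage formats for the weights (hypothetical enum for this sketch). */
typedef enum { WEIGHTS_F32, WEIGHTS_F16, WEIGHTS_Q4 } weight_type;

/* Bytes needed per parameter; the 4-bit figure ignores per-block scales. */
static double bytes_per_param(weight_type t) {
    switch (t) {
    case WEIGHTS_F32: return 4.0;
    case WEIGHTS_F16: return 2.0;
    case WEIGHTS_Q4:  return 0.5;
    }
    return 0.0;
}

/* Approximate weight storage in GB (10^9 bytes). */
static double weights_gb(double n_params, weight_type t) {
    return n_params * bytes_per_param(t) / 1e9;
}
```

By the same estimate, 6 billion parameters at 4 bits each would need roughly 3 GB, which illustrates why 4-bit quantization appears on the roadmap.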
These are the inference speeds I get for the different models on my MacBook M1 Pro with 32 GB of RAM:
Model | Size | Time / Token |
---|---|---|
GPT-2 | 117M | 5 ms |
GPT-2 | 345M | 12 ms |
GPT-2 | 774M | 23 ms |
GPT-2 | 1558M | 42 ms |
--- | --- | --- |
GPT-J | 6B | 125 ms |
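The per-token latency in the table converts directly into throughput. A minimal C sketch of that arithmetic (the function names are hypothetical):

```c
/* Convert a per-token latency in milliseconds to tokens per second. */
static double tokens_per_second(double ms_per_token) {
    return 1000.0 / ms_per_token;
}

/* Estimated wall-clock seconds to generate `n_tokens` tokens, ignoring
 * prompt processing and model loading time. */
static double generation_seconds(double ms_per_token, int n_tokens) {
    return ms_per_token * n_tokens / 1000.0;
}
```

On the machine above, the 117M GPT-2 model at 5 ms/token works out to about 200 tokens per second, while GPT-J 6B at 125 ms/token gives about 8 tokens per second, or roughly 25 seconds for a 200-token completion.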
For more information, check out the corresponding programs in the examples folder.