Georgi Gerganov
ggml
Tensor library in C for machine learning
Features
- Automatic differentiation (WIP)
- 16-bit float support
- ADAM and L-BFGS optimizers
- Optimized for Arm64 architectures (e.g. MacBook M1) via NEON intrinsics
- Uses AVX intrinsics on x86 architectures
- No third-party dependencies
- Zero memory allocations during runtime
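To make the "zero memory allocations during runtime" point concrete, here is a minimal sketch of the general technique: reserve one buffer up front and hand out slices of it, so nothing touches the heap while computation runs. The `Arena` class and its methods are hypothetical names invented for this illustration; this is not ggml's actual allocator or API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size arena (illustration only, not ggml code).
// All memory is reserved once in the constructor; allocate() never
// calls into the heap afterwards.
class Arena {
public:
    explicit Arena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns `size` bytes starting at a buffer offset that is a multiple
    // of `align` (which must be a power of two), or nullptr if the
    // pre-reserved buffer is exhausted.
    void* allocate(std::size_t size, std::size_t align = 16) {
        const std::size_t start = (offset_ + align - 1) & ~(align - 1);
        if (start > buffer_.size() || size > buffer_.size() - start) {
            return nullptr;
        }
        offset_ = start + size;
        return buffer_.data() + start;
    }

    // Number of bytes consumed so far, including alignment padding.
    std::size_t used() const { return offset_; }

    // Makes the whole buffer available again without freeing it.
    void reset() { offset_ = 0; }

private:
    std::vector<std::uint8_t> buffer_;
    std::size_t offset_;
};
```

With this sketch, a program would construct one `Arena` sized for its worst case at startup, call `allocate` for every tensor-like buffer it needs, and call `reset` between runs. Running out of space shows up as a `nullptr` rather than as a hidden heap allocation.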
Local GPT inference
Using ggml, you can run GPT-2 and GPT-J inference locally on your computer without any additional software or hardware. You don't even need to install Python or any other third-party library.
The example programs are implemented in C++. They run entirely on the CPU.
Here is how to use them:
# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j
# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
# Run the GPT-J 6B model (requires 12GB disk space and 16GB CPU RAM)
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
This is the inference speed for the different models on my MacBook M1 Pro:
Model | Size | Time / Token |
---|---|---|
GPT-2 | 117M | 5 ms |
GPT-2 | 345M | 12 ms |
GPT-2 | 774M | 23 ms |
GPT-2 | 1558M | 42 ms |
GPT-J | 6B | 125 ms |
For more information, check out the corresponding programs in the examples folder.