Georgi Gerganov | cc94fdafe7 | ggml : 4-bit quantization works (only scalar for now) | 2 years ago
Georgi Gerganov | b48b09c37f | gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models | 2 years ago
Georgi Gerganov | a366dd31cc | ggml : q4_1 quantization support (seems to work for bigger models) | 2 years ago
Georgi Gerganov | a37776ddc0 | ggml : q4_0 quantization support | 2 years ago
Georgi Gerganov | 751aa84f1a | gpt-2 : loading Q4_0 quantized model | 2 years ago
Georgi Gerganov | ca2714384b | gpt-2 : model conversion for Q4_0 quantization | 2 years ago
Georgi Gerganov | fb64edddb7 | gpt : fix sampling to use the temperature (close #16) | 2 years ago
Georgi Gerganov | a0f2f68cdb | gpt-2 : fix broken prompt due to recent experiments. No idea why I committed that!? | 2 years ago
Georgi Gerganov | 1dcbe86a0c | gpt-2 : experimenting with attention mask | 2 years ago
Georgi Gerganov | 99f1afb613 | gpt-2 : fix off-by-one error in batching logic | 2 years ago
Georgi Gerganov | 64efeceabd | examples : redirect download scripts to HF | 2 years ago
Georgi Gerganov | ed09c7190e | gpt : add support for gpt-jt + fix unicode support | 2 years ago
Georgi Gerganov | 787efb4d2e | Adding Whisper inference example | 2 years ago
Georgi Gerganov | f21b84cd21 | Update README.md + minor stuff: changed default threads to 4; added GGML_PERF for enabling runtime performance timings | 2 years ago
Georgi Gerganov | fb558f78d9 | Initial release | 2 years ago