| Author | Commit | Message | Date |
|---|---|---|---|
| Georgi Gerganov | 3adf02e311 | utils : print quantization histograms | 2 years ago |
| Georgi Gerganov | 2fcbd28143 | gpt : support quantisation of f16 models files | 2 years ago |
| Georgi Gerganov | 10356cdcdd | gpt : seems not worth to use FP16 for KV cache | 2 years ago |
| Georgi Gerganov | eaa4006047 | gpt : fix memory usage computation | 2 years ago |
| Georgi Gerganov | fde29bd005 | ggml : add ggml_compute_forward_rope_f16() | 2 years ago |
| Georgi Gerganov | 39265de79f | gpt-j : fix conversion for FP16 models (such as GPT-JT-6B) | 2 years ago |
| Georgi Gerganov | 86b1e356b0 | gpt : avoid ggml_transpose on model tensors (new models!) | 2 years ago |
| Georgi Gerganov | 11295af7a6 | gpt-j : support for 4-bit quantized model inference | 2 years ago |
| Georgi Gerganov | 47b297224e | Update README.md | 2 years ago |
| Georgi Gerganov | fb64edddb7 | gpt : fix sampling to use the temperature (close #16) | 2 years ago |
| Georgi Gerganov | 64efeceabd | examples : redirect download scripts to HF | 2 years ago |
| Georgi Gerganov | ed09c7190e | gpt : add support for gpt-jt + fix unicode support | 2 years ago |
| Georgi Gerganov | 787efb4d2e | Adding Whisper inference example | 2 years ago |
| Georgi Gerganov | f21b84cd21 | Update README.md + minor stuff: changed default threads to 4; added GGML_PERF for enabling runtime performance timings | 2 years ago |
| Georgi Gerganov | 0f4e99b1cc | Update README.md | 2 years ago |
| Georgi Gerganov | fb558f78d9 | Initial release | 2 years ago |