Georgi Gerganov | 10356cdcdd | gpt : seems not worth to use FP16 for KV cache | 2 years ago
Georgi Gerganov | eaa4006047 | gpt : fix memory usage computation | 2 years ago
Georgi Gerganov | fde29bd005 | ggml : add ggml_compute_forward_rope_f16() | 2 years ago
Georgi Gerganov | 86b1e356b0 | gpt : avoid ggml_transpose on model tensors (new models!) | 2 years ago
Georgi Gerganov | 11295af7a6 | gpt-j : support for 4-bit quantized model inference | 2 years ago
Georgi Gerganov | fb64edddb7 | gpt : fix sampling to use the temperature (close #16) | 2 years ago
Georgi Gerganov | 787efb4d2e | Adding Whisper inference example | 2 years ago
Georgi Gerganov | fb558f78d9 | Initial release | 2 years ago