Georgi Gerganov | 7f32376b70 | llama : initial working FP16 + 4-bit Q4_0 | 2 years ago
Georgi Gerganov | c2e9635c79 | llama : initial model conversion to ggml format | 2 years ago
Georgi Gerganov | 3adf02e311 | utils : print quantization histograms | 2 years ago
Georgi Gerganov | 3f08ce7004 | whisper : add Q4_1 model sizes | 2 years ago
Georgi Gerganov | 2fcbd28143 | gpt : support quantisation of f16 model files | 2 years ago
Georgi Gerganov | 10356cdcdd | gpt : seems not worth using FP16 for KV cache | 2 years ago
Georgi Gerganov | 4c1032f2d4 | whisper : mem usage based on model format type | 2 years ago
Georgi Gerganov | f2d174f530 | whisper : add support for quantized models | 2 years ago
Georgi Gerganov | b46f35b1f9 | whisper : add whisper-quantize tool | 2 years ago
Georgi Gerganov | eaa4006047 | gpt : fix memory usage computation | 2 years ago
Georgi Gerganov | fde29bd005 | ggml : add ggml_compute_forward_rope_f16() | 2 years ago
Georgi Gerganov | 39265de79f | gpt-j : fix conversion for FP16 models (such as GPT-JT-6B) | 2 years ago
Georgi Gerganov | 5bd952ac3f | gpt-2 : minor | 2 years ago
Georgi Gerganov | 86b1e356b0 | gpt : avoid ggml_transpose on model tensors (new models!) | 2 years ago
Georgi Gerganov | 11295af7a6 | gpt-j : support for 4-bit quantized model inference | 2 years ago
Georgi Gerganov | ea97a5f469 | ggml : vectorized mad q4_0 (ARM) | 2 years ago
Georgi Gerganov | cc94fdafe7 | ggml : 4-bit quantization works (only scalar for now) | 2 years ago
Georgi Gerganov | b48b09c37f | gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models | 2 years ago
Georgi Gerganov | a366dd31cc | ggml : q4_1 quantization support (seems to work for bigger models) | 2 years ago
Georgi Gerganov | a37776ddc0 | ggml : q4_0 quantization support | 2 years ago
Georgi Gerganov | 751aa84f1a | gpt-2 : loading Q4_0 quantized model | 2 years ago
Georgi Gerganov | ca2714384b | gpt-2 : model conversion for Q4_0 quantization | 2 years ago
Georgi Gerganov | 8f8a5aca99 | sync : latest whisper.cpp | 2 years ago
Georgi Gerganov | a6acb3318a | sync : latest whisper.cpp (scratch buffers in ggml) | 2 years ago
Georgi Gerganov | 47b297224e | Update README.md | 2 years ago
Georgi Gerganov | fb64edddb7 | gpt : fix sampling to use the temperature (close #16) | 2 years ago
Georgi Gerganov | c40a5b51a0 | ggml : sync latest whisper.cpp | 2 years ago
Georgi Gerganov | a0f2f68cdb | gpt-2 : fix broken prompt due to recent experiments | 2 years ago
    No idea why I committed that!?
Georgi Gerganov | dee3684fec | ggml : sync latest whisper.cpp | 2 years ago
Georgi Gerganov | 45fc4fed0b | sync : latest changes from whisper.cpp | 2 years ago
Georgi Gerganov | d677c7f61d | tests : minor fixes for x86 | 2 years ago
Georgi Gerganov | bd9f710a45 | sync : latest changes from whisper.cpp | 2 years ago
Georgi Gerganov | 1dcbe86a0c | gpt-2 : experimenting with attention mask | 2 years ago
Georgi Gerganov | 99f1afb613 | gpt-2 : fix off-by-one error in batching logic | 2 years ago
Georgi Gerganov | 64efeceabd | examples : redirect download scripts to HF | 2 years ago
Georgi Gerganov | ed09c7190e | gpt : add support for gpt-jt + fix unicode support | 2 years ago
Georgi Gerganov | f56828ed78 | ggml : sync with latest code from whisper.cpp | 2 years ago
Georgi Gerganov | 90ee5c6358 | sync : latest changes from whisper.cpp | 2 years ago
    - Documentation
    - whisper : token-level timestamps
    - ggml : Windows build fixes
    - etc.
Georgi Gerganov | 6feeca262f | sync : latest changes from whisper.cpp | 2 years ago
Georgi Gerganov | 624e4f5313 | whisper : fix timestamp sampling | 2 years ago
Georgi Gerganov | 7094be1f37 | sync : whisper.cpp | 2 years ago
    - Add MSVC header
    - FP16 GELU
    - C interface fixes (no unions)
    - Minor CMake updates
Georgi Gerganov | 270829aa9f | sync : whisper.cpp | 2 years ago
Georgi Gerganov | 7b70c5a561 | Minor fixes | 2 years ago
Georgi Gerganov | d8f64bce3d | Improve mul_mat performance for big matrices using Accelerate framework | 2 years ago
    Also:
    - Speedup GELU operator via F16 cast
    - Multi-thread NORM operator
    - Disable FLASH_FF in whisper example
Georgi Gerganov | 67ac34fcfa | sync : whisper.cpp | 2 years ago
Georgi Gerganov | e2f39f4b52 | whisper : sync with whisper.cpp | 2 years ago
Georgi Gerganov | 8e3c634b27 | whisper : various improvements | 2 years ago
Georgi Gerganov | 8ca553add4 | whisper : add C-style API | 2 years ago
Georgi Gerganov | dd1f4dfbab | whisper : various fixes | 2 years ago
Georgi Gerganov | 0116c03fb7 | whisper : various updates and improvements | 2 years ago