
54 Commits (llama)

Author SHA1 Message Date
Georgi Gerganov 7f32376b70
llama : initial working FP16 + 4-bit Q4_0
1 year ago
Georgi Gerganov c2e9635c79
llama : initial model conversion to ggml format
1 year ago
Georgi Gerganov 3adf02e311
utils : print quantization histograms
1 year ago
Georgi Gerganov 3f08ce7004
whisper : add Q4_1 model sizes
1 year ago
Georgi Gerganov 2fcbd28143
gpt : support quantisation of f16 models files
1 year ago
Georgi Gerganov 10356cdcdd
gpt : seems not worth to use FP16 for KV cache
1 year ago
Georgi Gerganov 4c1032f2d4
whisper : mem usage based on model format type
1 year ago
Georgi Gerganov f2d174f530
whisper : add support for quantized models
1 year ago
Georgi Gerganov b46f35b1f9
whisper : add whisper-qunatize tool
1 year ago
Georgi Gerganov eaa4006047
gpt : fix memory usage computation
1 year ago
Georgi Gerganov fde29bd005
ggml : add ggml_compute_forward_rope_f16()
1 year ago
Georgi Gerganov 39265de79f
gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)
1 year ago
Georgi Gerganov 5bd952ac3f
gpt-2 : minor
1 year ago
Georgi Gerganov 86b1e356b0
gpt : avoid ggml_transpose on model tensors (new models!)
1 year ago
Georgi Gerganov 11295af7a6
gpt-j : support for 4-bit quantized model inference
1 year ago
Georgi Gerganov ea97a5f469
ggml : vectorized mad q4_0 (ARM)
1 year ago
Georgi Gerganov cc94fdafe7
ggml : 4-bit quantization works (only scalar for now)
1 year ago
Georgi Gerganov b48b09c37f
gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models
1 year ago
Georgi Gerganov a366dd31cc
ggml : q4_1 quantization support (seems to work for bigger models)
1 year ago
Georgi Gerganov a37776ddc0
ggml : q4_0 quantization support
1 year ago
Georgi Gerganov 751aa84f1a
gpt-2 : loading Q4_0 quantized model
1 year ago
Georgi Gerganov ca2714384b
gpt-2 : model conversion for Q4_0 quantization
1 year ago
Georgi Gerganov 8f8a5aca99
sync : latest whisper.cpp
1 year ago
Georgi Gerganov a6acb3318a
sync : latest whisper.cpp (scratch buffers in ggml)
1 year ago
Georgi Gerganov 47b297224e
Update README.md
1 year ago
Georgi Gerganov fb64edddb7
gpt : fix sampling to use the temperature (close #16)
1 year ago
Georgi Gerganov c40a5b51a0
ggml : sync latest whisper.cpp
1 year ago
Georgi Gerganov a0f2f68cdb
gpt-2 : fix broken prompt due to recent experiments
1 year ago
Georgi Gerganov dee3684fec
ggml : sync latest whisper.cpp
1 year ago
Georgi Gerganov 45fc4fed0b
sync : latest changes from whisper.cpp
1 year ago
Georgi Gerganov d677c7f61d
tests : minor fixes for x86
1 year ago
Georgi Gerganov bd9f710a45
sync : latest changes from whisper.cpp
1 year ago
Georgi Gerganov 1dcbe86a0c
gpt-2 : experimenting with attention mask
1 year ago
Georgi Gerganov 99f1afb613
gpt-2 : fix off-by-one error in batching logic
1 year ago
Georgi Gerganov 64efeceabd
examples : redirect download scripts to HF
1 year ago
Georgi Gerganov ed09c7190e
gpt : add support for gpt-jt + fix unicode support
2 years ago
Georgi Gerganov f56828ed78
ggml : sync with latest code from whisper.cpp
2 years ago
Georgi Gerganov 90ee5c6358
sync : latest changes from whisper.cpp
2 years ago
Georgi Gerganov 6feeca262f
sync : latest changes from whisper.cpp
2 years ago
Georgi Gerganov 624e4f5313
whisper : fix timestamp sampling
2 years ago
Georgi Gerganov 7094be1f37
sync : whisper.cpp
2 years ago
Georgi Gerganov 270829aa9f
sync : whisper.cpp
2 years ago
Georgi Gerganov 7b70c5a561
Minor fixes
2 years ago
Georgi Gerganov d8f64bce3d
Improve mul_mat performance for big matrices using Accelerate framework
2 years ago
Georgi Gerganov 67ac34fcfa
sync : whisper.cpp
2 years ago
Georgi Gerganov e2f39f4b52
whisper : sync with whisper.cpp
2 years ago
Georgi Gerganov 8e3c634b27
whisper : various improvements
2 years ago
Georgi Gerganov 8ca553add4
whisper : add C-style API
2 years ago
Georgi Gerganov dd1f4dfbab
whisper : various fixes
2 years ago
Georgi Gerganov 0116c03fb7
whisper : various updates and improvements
2 years ago