Commit Graph

81 Commits (b7621b4fdad335126c91093f7585e972b7e468d1)

Author | SHA1 | Message | Date
Georgi Gerganov | b7621b4fda | ggml : fixes for rpi4 | 2 years ago
Georgi Gerganov | 2fcbd28143 | gpt : support quantisation of f16 model files | 2 years ago
Georgi Gerganov | 10356cdcdd | gpt : seems not worth to use FP16 for KV cache | 2 years ago
Georgi Gerganov | 4c1032f2d4 | whisper : mem usage based on model format type | 2 years ago
Georgi Gerganov | f2d174f530 | whisper : add support for quantized models | 2 years ago
Georgi Gerganov | b46f35b1f9 | whisper : add whisper-quantize tool | 2 years ago
Georgi Gerganov | 98f6a4bf94 | ggml : fix ggml_is_contiguous() to take into account blck size | 2 years ago
Georgi Gerganov | eaa4006047 | gpt : fix memory usage computation | 2 years ago
Georgi Gerganov | fde29bd005 | ggml : add ggml_compute_forward_rope_f16() | 2 years ago
Georgi Gerganov | 39265de79f | gpt-j : fix conversion for FP16 models (such as GPT-JT-6B) | 2 years ago
Georgi Gerganov | 5bd952ac3f | gpt-2 : minor | 2 years ago
Georgi Gerganov | 86b1e356b0 | gpt : avoid ggml_transpose on model tensors (new models!) | 2 years ago
Georgi Gerganov | e052167772 | ggml : GGML_ASSERT() instead of assert() where appropriate | 2 years ago
Georgi Gerganov | 11295af7a6 | gpt-j : support for 4-bit quantized model inference | 2 years ago
Georgi Gerganov | 7d5889475a | ggml : minor indentations | 2 years ago
Georgi Gerganov | e89cb32625 | ggml : simplify mad q4_0 (ARM) | 2 years ago
Georgi Gerganov | 6309a60bac | ggml : vectorized quantize_row_q4_0 (ARM) | 2 years ago
Georgi Gerganov | ea97a5f469 | ggml : vectorized mad q4_0 (ARM) | 2 years ago
Georgi Gerganov | 8ce6d1e492 | gq : add method 6 (ARM) | 2 years ago
Georgi Gerganov | cc94fdafe7 | ggml : 4-bit quantization works (only scalar for now) | 2 years ago
Georgi Gerganov | b48b09c37f | gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models | 2 years ago
Georgi Gerganov | a366dd31cc | ggml : q4_1 quantization support (seems to work for bigger models) | 2 years ago
Georgi Gerganov | a37776ddc0 | ggml : q4_0 quantization support | 2 years ago
Georgi Gerganov | 751aa84f1a | gpt-2 : loading Q4_0 quantized model | 2 years ago
Georgi Gerganov | 38faca7efe | ggml : Q4_0 quantization support (ggml_get_rows()) | 2 years ago
Georgi Gerganov | ca2714384b | gpt-2 : model conversion for Q4_0 quantization | 2 years ago
Georgi Gerganov | 1ca898f94b | gq : method 5 (ARM) | 2 years ago
Georgi Gerganov | 5a96c91bea | gq : method 4 (AVX2 attempt) + method 5 (no min) | 2 years ago
Georgi Gerganov | cde7c22ab1 | gq : method 4 (ARM) | 2 years ago
Georgi Gerganov | 054d97e0e1 | gq : method 4 (AVX2) | 2 years ago
Georgi Gerganov | 37dcfad83b | gq : progress on method 2 | 2 years ago
Georgi Gerganov | bf709e45de | gq : add amax based method 3 | 2 years ago
Georgi Gerganov | 0a7debb7bf | gq : attempt at n-bit quantization | 2 years ago
katsu560 | 4c2f924553 | cmake : update CMakeLists.txt to add correct flags () | 2 years ago
Georgi Gerganov | ba3e8a3d7f | readme : update Roadmap | 2 years ago
Georgi Gerganov | 2546cb7780 | readme : add Roadmap section | 2 years ago
Georgi Gerganov | 8f8a5aca99 | sync : latest whisper.cpp | 2 years ago
Georgi Gerganov | efa2cc36a2 | tests : fix cblas_sgemm call | 2 years ago
Georgi Gerganov | 3b3ad42906 | tests : add SVD experiments | 2 years ago
Georgi Gerganov | a6acb3318a | sync : latest whisper.cpp (scratch buffers in ggml) | 2 years ago
Georgi Gerganov | 47b297224e | Update README.md | 2 years ago
Takuya Takeuchi | 0467385010 | cmake : configure CMAKE_C_FLAGS and target_link_libraries for MSVC () | 2 years ago
Georgi Gerganov | fb64edddb7 | gpt : fix sampling to use the temperature (close ) | 2 years ago
Georgi Gerganov | c40a5b51a0 | ggml : sync latest whisper.cpp | 2 years ago
Georgi Gerganov | a0f2f68cdb | gpt-2 : fix broken prompt due to recent experiments | 2 years ago
Georgi Gerganov | dee3684fec | ggml : sync latest whisper.cpp | 2 years ago
Georgi Gerganov | 6ed4da0b03 | cmake : disable warnings about unused functions | 2 years ago
Georgi Gerganov | 06e2a3b721 | ggml : bugfix in new soft max computation | 2 years ago
Georgi Gerganov | 78af1420bf | tests : change test2 eps | 2 years ago
Georgi Gerganov | 1af4cf0102 | ggml : sync with latest whisper.cpp | 2 years ago