84 Commits (gq)

Author SHA1 Message Date
Georgi Gerganov 3adf02e311
utils : print quantization histograms
1 year ago
Georgi Gerganov 05e7d26ba4
ggml : add WASM SIMD for Q4_0
1 year ago
Georgi Gerganov 3f08ce7004
whisper : add Q4_1 model sizes
1 year ago
Georgi Gerganov b7621b4fda
ggml : fixes for rpi4
1 year ago
Georgi Gerganov 2fcbd28143
gpt : support quantisation of f16 models files
1 year ago
Georgi Gerganov 10356cdcdd
gpt : seems not worth to use FP16 for KV cache
1 year ago
Georgi Gerganov 4c1032f2d4
whisper : mem usage based on model format type
1 year ago
Georgi Gerganov f2d174f530
whisper : add support for quantized models
1 year ago
Georgi Gerganov b46f35b1f9
whisper : add whisper-quantize tool
1 year ago
Georgi Gerganov 98f6a4bf94
ggml : fix ggml_is_contiguous() to take into account blck size
1 year ago
Georgi Gerganov eaa4006047
gpt : fix memory usage computation
1 year ago
Georgi Gerganov fde29bd005
ggml : add ggml_compute_forward_rope_f16()
1 year ago
Georgi Gerganov 39265de79f
gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)
1 year ago
Georgi Gerganov 5bd952ac3f
gpt-2 : minor
1 year ago
Georgi Gerganov 86b1e356b0
gpt : avoid ggml_transpose on model tensors (new models!)
1 year ago
Georgi Gerganov e052167772
ggml : GGML_ASSERT() instead of assert() where appropriate
1 year ago
Georgi Gerganov 11295af7a6
gpt-j : support for 4-bit quantized model inference
1 year ago
Georgi Gerganov 7d5889475a
ggml : minor indentations
1 year ago
Georgi Gerganov e89cb32625
ggml : simplify mad q4_0 (ARM)
1 year ago
Georgi Gerganov 6309a60bac
ggml : vectorized quantize_row_q4_0 (ARM)
1 year ago
Georgi Gerganov ea97a5f469
ggml : vectorized mad q4_0 (ARM)
1 year ago
Georgi Gerganov 8ce6d1e492
gq : add method 6 (ARM)
1 year ago
Georgi Gerganov cc94fdafe7
ggml : 4-bit quantization works (only scalar for now)
1 year ago
Georgi Gerganov b48b09c37f
gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models
1 year ago
Georgi Gerganov a366dd31cc
ggml : q4_1 quantization support (seems to work for bigger models)
1 year ago
Georgi Gerganov a37776ddc0
ggml : q4_0 quantization support
1 year ago
Georgi Gerganov 751aa84f1a
gpt-2 : loading Q4_0 quantized model
1 year ago
Georgi Gerganov 38faca7efe
ggml : Q4_0 quantization support (ggml_get_rows())
1 year ago
Georgi Gerganov ca2714384b
gpt-2 : model conversion for Q4_0 quantization
1 year ago
Georgi Gerganov 1ca898f94b
gq : method 5 (ARM)
1 year ago
Georgi Gerganov 5a96c91bea
gq : method 4 (AVX2 attempt) + method 5 (no min)
1 year ago
Georgi Gerganov cde7c22ab1
gq : method 4 (ARM)
1 year ago
Georgi Gerganov 054d97e0e1
gq : method 4 (AVX2)
1 year ago
Georgi Gerganov 37dcfad83b
gq : progress on method 2
1 year ago
Georgi Gerganov bf709e45de
gq : add amax based method 3
1 year ago
Georgi Gerganov 0a7debb7bf
gq : attempt at n-bit quantization
1 year ago
katsu560 4c2f924553
cmake : update CMakeLists.txt to add correct flags (#26)
1 year ago
Georgi Gerganov ba3e8a3d7f
readme : update Roadmap
1 year ago
Georgi Gerganov 2546cb7780
readme : add Roadmap section
1 year ago
Georgi Gerganov 8f8a5aca99
sync : latest whisper.cpp
1 year ago
Georgi Gerganov efa2cc36a2
tests : fix cblas_sgemm call
1 year ago
Georgi Gerganov 3b3ad42906
tests : add SVD experiments
1 year ago
Georgi Gerganov a6acb3318a
sync : latest whisper.cpp (scratch buffers in ggml)
1 year ago
Georgi Gerganov 47b297224e
Update README.md
1 year ago
Takuya Takeuchi 0467385010
cmake : configure CMAKE_C_FLAGS and target_link_libraries for MSVC (#15)
1 year ago
Georgi Gerganov fb64edddb7
gpt : fix sampling to use the temperature (close #16)
1 year ago
Georgi Gerganov c40a5b51a0
ggml : sync latest whisper.cpp
1 year ago
Georgi Gerganov a0f2f68cdb
gpt-2 : fix broken prompt due to recent experiments
1 year ago
Georgi Gerganov dee3684fec
ggml : sync latest whisper.cpp
1 year ago
Georgi Gerganov 6ed4da0b03
cmake : disable warnings about unused functions
1 year ago