Commit Graph

79 Commits (10356cdcddfff668c8cc05752e6f11c97bfd50e7)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Georgi Gerganov | 10356cdcdd | gpt : seems not worth to use FP16 for KV cache | 2 years ago |
| Georgi Gerganov | 4c1032f2d4 | whisper : mem usage based on model format type | 2 years ago |
| Georgi Gerganov | f2d174f530 | whisper : add support for quantized models | 2 years ago |
| Georgi Gerganov | b46f35b1f9 | whisper : add whisper-qunatize tool | 2 years ago |
| Georgi Gerganov | 98f6a4bf94 | ggml : fix ggml_is_contiguous() to take into account blck size | 2 years ago |
| Georgi Gerganov | eaa4006047 | gpt : fix memory usage computation | 2 years ago |
| Georgi Gerganov | fde29bd005 | ggml : add ggml_compute_forward_rope_f16() | 2 years ago |
| Georgi Gerganov | 39265de79f | gpt-j : fix conversion for FP16 models (such as GPT-JT-6B) | 2 years ago |
| Georgi Gerganov | 5bd952ac3f | gpt-2 : minor | 2 years ago |
| Georgi Gerganov | 86b1e356b0 | gpt : avoid ggml_transpose on model tensors (new models!) | 2 years ago |
| Georgi Gerganov | e052167772 | ggml : GGML_ASSERT() instead of assert() where appropriate | 2 years ago |
| Georgi Gerganov | 11295af7a6 | gpt-j : support for 4-bit quantized model inference | 2 years ago |
| Georgi Gerganov | 7d5889475a | ggml : minor indentations | 2 years ago |
| Georgi Gerganov | e89cb32625 | ggml : simplify mad q4_0 (ARM) | 2 years ago |
| Georgi Gerganov | 6309a60bac | ggml : vectorized quantize_row_q4_0 (ARM) | 2 years ago |
| Georgi Gerganov | ea97a5f469 | ggml : vectorized mad q4_0 (ARM) | 2 years ago |
| Georgi Gerganov | 8ce6d1e492 | gq : add method 6 (ARM) | 2 years ago |
| Georgi Gerganov | cc94fdafe7 | ggml : 4-bit quantization works (only scalar for now) | 2 years ago |
| Georgi Gerganov | b48b09c37f | gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models | 2 years ago |
| Georgi Gerganov | a366dd31cc | ggml : q4_1 quantization support (seems to work for bigger models) | 2 years ago |
| Georgi Gerganov | a37776ddc0 | ggml : q4_0 quantization support | 2 years ago |
| Georgi Gerganov | 751aa84f1a | gpt-2 : loading Q4_0 quantized model | 2 years ago |
| Georgi Gerganov | 38faca7efe | ggml : Q4_0 quantization support (ggml_get_rows()) | 2 years ago |
| Georgi Gerganov | ca2714384b | gpt-2 : model conversion for Q4_0 quantization | 2 years ago |
| Georgi Gerganov | 1ca898f94b | gq : method 5 (ARM) | 2 years ago |
| Georgi Gerganov | 5a96c91bea | gq : method 4 (AVX2 attempt) + method 5 (no min) | 2 years ago |
| Georgi Gerganov | cde7c22ab1 | gq : method 4 (ARM) | 2 years ago |
| Georgi Gerganov | 054d97e0e1 | gq : method 4 (AVX2) | 2 years ago |
| Georgi Gerganov | 37dcfad83b | gq : progress on method 2 | 2 years ago |
| Georgi Gerganov | bf709e45de | gq : add amax based method 3 | 2 years ago |
| Georgi Gerganov | 0a7debb7bf | gq : attempt at n-bit quantization | 2 years ago |
| katsu560 | 4c2f924553 | cmake : update CMakeLists.txt to add correct flags (#26) | 2 years ago |
| Georgi Gerganov | ba3e8a3d7f | readme : update Roadmap | 2 years ago |
| Georgi Gerganov | 2546cb7780 | readme : add Roadmap section | 2 years ago |
| Georgi Gerganov | 8f8a5aca99 | sync : latest whisper.cpp | 2 years ago |
| Georgi Gerganov | efa2cc36a2 | tests : fix cblas_sgemm call | 2 years ago |
| Georgi Gerganov | 3b3ad42906 | tests : add SVD experiments | 2 years ago |
| Georgi Gerganov | a6acb3318a | sync : latest whisper.cpp (scratch buffers in ggml) | 2 years ago |
| Georgi Gerganov | 47b297224e | Update README.md | 2 years ago |
| Takuya Takeuchi | 0467385010 | cmake : configure CMAKE_C_FLAGS and target_link_libraries for MSVC (#15) | 2 years ago |
| Georgi Gerganov | fb64edddb7 | gpt : fix sampling to use the temperature (close #16) | 2 years ago |
| Georgi Gerganov | c40a5b51a0 | ggml : sync latest whisper.cpp | 2 years ago |
| Georgi Gerganov | a0f2f68cdb | gpt-2 : fix broken prompt due to recent experiments | 2 years ago |
| Georgi Gerganov | dee3684fec | ggml : sync latest whisper.cpp | 2 years ago |
| Georgi Gerganov | 6ed4da0b03 | cmake : disable warnings about unused functions | 2 years ago |
| Georgi Gerganov | 06e2a3b721 | ggml : bugfix in new soft max computation | 2 years ago |
| Georgi Gerganov | 78af1420bf | tests : change test2 eps | 2 years ago |
| Georgi Gerganov | 1af4cf0102 | ggml : sync with latest whisper.cpp | 2 years ago |
| Georgi Gerganov | 73a7916d30 | tests : some more quantization experiments | 2 years ago |
| Georgi Gerganov | e0abac1be7 | sync : forgot to sync ggml.h | 2 years ago |