Georgi Gerganov
|
10356cdcdd
|
gpt : seems not worth to use FP16 for KV cache
|
2 years ago |
Georgi Gerganov
|
4c1032f2d4
|
whisper : mem usage based on model format type
|
2 years ago |
Georgi Gerganov
|
f2d174f530
|
whisper : add support for quantized models
|
2 years ago |
Georgi Gerganov
|
b46f35b1f9
|
whisper : add whisper-quantize tool
|
2 years ago |
Georgi Gerganov
|
98f6a4bf94
|
ggml : fix ggml_is_contiguous() to take into account blck size
|
2 years ago |
Georgi Gerganov
|
eaa4006047
|
gpt : fix memory usage computation
|
2 years ago |
Georgi Gerganov
|
fde29bd005
|
ggml : add ggml_compute_forward_rope_f16()
|
2 years ago |
Georgi Gerganov
|
39265de79f
|
gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)
|
2 years ago |
Georgi Gerganov
|
5bd952ac3f
|
gpt-2 : minor
|
2 years ago |
Georgi Gerganov
|
86b1e356b0
|
gpt : avoid ggml_transpose on model tensors (new models!)
|
2 years ago |
Georgi Gerganov
|
e052167772
|
ggml : GGML_ASSERT() instead of assert() where appropriate
|
2 years ago |
Georgi Gerganov
|
11295af7a6
|
gpt-j : support for 4-bit quantized model inference
|
2 years ago |
Georgi Gerganov
|
7d5889475a
|
ggml : minor indentations
|
2 years ago |
Georgi Gerganov
|
e89cb32625
|
ggml : simplify mad q4_0 (ARM)
|
2 years ago |
Georgi Gerganov
|
6309a60bac
|
ggml : vectorized quantize_row_q4_0 (ARM)
|
2 years ago |
Georgi Gerganov
|
ea97a5f469
|
ggml : vectorized mad q4_0 (ARM)
|
2 years ago |
Georgi Gerganov
|
8ce6d1e492
|
gq : add method 6 (ARM)
|
2 years ago |
Georgi Gerganov
|
cc94fdafe7
|
ggml : 4-bit quantization works (only scalar for now)
|
2 years ago |
Georgi Gerganov
|
b48b09c37f
|
gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models
|
2 years ago |
Georgi Gerganov
|
a366dd31cc
|
ggml : q4_1 quantization support (seems to work for bigger models)
|
2 years ago |
Georgi Gerganov
|
a37776ddc0
|
ggml : q4_0 quantization support
|
2 years ago |
Georgi Gerganov
|
751aa84f1a
|
gpt-2 : loading Q4_0 quantized model
|
2 years ago |
Georgi Gerganov
|
38faca7efe
|
ggml : Q4_0 quantization support (ggml_get_rows())
|
2 years ago |
Georgi Gerganov
|
ca2714384b
|
gpt-2 : model conversion for Q4_0 quantization
|
2 years ago |
Georgi Gerganov
|
1ca898f94b
|
gq : method 5 (ARM)
|
2 years ago |
Georgi Gerganov
|
5a96c91bea
|
gq : method 4 (AVX2 attempt) + method 5 (no min)
|
2 years ago |
Georgi Gerganov
|
cde7c22ab1
|
gq : method 4 (ARM)
|
2 years ago |
Georgi Gerganov
|
054d97e0e1
|
gq : method 4 (AVX2)
|
2 years ago |
Georgi Gerganov
|
37dcfad83b
|
gq : progress on method 2
|
2 years ago |
Georgi Gerganov
|
bf709e45de
|
gq : add amax based method 3
|
2 years ago |
Georgi Gerganov
|
0a7debb7bf
|
gq : attempt at n-bit quantization
|
2 years ago |
katsu560
|
4c2f924553
|
cmake : update CMakeLists.txt to add correct flags (#26)
* modify src/CMakeLists.txt from whisper.cpp
* cmake : remove OpenBLAS stuff
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
2 years ago |
Georgi Gerganov
|
ba3e8a3d7f
|
readme : update Roadmap
|
2 years ago |
Georgi Gerganov
|
2546cb7780
|
readme : add Roadmap section
|
2 years ago |
Georgi Gerganov
|
8f8a5aca99
|
sync : latest whisper.cpp
|
2 years ago |
Georgi Gerganov
|
efa2cc36a2
|
tests : fix cblas_sgemm call
|
2 years ago |
Georgi Gerganov
|
3b3ad42906
|
tests : add SVD experiments
|
2 years ago |
Georgi Gerganov
|
a6acb3318a
|
sync : latest whisper.cpp (scratch buffers in ggml)
|
2 years ago |
Georgi Gerganov
|
47b297224e
|
Update README.md
|
2 years ago |
Takuya Takeuchi
|
0467385010
|
cmake : configure CMAKE_C_FLAGS and target_link_libraries for MSVC (#15)
|
2 years ago |
Georgi Gerganov
|
fb64edddb7
|
gpt : fix sampling to use the temperature (close #16)
|
2 years ago |
Georgi Gerganov
|
c40a5b51a0
|
ggml : sync latest whisper.cpp
|
2 years ago |
Georgi Gerganov
|
a0f2f68cdb
|
gpt-2 : fix broken prompt due to recent experiments
No idea why I committed that!?
|
2 years ago |
Georgi Gerganov
|
dee3684fec
|
ggml : sync latest whisper.cpp
|
2 years ago |
Georgi Gerganov
|
6ed4da0b03
|
cmake : disable warnings about unused functions
|
2 years ago |
Georgi Gerganov
|
06e2a3b721
|
ggml : bugfix in new soft max computation
|
2 years ago |
Georgi Gerganov
|
78af1420bf
|
tests : change test2 eps
|
2 years ago |
Georgi Gerganov
|
1af4cf0102
|
ggml : sync with latest whisper.cpp
|
2 years ago |
Georgi Gerganov
|
73a7916d30
|
tests : some more quantization experiments
|
2 years ago |
Georgi Gerganov
|
e0abac1be7
|
sync : forgot to sync ggml.h
|
2 years ago |