23 Commits (3adf02e311eaf3ec03d327394d53c1668faf5572)

Author | SHA1 | Message | Date
Georgi Gerganov | 3adf02e311 | utils : print quantization histograms | 2 years ago
Georgi Gerganov | 2fcbd28143 | gpt : support quantisation of f16 models files | 2 years ago
Georgi Gerganov | 10356cdcdd | gpt : seems not worth to use FP16 for KV cache | 2 years ago
Georgi Gerganov | eaa4006047 | gpt : fix memory usage computation | 2 years ago
Georgi Gerganov | fde29bd005 | ggml : add ggml_compute_forward_rope_f16() | 2 years ago
Georgi Gerganov | 5bd952ac3f | gpt-2 : minor | 2 years ago
Georgi Gerganov | 86b1e356b0 | gpt : avoid ggml_transpose on model tensors (new models!) | 2 years ago
Georgi Gerganov | ea97a5f469 | ggml : vectorized mad q4_0 (ARM) | 2 years ago
Georgi Gerganov | cc94fdafe7 | ggml : 4-bit quantization works (only scalar for now) | 2 years ago
Georgi Gerganov | b48b09c37f | gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models | 2 years ago
Georgi Gerganov | a366dd31cc | ggml : q4_1 quantization support (seems to work for bigger models) | 2 years ago
Georgi Gerganov | a37776ddc0 | ggml : q4_0 quantization support | 2 years ago
Georgi Gerganov | 751aa84f1a | gpt-2 : loading Q4_0 quantized model | 2 years ago
Georgi Gerganov | ca2714384b | gpt-2 : model conversion for Q4_0 quantization | 2 years ago
Georgi Gerganov | fb64edddb7 | gpt : fix sampling to use the temperature (close #16) | 2 years ago
Georgi Gerganov | a0f2f68cdb | gpt-2 : fix broken prompt due to recent experiments | 2 years ago
Georgi Gerganov | 1dcbe86a0c | gpt-2 : experimenting with attention mask | 2 years ago
Georgi Gerganov | 99f1afb613 | gpt-2 : fix off-by-one error in batching logic | 2 years ago
Georgi Gerganov | 64efeceabd | examples : redirect download scripts to HF | 2 years ago
Georgi Gerganov | ed09c7190e | gpt : add support for gpt-jt + fix unicode support | 2 years ago
Georgi Gerganov | 787efb4d2e | Adding Whisper inference example | 2 years ago
Georgi Gerganov | f21b84cd21 | Update README.md + minor stuff | 2 years ago
Georgi Gerganov | fb558f78d9 | Initial release | 2 years ago