Commit Graph

  • 7f32376b70
    llama : initial working FP16 + 4-bit Q4_0 llama Georgi Gerganov 2023-03-10 19:01:07 +0200
  • c2e9635c79
    llama : initial model conversion to ggml format Georgi Gerganov 2023-03-10 17:58:48 +0200
  • d0c651ca46
    Merge 3bce0ec707 into 4c2f924553 #35 Cordeiro 2023-03-10 14:10:09 +0000
  • 3bce0ec707
    Fix map tensors #35 Alan 2023-03-10 11:10:03 -0300
  • b7143f03c4
    Script to convert h5 to ggml adapted from gpt-j example Alan 2023-03-10 09:48:42 -0300
  • ac9649100b
    Merge 1d38a69d7c into 4c2f924553 #12 Georgi Gerganov 2023-03-07 04:34:43 -0500
  • 036268500e
    Merge 3adf02e311 into 4c2f924553 #27 Georgi Gerganov 2023-03-06 20:33:17 +0000
  • 3adf02e311
    utils : print quantization histograms #27 gq Georgi Gerganov 2023-03-06 22:32:53 +0200
  • 05e7d26ba4
    ggml : add WASM SIMD for Q4_0 Georgi Gerganov 2023-02-27 18:28:27 +0200
  • 3f08ce7004
    whisper : add Q4_1 model sizes Georgi Gerganov 2023-02-26 20:25:36 +0200
  • b7621b4fda
    ggml : fixes for rpi4 Georgi Gerganov 2023-02-26 19:55:55 +0200
  • 2fcbd28143
    gpt : support quantisation of f16 models files Georgi Gerganov 2023-02-26 19:47:36 +0200
  • 10356cdcdd
    gpt : seems not worth to use FP16 for KV cache Georgi Gerganov 2023-02-26 17:59:31 +0200
  • 4c1032f2d4
    whisper : mem usage based on model format type Georgi Gerganov 2023-02-26 17:19:00 +0200
  • f2d174f530
    whisper : add support for quantized models Georgi Gerganov 2023-02-26 17:14:28 +0200
  • b46f35b1f9
    whisper : add whisper-qunatize tool Georgi Gerganov 2023-02-26 17:13:20 +0200
  • 98f6a4bf94
    ggml : fix ggml_is_contiguous() to take into account blck size Georgi Gerganov 2023-02-26 17:13:01 +0200
  • eaa4006047
    gpt : fix memory usage computation Georgi Gerganov 2023-02-26 15:20:18 +0200
  • fde29bd005
    ggml : add ggml_compute_forward_rope_f16() Georgi Gerganov 2023-02-26 15:08:18 +0200
  • 39265de79f
    gpt-j : fix conversion for FP16 models (such as GPT-JT-6B) Georgi Gerganov 2023-02-26 15:03:09 +0200
  • 5bd952ac3f
    gpt-2 : minor Georgi Gerganov 2023-02-26 14:48:35 +0200
  • 86b1e356b0
    gpt : avoid ggml_transpose on model tensors (new models!) Georgi Gerganov 2023-02-26 14:35:34 +0200
  • e052167772
    ggml : GGML_ASSERT() instead of assert() where appropriate Georgi Gerganov 2023-02-26 12:38:12 +0200
  • 11295af7a6
    gpt-j : support for 4-bit quantized model inference Georgi Gerganov 2023-02-26 09:36:05 +0200
  • 7d5889475a
    ggml : minor indentations Georgi Gerganov 2023-02-26 08:37:35 +0200
  • e89cb32625
    ggml : simplify mad q4_0 (ARM) Georgi Gerganov 2023-02-26 08:26:59 +0200
  • 6309a60bac
    ggml : vectorized quantize_row_q4_0 (ARM) Georgi Gerganov 2023-02-26 08:11:24 +0200
  • ea97a5f469
    ggml : vectorized mad q4_0 (ARM) Georgi Gerganov 2023-02-25 19:08:02 +0200
  • 8ce6d1e492
    gq : add method 6 (ARM) Georgi Gerganov 2023-02-25 18:31:23 +0200
  • cc94fdafe7
    ggml : 4-bit quantization works (only scalar for now) Georgi Gerganov 2023-02-25 17:51:27 +0200
  • b48b09c37f
    gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models Georgi Gerganov 2023-02-25 17:36:07 +0200
  • a366dd31cc
    ggml : q4_1 quantization support (seems to work for bigger models) Georgi Gerganov 2023-02-25 15:30:52 +0200
  • a37776ddc0
    ggml : q4_0 quantization support Georgi Gerganov 2023-02-25 12:46:55 +0200
  • 751aa84f1a
    gpt-2 : loading Q4_0 quantized model Georgi Gerganov 2023-02-25 11:59:10 +0200
  • 38faca7efe
    ggml : Q4_0 quantization support (ggml_get_rows()) Georgi Gerganov 2023-02-25 11:58:53 +0200
  • ca2714384b
    gpt-2 : model conversion for Q4_0 quantization Georgi Gerganov 2023-02-25 10:56:15 +0200
  • 1ca898f94b
    gq : method 5 (ARM) Georgi Gerganov 2023-02-24 21:15:13 +0200
  • 5a96c91bea
    gq : method 4 (AVX2 attempt) + method 5 (no min) Georgi Gerganov 2023-02-24 19:34:58 +0200
  • cde7c22ab1
    gq : method 4 (ARM) Georgi Gerganov 2023-02-23 23:02:20 +0200
  • 054d97e0e1
    gq : method 4 (AVX2) Georgi Gerganov 2023-02-23 17:50:57 +0200
  • 37dcfad83b
    gq : progress on method 2 Georgi Gerganov 2023-02-22 23:18:20 +0200
  • bf709e45de
    gq : add amax based method 3 Georgi Gerganov 2023-02-22 08:18:19 +0200
  • 0a7debb7bf
    gq : attempt at n-bit quantization Georgi Gerganov 2023-02-21 22:17:31 +0200
  • 4c2f924553
    cmake : update CMakeLists.txt to add correct flags (#26) master katsu560 2023-03-07 02:52:16 +0900
  • 3bd975fcc1
    cmake : remove OpenBLAS stuff #26 Georgi Gerganov 2023-03-06 19:46:58 +0200
  • ba3e8a3d7f
    readme : update Roadmap Georgi Gerganov 2023-03-06 07:40:55 +0200
  • 2546cb7780
    readme : add Roadmap section Georgi Gerganov 2023-03-05 18:02:27 +0200
  • 8f8a5aca99
    sync : latest whisper.cpp Georgi Gerganov 2023-02-26 21:10:50 +0200
  • 12cafd0572
    modify src/CMakeLists.txt from whisper.cpp katsu560 2023-02-26 16:04:00 +0900
  • efa2cc36a2
    tests : fix cblas_sgemm call Georgi Gerganov 2023-02-21 22:16:56 +0200
  • baeb88b858
    tests : add 4-bit Clover-based quantization 4bit Georgi Gerganov 2023-02-20 20:57:17 +0200
  • 3b3ad42906
    tests : add SVD experiments Georgi Gerganov 2023-02-18 16:05:31 +0200
  • a6acb3318a
    sync : latest whisper.cpp (scratch buffers in ggml) Georgi Gerganov 2023-02-15 20:59:36 +0200
  • 47b297224e
    Update README.md Georgi Gerganov 2023-01-20 08:45:45 +0200
  • 0467385010
    cmake : configure CMAKE_C_FLAGS and target_link_libraries for MSVC (#15) Takuya Takeuchi 2023-01-15 23:30:13 +0900
  • fb64edddb7
    gpt : fix sampling to use the temperature (close #16) Georgi Gerganov 2023-01-15 15:53:08 +0200
  • c40a5b51a0
    ggml : sync latest whisper.cpp Georgi Gerganov 2023-01-15 15:09:36 +0200
  • cc0ebdda55
    configure CMAKE_C_FLAGS and target_link_libraries for MSVC #15 Takuya Takeuchi 2023-01-09 08:36:03 +0900
  • a0f2f68cdb
    gpt-2 : fix broken prompt due to recent experiments Georgi Gerganov 2023-01-08 20:28:38 +0200
  • dee3684fec
    ggml : sync latest whisper.cpp Georgi Gerganov 2023-01-08 20:23:01 +0200
  • 6ed4da0b03
    cmake : disable warnings about unused functions Georgi Gerganov 2023-01-07 21:05:33 +0200
  • 06e2a3b721
    ggml : bugfix in new soft max computation Georgi Gerganov 2023-01-07 21:04:24 +0200
  • 78af1420bf
    tests : change test2 eps Georgi Gerganov 2023-01-07 20:00:25 +0200
  • 1af4cf0102
    ggml : sync with latest whisper.cpp Georgi Gerganov 2023-01-07 19:53:05 +0200
  • 73a7916d30
    tests : some more quantization experiments Georgi Gerganov 2023-01-07 12:17:34 +0200
  • e0abac1be7
    sync : forgot to sync ggml.h Georgi Gerganov 2023-01-07 09:43:02 +0200
  • 45fc4fed0b
    sync : latest changes from whisper.cpp Georgi Gerganov 2023-01-07 09:39:12 +0200
  • deb0c486c7
    tests : wip quantized matrix multiplication method 2 Georgi Gerganov 2023-01-07 09:36:32 +0200
  • d677c7f61d
    tests : minor fixes for x86 Georgi Gerganov 2023-01-07 09:31:42 +0200
  • 446ccf3ab1
    tests : experiments with n-bit quantized matrix multiplication Georgi Gerganov 2023-01-05 21:05:41 +0200
  • 1d38a69d7c
    t5 : initial load in ggml #12 t5 Georgi Gerganov 2023-01-02 16:11:13 +0200
  • a0f92eff2d
    t5 : initial ggml conversion of the model Georgi Gerganov 2022-12-31 16:08:30 +0200
  • ed683187cb
    t5 : add example for text-to-text transfer transformer inference Georgi Gerganov 2022-12-31 13:57:04 +0200
  • bd9f710a45
    sync : latest changes from whisper.cpp Georgi Gerganov 2022-12-31 12:32:04 +0200
  • 1dcbe86a0c
    gpt-2 : experimenting with attention mask Georgi Gerganov 2022-12-31 12:29:52 +0200
  • 99f1afb613
    gpt-2 : fix off-by-one error in batching logic Georgi Gerganov 2022-12-31 12:29:30 +0200
  • 64efeceabd
    examples : redirect download scripts to HF Georgi Gerganov 2022-12-12 23:49:12 +0200
  • ed09c7190e
    gpt : add support for gpt-jt + fix unicode support Georgi Gerganov 2022-12-04 18:33:14 +0200
  • f56828ed78
    ggml : sync with latest code from whisper.cpp Georgi Gerganov 2022-12-04 11:06:13 +0200
  • 90ee5c6358
    sync : latest changes from whisper.cpp Georgi Gerganov 2022-11-09 21:43:03 +0200
  • db13973820
    Update README.md Georgi Gerganov 2022-11-01 22:15:22 +0200
  • 6feeca262f
    sync : latest changes from whisper.cpp Georgi Gerganov 2022-11-01 22:13:15 +0200
  • 624e4f5313
    whisper : fix timestamp sampling Georgi Gerganov 2022-10-18 21:14:27 +0300
  • 7094be1f37
    sync : whisper.cpp Georgi Gerganov 2022-10-18 19:12:07 +0300
  • 270829aa9f
    sync : whisper.cpp Georgi Gerganov 2022-10-17 23:54:35 +0300
  • 7b70c5a561
    Minor fixes Georgi Gerganov 2022-10-17 21:31:23 +0300
  • d8f64bce3d
    Improve mul_mat performance for big matrices using Accelerate framework Georgi Gerganov 2022-10-17 21:20:33 +0300
  • ea0ef2a41e
    Performance tests - trying to optimize mul_mat Georgi Gerganov 2022-10-17 21:17:13 +0300
  • 3afb833f84
    wip : unsuccessful attempts speeding mul_mat using blocking experiments/blocking Georgi Gerganov 2022-10-13 22:19:43 +0300
  • 67ac34fcfa
    sync : whisper.cpp Georgi Gerganov 2022-10-13 22:18:46 +0300
  • e2f39f4b52
    whisper : sync with whisper.cpp Georgi Gerganov 2022-10-08 18:15:22 +0300
  • 8e3c634b27
    whisper : various improvements Georgi Gerganov 2022-10-05 23:15:10 +0300
  • 8ca553add4
    whisper : add C-style API Georgi Gerganov 2022-10-04 23:17:35 +0300
  • dd1f4dfbab
    whisper : various fixes Georgi Gerganov 2022-10-03 19:31:17 +0300
  • 0116c03fb7
    whisper : various updates and improvements Georgi Gerganov 2022-09-30 19:16:07 +0300
  • 787efb4d2e
    Adding Whisper inference example Georgi Gerganov 2022-09-28 21:12:20 +0300
  • f21b84cd21
    Update README.md + minor stuff Georgi Gerganov 2022-09-20 00:09:34 +0300
  • 0f4e99b1cc
    Update README.md Georgi Gerganov 2022-09-18 20:12:43 +0300
  • fb558f78d9
    Initial release Georgi Gerganov 2022-09-18 20:11:11 +0300