whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Abitofevrything	a62170c656	ggml : add SSE3 and fp16 conversion lookup table (#368) * Improves WASM performance: On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome * Add support for SSE3 SIMD * Add SSE3 to system information * Add Imath support for fp16-fp32 conversions * Add Imath to system information * Wrap Imath calls to avoid static function warnings * Drop Imath; Add lookup table for f16 -> f32 conversions * Remove TODO comments * Update SSE3 to new macro arguments * Correct updated macro definitions * Prefer static inline where possible * ggml : static inlines + add public f16 <-> f32 conversions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	3 years ago
Thomas Fitzsimmons	1944e7c33e	whisper : document POWER VSX support	3 years ago
Thomas Fitzsimmons	49a8dd6732	ggml : reorganize POWER9 ppc64le SIMD code	3 years ago
Thomas Fitzsimmons	8c7f642286	ggml : change f16 load and store macro arguments	3 years ago
Georgi Gerganov	0a0cfa7985	ggml : add void to argument-less functions	3 years ago
Georgi Gerganov	d51c5eb906	ggml : define MIN / MAX only if not defined (minor)	3 years ago
Thomas Fitzsimmons	424c410c42	ggml : improve f16 acceleration for POWER9 ppc64le	3 years ago
Georgi Gerganov	4e0b2069e7	ggml : barrier refactor + static functions	3 years ago
Georgi Gerganov	ac521a566e	ggml : simplify the SIMD code (#324) * ggml : simplify the SIMD code * ggml : generic reduce for all register sizes + comments	3 years ago
Georgi Gerganov	7282e2109e	ggml : use vaddvq_f32 for slightly more efficient reduce	3 years ago
Thomas Fitzsimmons	466ceebb78	ggml : add f16 acceleration for POWER9 ppc64le	3 years ago
Andy Maloney	493d94130d	ggml : make consts static (#317) These shouldn't be able to be referenced outside the compilation unit.	3 years ago
Andy Maloney	fa463313ad	minor : small code cleanups (#302) * Small code cleanups - fix indentation - remove extra semicolons - remove extra break after returns in case statements - remove unnecessary call to .data() on string - use empty() instead of checking size() - no need to check for nullptr before free - remove unnecessary initialization of string to "" * minor : switch case always break Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	3 years ago
Kevin Brothaler	e1432dd91a	Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a. Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks. * Also removed ABI filter from Android project.	3 years ago
katsu560	419b8a6402	Add AVX,AVX2 support for ggml_vec_scale_f32	3 years ago
Georgi Gerganov	a7047b2a28	ggml : implement ggml_compute_forward_dup_f16() special cases	3 years ago
Georgi Gerganov	0f11759406	ggml : make more compatible with c99 (#262)	3 years ago
Georgi Gerganov	f66ac6dc4f	ggml : fix indentation	3 years ago
Georgi Gerganov	9955fa4ed7	ggml : make compatible with c99 (#262)	3 years ago
Roland Rabien	e70d47baab	Remove C++20 requirement (#257) * Remove C++20 requirement * Roll back C features not supported in VS2017	3 years ago
Georgi Gerganov	3b1aacbe6d	talk : talk with AI in the terminal	3 years ago
Georgi Gerganov	50a061b313	ggml : add alternative cblas_sgemm call	3 years ago
Al Hoang	04a16bbf11	fix compilation on haiku	3 years ago
Georgi Gerganov	b6597539f9	ggml : fix typo in previous commit	3 years ago
Georgi Gerganov	9a4b7a916e	ggml : use macros to inline FP16 <-> FP32 conversions	3 years ago
Georgi Gerganov	f8ec718b76	ggml : add F16C CPU flag check	3 years ago
katsu560	35b40a93b9	add fp16/fp32 convert intrinsics	3 years ago
Georgi Gerganov	061fc81bd6	ggml : remove inline specifier from fp16 <-> fp32 converters	3 years ago
Georgi Gerganov	388e9f79ad	ggml : fix the fix	3 years ago
Georgi Gerganov	35cd29ce1f	ggml : fix cross-compile Linux -> Window with mingw (#168)	3 years ago
katsu560	804f36aa2c	ggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16	3 years ago
katsu560	83456076f0	add AVX support	3 years ago
Georgi Gerganov	2065572a11	ggml : fix Windows build	3 years ago
boolemancer	0bfe728b84	Fix the Windows pthread_create shim The current implementation doesn't actually set the out parameter, and it returns 0 on failure instead of on success.	3 years ago
Georgi Gerganov	75171c2b79	ggml : multi-thread the ggml_add operator	3 years ago
Georgi Gerganov	137321915f	ggml : fix the check for NEON support (#7) Was using the wrong preprocessor macro	3 years ago
Syed Jafri	24cd12f647	Cross compilation (#121) * Cross compile windows * set env properly * rm log * fix review * Add back space	3 years ago
Mikhail Grigorev	8dac3c6e10	Fixed sched_yield	3 years ago
Mikhail Grigorev	6417e59aad	Implemented sched_yield function for Windows	3 years ago
Georgi Gerganov	e5044f87d9	ggml : fix barrier	3 years ago
Georgi Gerganov	a272f10b2e	ggml : fix thread-safety of ggml_init and ggml_free	3 years ago
Georgi Gerganov	fbd513b813	Add OpenBLAS support Supported via CMake - just add: cmake .. -DWHISPER_SUPPORT_OPENBLAS=ON On Ubuntu, you have to install the library like this: apt install libopenblas-dev Unfortunately, I don't observe any benefit compared to the original AVX2 + FP16 implementation. Maybe I'm missing something	3 years ago
Georgi Gerganov	34bb3ab0cf	ggml : add system info functions	3 years ago
Georgi Gerganov	c6710efde2	refactoring : move main + stream in examples + other stuff	3 years ago
Georgi Gerganov	db460b78ff	wip : WASM 128-bit SIMD support	3 years ago
Georgi Gerganov	e905c6f827	wip : initial WASM port Works but it is very slow because no SIMD is used. For example, jfk.wav is processed in ~23 seconds using "tiny.en" model	3 years ago
Georgi Gerganov	19817711b4	Add reference to FP16 repo	3 years ago
Georgi Gerganov	e36aabe00d	Correct implementation of FP16 GELU Can toggle it via the GGML_GELU_FP16 macro	3 years ago
Georgi Gerganov	91632eb6ea	Revert GELU change Seems it does not work on x86 for some reason	3 years ago
Georgi Gerganov	72d967bce4	Use Accelerate framework on Apple silicon Huge performance improvement in the Encode (almost x2 on MacBook M1 Pro) Also various extra optimizations: - Multi-threaded NORM operator - Faster GELU via F16 cast	3 years ago

57 Commits (1652965529a467213b588491d7849292df7808d9)