whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Roland Rabien	e70d47baab	Remove C++20 requirement (#257) * Remove C++20 requirement * Roll back C features not supported in VS2017	2 years ago
bert hubert	d1da35de06	fix potential bug reading model data into a small size optimized string which could lead to memory corruption. In an SSO string, you can't write data to &str[0] and expect it to work well. Also added a small wrapper function to more safely read model data without having to get the sizeof right. I tested this on tiny, base and large models, there was no change in behaviour.	2 years ago
Georgi Gerganov	603f97ba11	whisper : minor improvement in decoding strategy (#244) Do not allow for text segments to go beyond end of audio. This partially mitigates some issues when the last audio window is 1-2 seconds just before the end of the audio file and the decoding spirals into a repetition of the last transcribed phrase.	2 years ago
Georgi Gerganov	f8ec718b76	ggml : add F16C CPU flag check	2 years ago
Georgi Gerganov	78d13257be	Try to improve the token sampling strategy (#193) * whisper : try to improve the token sampling strategy - Add the "max_initial_timestamp" token logic from OpenAI - Disallow sampling timestamps that are in the past * whisper : fix the max initial timestamp logic + fallback decoding	2 years ago
Georgi Gerganov	4698dcdb52	whisper : add mechanism for aborting the whisper_full() computation	2 years ago
Georgi Gerganov	e266cb0723	whisper.objc : add real-time processing (#97) Similar to the "stream" app	2 years ago
Georgi Gerganov	c207eed431	whisper.objc : fix build warnings	2 years ago
Georgi Gerganov	be16dfa038	whisper.wasm : do not block page while processing (close #86)	2 years ago
Georgi Gerganov	b8ce25dec1	refactoring : more readable code	2 years ago
Georgi Gerganov	128aaadb93	whisper : improve printfs	2 years ago
katsu560	83456076f0	add AVX support	2 years ago
Georgi Gerganov	49706a658a	minor : updates few prints + fix buttons in whisper.wasm	2 years ago
Georgi Gerganov	385236d1d3	stream : "-kc" now enables context keeping from previous segment (#90) By default, the context keeping is disabled	2 years ago
M. Eren Akbiyik	63ae03b8e0	Prompt previous tokens for streaming (#163) * feat: prompt previous tokens for streaming I used a vector pointer instead of vector itself because it gave weird errors, and why not * convert vector to use with C api * feat: remove old refs, check for prompt size * feat: use better way of getting the pointer	2 years ago
Georgi Gerganov	a4dfbeecf9	talk.wasm : GPT-2 meets Whisper in WebAssembly (#155) * talk : initial real-time transcription in the browser * talk : polishing the UI * talk : ready for beta testing * talk.wasm : rename example	2 years ago
Georgi Gerganov	fb8d77f760	stream : add "audio_ctx" parameter Used to overwrite the audio context size of the Encoder. For example, setting "audio_ctx = 512" will make it run about 3 times faster, processing about 10s of audio, instead of 30s. The transcription quality drops, but this can be used for real-time streaming purposes where performance is important.	2 years ago
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	2 years ago
Georgi Gerganov	d351771a4b	stream : add "single_segment" option Force the entire audio chunk to be transcribed into a single segment	2 years ago
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	2 years ago
greeshmay	2ba66360c9	fix: free ggml_context (close #149) (#150) * fix: free ggml_context * ggml : free the model's contexts in whisper_free() Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2 years ago
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	2 years ago
Georgi Gerganov	c30bffc8a5	ref #22 : add "duration" option Can be used to partially process a recording	2 years ago
Georgi Gerganov	d5afebd37c	whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitle types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters	2 years ago
Georgi Gerganov	02dfd5b8c3	whisper : fix extra memory usage after recent processor changes Had increased the memory buffer to the size of the model and forgot to bring it down.	2 years ago
Georgi Gerganov	57fb46f307	main : add option for word-level timestamps (very experimental)	2 years ago
Georgi Gerganov	eba62e0fa1	close #113 : fix struct whisper_token_data	2 years ago
Georgi Gerganov	014a119052	minor : fix multiple definitions of to_timestamp()	2 years ago
Georgi Gerganov	dec40be58f	parallel : print time of audio boundaries + fix timings	2 years ago
Georgi Gerganov	0b2dc3c82c	parallel : working	2 years ago
Georgi Gerganov	85d6e1e1e7	main : fix sampling time + add max_context parameter	2 years ago
Georgi Gerganov	72e9cdd6bf	parallel : adding tool for parallel transformer inference	2 years ago
Borislav Stanimirov	c565c569e7	Define WHISPER_BUILD so as to export symbols on Windows	2 years ago
Georgi Gerganov	34bb3ab0cf	ggml : add system info functions	2 years ago
Georgi Gerganov	5f7e9fa2dc	ref #68, #79 : fix segment time output	2 years ago
Georgi Gerganov	7affd309d3	whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference.	2 years ago
Georgi Gerganov	31ff0c6a1f	wip : experimental color coding of tokens based on probabilities	2 years ago
Georgi Gerganov	8d15a1c635	ci : fix and re-enable tests (2nd try)	2 years ago
Georgi Gerganov	692aa0784f	Revert "ci : fix and re-enable tests" This reverts commit `80aefc9514`.	2 years ago
Georgi Gerganov	80aefc9514	ci : fix and re-enable tests	2 years ago
Georgi Gerganov	7eeef0358a	ref #52 : improve greedy sampling strategy Force timestamp token to be sampled if the probability sum over all timestamp tokens is above the probability of any other token	2 years ago
Georgi Gerganov	e30cf83158	ref #57, #62, #63 : remove unions in C-api + remove designated initializers We are not ready for designated initializers - many compilers do not support this C++ feature yet, so removing its non-trivial usages.	2 years ago
Georgi Gerganov	d6b84b2a23	ref #62 : fix build for some compilers For some reason, new version of GCC panic when the struct type is not specified explicitly	2 years ago
Georgi Gerganov	b4a3875b2c	Revert recent sampling change It does not actually help and seems to produce worse results on some of the samples	2 years ago
Georgi Gerganov	cf67bfffa0	Fix EOT token handling If it is the end of the audio, pick all sampled tokens. Otherwise, print error message.	2 years ago
Georgi Gerganov	d14823582d	Try to improve the sampling strategy a bit It still fails sometimes when it does not sample a timestamp token for the entire segment. We now print a message in such cases	2 years ago
Georgi Gerganov	20d8e7a309	Fix memory sizes	2 years ago
Georgi Gerganov	72d967bce4	Use Accelerate framework on Apple silicon Huge performance improvement in the Encode (almost x2 on MacBook M1 Pro) Also various extra optimizations: - Multi-threaded NORM operator - Faster GELU via F16 cast	2 years ago
Georgi Gerganov	0ad085f5e8	ref #48 : clear results at the start of whisper_full This way, even if the input audio is empty, the previous results will be removed.	2 years ago
0/0	b799226973	check if spectrogram length is <100 before doing anything else fixes #39	2 years ago

65 Commits (e70d47baab77c6bd8e7e84560c02f59cf0325387)
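
Note on commit 7eeef0358a: the greedy sampling rule described in that message (force a timestamp token when the summed probability of all timestamp tokens exceeds the probability of any other token) can be sketched as below. This is only an illustration under stated assumptions, not the code from the commit: the function name pick_token_greedy is hypothetical, and it assumes that every token id at or above first_timestamp_id is a timestamp token.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not part of the whisper.cpp API.
// probs              : one probability per token id
// first_timestamp_id : ids >= this value are treated as timestamp tokens (assumption)
// Returns the id of the token to emit; returns 0 for an empty input.
std::size_t pick_token_greedy(const std::vector<float> & probs, std::size_t first_timestamp_id) {
    const std::size_t n        = probs.size();
    const std::size_t ts_begin = std::min(first_timestamp_id, n);

    // Most probable non-timestamp (text) token.
    std::size_t best_text = 0;
    float       max_text  = -1.0f;
    for (std::size_t i = 0; i < ts_begin; ++i) {
        if (probs[i] > max_text) {
            max_text  = probs[i];
            best_text = i;
        }
    }

    // Most probable timestamp token and the total timestamp probability mass.
    std::size_t best_ts = ts_begin;
    float       max_ts  = -1.0f;
    float       sum_ts  = 0.0f;
    for (std::size_t i = ts_begin; i < n; ++i) {
        sum_ts += probs[i];
        if (probs[i] > max_ts) {
            max_ts  = probs[i];
            best_ts = i;
        }
    }

    const bool has_text = ts_begin > 0;
    const bool has_ts   = ts_begin < n;

    // Force a timestamp when the combined timestamp mass beats every text token.
    if (has_ts && (!has_text || sum_ts > max_text)) {
        return best_ts;
    }
    return best_text;
}
```

With probabilities {0.30, 0.10, 0.20, 0.20, 0.20} and first_timestamp_id = 2, no single timestamp token beats the text token at id 0, but the timestamp mass (0.60) does, so the sketch returns a timestamp id instead of the plain argmax.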