whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	8de452c18b	Improve decoding (#291) * whisper : prepare infra for new decoding strategies * whisper : apply logit filters and compute logprobs * whisper : add whisper_get_logits() * whisper : separate self and cross attention memory Initial step needed for supporting parallel decoders * whisper : move probs_id buffer to whisper_context * whisper : refactor kv cache into separate struct * whisper : move self-attention kv cache to whisper_decoder * whisper : wip decoding parameters + strategies * whisper : wip decoding parameters + strategies (part 2) * whisper : wip decoding parameters + strategies (part 3) * whisper : wip decoding parameters + strategies (part 4) * whisper : fix prompt_past update to not include prompt_init * whisper : temperature + best_of support * whisper : support for compression_ration_threshold We actually use entropy, but it is similar * command : fix example to use logits instead of obsolete probs * whisper : handle empty sequence ranking * whisper : add WHISPER_DEBUG + diagnostic prints + new main args * whisper : minor fixes * whisper : add beam-search support * whisper : bug fix when there no previous context * whisper : add comments * stream : disable temperature fallback For real-time processing, we always want a single decoder running at T=0 * whisper.swiftui : update example - fix paths + add empty folders	2 years ago
Georgi Gerganov	a6dbd9188b	stream : fix a bug that inserted a lot of empty audio at the start The quality was terrible due to this	2 years ago
Syahmi Azhar	1512545149	whisper : add loader class to allow loading from buffer and others (#353) * whisper : add loader to allow loading from other than file * whisper : rename whisper_init to whisper_init_from_file * whisper : add whisper_init_from_buffer * android : Delete local.properties * android : load models directly from assets * whisper : adding <stddef.h> needed for size_t + code style Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2 years ago
Georgi Gerganov	a466c3404d	stream : fix data race on bool + avoid division-by-zero	2 years ago
Andy Maloney	dc90efd504	examples : small code cleanups (#322) - remove unnecessary initialization of string to "" - use empty() instead of checking size() - use emplace_back instead of push_back - use nullptr instead of NULL - remove unnecessary call to .data() on string - use character overload of find_first_of() instead of passing a string	2 years ago
Georgi Gerganov	99da1e5cc8	cmake : enable and fix -Wall -Wextra -Wpedantic C++ warnings	2 years ago
Georgi Gerganov	a82d331034	stream : update README.md + comments	2 years ago
Georgi Gerganov	34e0b4b9ef	stream : fix build	2 years ago
Georgi Gerganov	b0f8013eb9	stream : add sliding window mode	2 years ago
Georgi Gerganov	be16dfa038	whisper.wasm : do not block page while processing (close #86)	2 years ago
Georgi Gerganov	b8ce25dec1	refactoring : more readable code	2 years ago
Georgi Gerganov	d7024cf9dc	main, stream : remove --verbose flag (#178)	2 years ago
Georgi Gerganov	385236d1d3	stream : "-kc" now enables context keeping from previous segment (#90) By default, the context keeping is disabled	2 years ago
M. Eren Akbiyik	63ae03b8e0	Prompt previous tokens for streaming (#163) * feat: prompt previous tokens for streaming I used a vector pointer instead of vector itself because it gave weird errors, and why not * convert vector to use with C api * feat: remove old refs, check for prompt size * feat: use better way of getting the pointer	2 years ago
Georgi Gerganov	f2df9bd768	stream : add "max_tokens" cli arg Controls the max tokens per segment for the stream example	2 years ago
Georgi Gerganov	fb8d77f760	stream : add "audio_ctx" parameter Used to overwrite the audio context size of the Encoder. For example, setting "audio_ctx = 512" will make it run about 3 times faster, processing about 10s of audio, instead of 30s. The transcription quality drops, but this can be used for real-time streaming purposes where performance is important.	2 years ago
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	2 years ago
Georgi Gerganov	d351771a4b	stream : add "single_segment" option Force the entire audio chunk to be transcribed into a single segment	2 years ago
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	2 years ago
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	2 years ago
Georgi Gerganov	5a9e4260a6	stream : add "--capture" option to select capture device (ref #10)	2 years ago
Georgi Gerganov	8347a7bb6a	stream : few updates to make it compatible for Vim usage (#99)	2 years ago
Georgi Gerganov	c6710efde2	refactoring : move main + stream in examples + other stuff	2 years ago

23 Commits (8738427dd60bda894df1ff3c12317cca2e960016)
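
Note on the "audio_ctx" commit (fb8d77f760): the message says that "audio_ctx = 512" processes about 10s of audio instead of 30s. The sketch below reproduces that arithmetic. It assumes the standard Whisper encoder layout, in which a 30-second window maps onto 1500 audio-context positions (20 ms each); that figure is not stated in the log above. The helper name audio_ctx_to_ms and the two constants are hypothetical, introduced only for illustration, and are not part of the whisper.cpp API.

```cpp
#include <cstdint>

// Assumption (not stated in the commit log): the standard Whisper encoder
// maps a 30-second window onto 1500 audio-context positions, 20 ms each.
constexpr std::int64_t kWindowMs     = 30000; // full window, in milliseconds
constexpr std::int64_t kFullAudioCtx = 1500;  // default encoder context size

// Hypothetical helper, not part of whisper.cpp: milliseconds of audio
// covered by a given audio_ctx value under the assumption above.
constexpr std::int64_t audio_ctx_to_ms(std::int64_t audio_ctx) {
    return audio_ctx * kWindowMs / kFullAudioCtx;
}
```

Under these assumptions, audio_ctx_to_ms(512) evaluates to 10240 ms, which is consistent with the "about 10s" quoted in the commit message. The sketch covers only the audio duration; it says nothing about the "about 3 times faster" figure, which depends on how encoder cost scales with context size.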