whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	0d229163bb	whisper : add API for applying custom logits filters during decoding	2 years ago
shibukazu	cfc06bf8df	whisper : suppress non-speech-related token outputs (#473) * add non-speech-token suppression * add suppress non-speech_tokens param	2 years ago
sandrohanea	2bfe0ebc0f	whisper : fixed Beam Search Strategy and exposed whisper_pcm_to_mel_phase_vocoder (#474) Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>	2 years ago
kamranjon	a1c1583cc7	whisper : add whisper_full_lang_id() for getting the context lang (#461)	2 years ago
Matija Pevec	d012b5c7e4	whisper : add "split_on_word" flag when using using "max_len" option (#455) * Update whisper.cpp * fix: trim function * feat: added flag to split on word * fix: arguments for main	2 years ago
Georgi Gerganov	1ccb8a46a5	bench : fix Windows linkage by moving ggml benches in whisper lib ..	3 years ago
Georgi Gerganov	c9aeb33676	stream : fix --keep_context argument to be used correctly (#354)	3 years ago
Georgi Gerganov	8de452c18b	Improve decoding (#291) * whisper : prepare infra for new decoding strategies * whisper : apply logit filters and compute logprobs * whisper : add whisper_get_logits() * whisper : separate self and cross attention memory Initial step needed for supporting parallel decoders * whisper : move probs_id buffer to whisper_context * whisper : refactor kv cache into separate struct * whisper : move self-attention kv cache to whisper_decoder * whisper : wip decoding parameters + strategies * whisper : wip decoding parameters + strategies (part 2) * whisper : wip decoding parameters + strategies (part 3) * whisper : wip decoding parameters + strategies (part 4) * whisper : fix prompt_past update to not include prompt_init * whisper : temperature + best_of support * whisper : support for compression_ration_threshold We actually use entropy, but it is similar * command : fix example to use logits instead of obsolete probs * whisper : handle empty sequence ranking * whisper : add WHISPER_DEBUG + diagnostic prints + new main args * whisper : minor fixes * whisper : add beam-search support * whisper : bug fix when there no previous context * whisper : add comments * stream : disable temperature fallback For real-time processing, we always want a single decoder running at T=0 * whisper.swiftui : update example - fix paths + add empty folders	3 years ago
Georgi Gerganov	4ef3398e8f	ggml : remove obsolete zeroing + comment fixes (#390)	3 years ago
Syahmi Azhar	1512545149	whisper : add loader class to allow loading from buffer and others (#353) * whisper : add loader to allow loading from other than file * whisper : rename whisper_init to whisper_init_from_file * whisper : add whisper_init_from_buffer * android : Delete local.properties * android : load models directly from assets * whisper : adding <stddef.h> needed for size_t + code style Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	3 years ago
Georgi Gerganov	d51c5eb906	ggml : define MIN / MAX only if not defined (minor)	3 years ago
Georgi Gerganov	d97e6005e9	whisper : add whisper_n_audio_ctx and check for invalid audio_ctx closes #344	3 years ago
Matheus de Sousa	8e3f129b4d	minor : resolves some of warnings when compiling with clang/clang++ (#294) * Resolves some of warnings when compiling with clang/clang++ Mostly nit stuff that clang catches when compiling with -Wall -Wextra -pedantic. - Fix comparison between sign/unsigned integers. - Passes a constant reference (const&) instead of copying each time. * minor : normalize coding style * minor : fix warning Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	3 years ago
Georgi Gerganov	fba10a4c68	whisper : language auto-detect (#59)	3 years ago
Georgi Gerganov	bf69b669a0	whisper : add whisper_tokenize() Tokenizes a string into a list of vocabulary tokens	3 years ago
Georgi Gerganov	78d13257be	Try to improve the token sampling strategy (#193) * whisper : try to improve the token sampling strategy - Add the "max_initial_timestaamp" token logic from OpenAI - Disallow sampling timestamps that are in the past * whisper : fix the max initial timestamp logic + fallback decoding	3 years ago
Georgi Gerganov	4698dcdb52	whisper : add mechanism for aborting the whisper_full() computation	3 years ago
Georgi Gerganov	e266cb0723	whisper.objc : add real-time processing (#97) Similar to the "stream" app	3 years ago
Georgi Gerganov	c207eed431	whisper.objc : fix build warnings	3 years ago
Georgi Gerganov	be16dfa038	whisper.wasm : do not block page while processing (close #86)	3 years ago
Georgi Gerganov	b8ce25dec1	refactoring : more readable code	3 years ago
Georgi Gerganov	385236d1d3	stream : "-kc" now enables context keeping from previous segment (#90) By default, the context keeping is disabled	3 years ago
M. Eren Akbiyik	63ae03b8e0	Prompt previous tokens for streaming (#163) * feat: prompt previous tokens for streaming I used a vector pointer instead of vector itself because it gave weird errors, and why not * convert vector to use with C api * feat: remove old refs, check for prompt size * feat: use better way of getting the pointer	3 years ago
Georgi Gerganov	fb8d77f760	stream : add "audio_ctx" parameter Used to overwrite the audio context size of the Encoder. For example, setting "audio_ctx = 512" will make it run about 3 times faster, processing about 10s of audio, instead of 30s. The transcription quality drops, but this can be used for real-time streaming purposes where performance is important.	3 years ago
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	3 years ago
Georgi Gerganov	d351771a4b	stream : add "single_segment" option Force the entire audio chunk to be transcribed into a single segment	3 years ago
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	3 years ago
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	3 years ago
Georgi Gerganov	c30bffc8a5	ref #22 : add "duration" option Can be used to partially process a recording	3 years ago
Georgi Gerganov	d5afebd37c	whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters	3 years ago
Georgi Gerganov	57fb46f307	main : add option for word-leve timestamps (very experimental)	3 years ago
Georgi Gerganov	eba62e0fa1	close #113 : fix struct whisper_token_data	3 years ago
Georgi Gerganov	dec40be58f	parallel : print time of audio boundaries + fix timings	3 years ago
Georgi Gerganov	0b2dc3c82c	parallel : working	3 years ago
Georgi Gerganov	85d6e1e1e7	main : fix sampling time + add max_context parameter	3 years ago
Georgi Gerganov	72e9cdd6bf	parallel : adding tool for parallel transformer inference	3 years ago
Georgi Gerganov	34bb3ab0cf	ggml : add system info functions	3 years ago
Georgi Gerganov	7affd309d3	whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference.	3 years ago
Georgi Gerganov	31ff0c6a1f	wip : experimental color coding of tokens based on probabilities	3 years ago
Georgi Gerganov	7eeef0358a	ref #52 : improve greedy sampling strategy Force timestamp token to be sampled if the probability sum over all timestamp tokens is above the probability of any other token	3 years ago
Georgi Gerganov	2d171ced32	close #32 : add comment about thread-safety of the C-style API	3 years ago
Georgi Gerganov	e30cf83158	ref #57, #62, #63 : remove unions in C-api + remove designated initializers We are not ready for designated initializers - many compilers do not support this C++ feature yet, so removing it's non-trivial usages.	3 years ago
Georgi Gerganov	9d5723435f	ref #35 : add <stdbool.h> to whisper.h "bool" type is not implicitly defined for some compilers.	3 years ago
Georgi Gerganov	9bbca3110f	ref #9 : add API documentation in whisper.h	3 years ago
Georgi Gerganov	2f069335ab	Adding sanitizer tests	3 years ago
Georgi Gerganov	481cd685d5	ref #10 : option to keep context in "stream" example Seems the results become worse when we keep the context, so by default this is not enabled	3 years ago
Georgi Gerganov	7787b878e1	ref #16, #22 : add "offset" argument Allows to start processing the input audio at some offset from the beginning. Useful for splitting a long job into multiple tasks.	3 years ago
Georgi Gerganov	6814cc9b02	Improve result printing	3 years ago
Georgi Gerganov	eba33adadd	Extend C-style API with full inference methods	3 years ago
Georgi Gerganov	6b77124e01	Initial C-style interface for whisper.cpp	3 years ago

50 Commits (6e776543f334516d6647cebcf08798319524a3b9)