Georgi Gerganov
8de452c18b
Improve decoding ( #291 )
...
* whisper : prepare infra for new decoding strategies
* whisper : apply logit filters and compute logprobs
* whisper : add whisper_get_logits()
* whisper : separate self and cross attention memory
Initial step needed for supporting parallel decoders
* whisper : move probs_id buffer to whisper_context
* whisper : refactor kv cache into separate struct
* whisper : move self-attention kv cache to whisper_decoder
* whisper : wip decoding parameters + strategies
* whisper : wip decoding parameters + strategies (part 2)
* whisper : wip decoding parameters + strategies (part 3)
* whisper : wip decoding parameters + strategies (part 4)
* whisper : fix prompt_past update to not include prompt_init
* whisper : temperature + best_of support
* whisper : support for compression_ration_threshold
We actually use entropy, but it is similar
* command : fix example to use logits instead of obsolete probs
* whisper : handle empty sequence ranking
* whisper : add WHISPER_DEBUG + diagnostic prints + new main args
* whisper : minor fixes
* whisper : add beam-search support
* whisper : bug fix when there no previous context
* whisper : add comments
* stream : disable temperature fallback
For real-time processing, we always want a single decoder running at T=0
* whisper.swiftui : update example - fix paths + add empty folders
2 years ago
Georgi Gerganov
a6dbd9188b
stream : fix a bug that inserted a lot of empty audio at the start
...
The quality was terrible due to this
2 years ago
Syahmi Azhar
1512545149
whisper : add loader class to allow loading from buffer and others ( #353 )
...
* whisper : add loader to allow loading from other than file
* whisper : rename whisper_init to whisper_init_from_file
* whisper : add whisper_init_from_buffer
* android : Delete local.properties
* android : load models directly from assets
* whisper : adding <stddef.h> needed for size_t + code style
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2 years ago
Georgi Gerganov
a466c3404d
stream : fix data race on bool + avoid division-by-zero
2 years ago
Andy Maloney
dc90efd504
examples : small code cleanups ( #322 )
...
- remove unnecessary initialization of string to ""
- use empty() instead of checking size()
- use emplace_back instead of push_back
- use nullptr instead of NULL
- remove unnecessary call to .data() on string
- use character overload of find_first_of() instead of passing a string
2 years ago
Georgi Gerganov
99da1e5cc8
cmake : enable and fix -Wall -Wextra -Wpedantic C++ warnings
2 years ago
Georgi Gerganov
a82d331034
stream : update README.md + comments
2 years ago
Georgi Gerganov
34e0b4b9ef
stream : fix build
2 years ago
Georgi Gerganov
b0f8013eb9
stream : add sliding window mode
2 years ago
Georgi Gerganov
be16dfa038
whisper.wasm : do not block page while processing ( close #86 )
2 years ago
Georgi Gerganov
b8ce25dec1
refactoring : more readable code
2 years ago
Georgi Gerganov
d7024cf9dc
main, stream : remove --verbose flag ( #178 )
2 years ago
Georgi Gerganov
385236d1d3
stream : "-kc" now enables context keeping from previous segment ( #90 )
...
By default, the context keeping is disabled
2 years ago
M. Eren Akbiyik
63ae03b8e0
Prompt previous tokens for streaming ( #163 )
...
* feat: prompt previous tokens for streaming
I used a vector pointer instead of vector itself because it gave weird errors, and why not
* convert vector to use with C api
* feat: remove old refs, check for prompt size
* feat: use better way of getting the pointer
2 years ago
Georgi Gerganov
f2df9bd768
stream : add "max_tokens" cli arg
...
Controls the max tokens per segment for the stream example
2 years ago
Georgi Gerganov
fb8d77f760
stream : add "audio_ctx" parameter
...
Used to overwrite the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2 years ago
Georgi Gerganov
62b5ff875c
stream : add "max_tokens" parameter
...
Used to limit the number of tokens in a segment.
Useful to battle with word repetition when using partial encoder context
2 years ago
Georgi Gerganov
d351771a4b
stream : add "single_segment" option
...
Force the entire audio chunk to be transcribed into a single segment
2 years ago
Georgi Gerganov
c058aaf22e
stream : partial encoder experiments
2 years ago
Georgi Gerganov
83c742f1a7
whisper : add option to speed up the audio tempo by x2
...
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.
I think this can find application for real-time transcription - i.e. the
"stream" example.
2 years ago
Georgi Gerganov
5a9e4260a6
stream : add "--capture" option to select capture device (ref #10 )
2 years ago
Georgi Gerganov
8347a7bb6a
stream : few updates to make it compatible for Vim usage ( #99 )
2 years ago
Georgi Gerganov
c6710efde2
refactoring : move main + stream in examples + other stuff
2 years ago