* Resolves some of warnings when compiling with clang/clang++
Mostly nit stuff that clang catches when compiling with -Wall -Wextra
-pedantic.
- Fix comparison between sign/unsigned integers.
- Passes a constant reference (const&) instead of copying each time.
* minor : normalize coding style
* minor : fix warning
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* whisper : try to improve the token sampling strategy
- Add the "max_initial_timestaamp" token logic from OpenAI
- Disallow sampling timestamps that are in the past
* whisper : fix the max initial timestamp logic + fallback decoding
* feat: prompt previous tokens for streaming
I used a vector pointer instead of vector itself because it gave weird errors, and why not
* convert vector to use with C api
* feat: remove old refs, check for prompt size
* feat: use better way of getting the pointer
Used to overwrite the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.
I think this can find application for real-time transcription - i.e. the
"stream" example.
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitles types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters