Also, this works at the beginning of the file, but in the middle I sometimes get repeated tokens while the timestamp keeps advancing:
[00:22:16.000 --> 00:22:18.000] Because you really need all the powers.
[00:22:18.000 --> 00:22:27.000] [Indiscernible]
[00:22:27.000 --> 00:22:28.000] Take off 1-8 center.
[00:22:28.000 --> 00:22:40.000] [Indiscernible]
[00:22:40.000 --> 00:22:48.000] [Indiscernible]
[00:22:48.000 --> 00:22:49.000] 80 knots.
[00:22:49.000 --> 00:22:53.000] Yes.
[00:22:53.000 --> 00:22:54.000] V1.
[00:22:54.000 --> 00:22:55.000] V1.
[00:22:55.000 --> 00:22:56.000] V1.
[00:22:56.000 --> 00:22:57.000] V1.
[00:22:57.000 --> 00:22:58.000] V1.
[00:22:58.000 --> 00:22:59.000] V1.
[00:22:59.000 --> 00:23:00.000] V1.
etc. It stays stuck on "V1".
* npm : preparing infra for node package
* npm : package infra ready
* npm : initial version ready
* npm : change name to whisper.cpp (whisper.js is taken)
Also added a small wrapper function to read model data more safely, without having to get the sizeof right. I tested this on the tiny, base, and large models; there was no change in behaviour.
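For reference, here is a minimal sketch of what such a wrapper might look like (the name read_safe and the exact signature are assumptions, not necessarily what was committed):

```cpp
#include <cstdint>
#include <fstream>

// Hypothetical helper: the byte count is deduced from the destination type,
// so callers can no longer pass a mismatched sizeof by hand.
template <typename T>
static bool read_safe(std::ifstream & fin, T & dest) {
    fin.read(reinterpret_cast<char *>(&dest), sizeof(T));
    return static_cast<bool>(fin); // false on a short read or stream error
}

// usage:
//   int32_t n_vocab;
//   read_safe(fin, n_vocab);   // no explicit sizeof needed
```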
Do not allow text segments to go beyond the end of the audio.
This partially mitigates issues where the last audio window covers only
the final 1-2 seconds of the audio file and the decoding spirals into a
repetition of the last transcribed phrase.
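The idea can be sketched as clamping each decoded segment's timestamps to the audio length (all names here are illustrative, not the actual whisper.cpp code; timestamps are assumed to be in 10 ms units, matching whisper's timestamp resolution):

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative segment with start/end timestamps in 10 ms units.
struct segment {
    int64_t t0; // start
    int64_t t1; // end
};

// Clamp a decoded segment so it never extends past the end of the audio.
static void clamp_segment(segment & seg, int64_t t_audio_end) {
    seg.t1 = std::min(seg.t1, t_audio_end);
    seg.t0 = std::min(seg.t0, seg.t1);
}
```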
The old "large" model is now renamed "large-v1".
If you have been using it, make sure to rename it and download the new
"large" model for best results.
* whisper : try to improve the token sampling strategy
- Add the "max_initial_timestaamp" token logic from OpenAI
- Disallow sampling timestamps that are in the past
* whisper : fix the max initial timestamp logic + fallback decoding
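A rough sketch of how these two rules can be applied, assuming they are implemented as logit masking before sampling (all names are illustrative; timestamp tokens are assumed to occupy the tail of the vocabulary, each id one 10 ms step after the previous):

```cpp
#include <cmath>
#include <vector>

static void mask_timestamp_logits(
        std::vector<float> & logits,  // one logit per vocabulary token
        int  token_beg,               // id of the first timestamp token (t = 0.00)
        int  last_timestamp,          // id of the most recently sampled timestamp token
        bool is_initial,              // sampling the first token of a new segment?
        int  max_initial_ts) {        // highest timestamp id allowed at segment start
    for (int id = token_beg; id < (int) logits.size(); ++id) {
        // disallow timestamps that are in the past (no going back in time)
        if (id < last_timestamp) {
            logits[id] = -INFINITY;
        }
        // cap the first timestamp of a segment (the "max_initial_timestamp" rule)
        if (is_initial && id > max_initial_ts) {
            logits[id] = -INFINITY;
        }
    }
}
```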