Georgi Gerganov
575c53dc41
yt-wsp.sh : fix usage instruction + comment
2 years ago
Georgi Gerganov
faa85f9840
livestream.sh : remove obsolete comment
2 years ago
Georgi Gerganov
9fe7306f4b
models : add the new "large" model release by OpenAI
The old "large" model is now renamed "large-v1".
If you have been using it, make sure to rename it and download the new
"large" model for best results.
2 years ago
Georgi Gerganov
57e0e6b700
livestream : handle ffmpeg errors gracefully and stabilize transcript
2 years ago
Georgi Gerganov
4f7363077f
livestream : minor changes
2 years ago
semiformal-net
093c840dee
livestream : fix losing words across audio chunk ( #195 )
* improve livestream script
* Update examples/livestream.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Paul Edwards <paul.edwards@semiformal.net>
2 years ago
Georgi Gerganov
4698dcdb52
whisper : add mechanism for aborting the whisper_full() computation
2 years ago
Georgi Gerganov
164df0d447
whisper.objc : fix context + broken readme links
2 years ago
Georgi Gerganov
e266cb0723
whisper.objc : add real-time processing ( #97 )
Similar to the "stream" app
2 years ago
Georgi Gerganov
c207eed431
whisper.objc : fix build warnings
2 years ago
Georgi Gerganov
a425365b82
yt-wsp.sh : script to easily transcribe VODs
Thanks to @DaniruKun
ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818
Usage:
cd whisper.cpp
make
./examples/yt-wsp.sh <video-url>
2 years ago
Georgi Gerganov
68ecadbbc9
command.wasm : add voice assistant example for the Web ( #171 )
Same as the command-line tool "command", but runs in the browser
Also, added helper script "extra/deploy-wasm.sh" and fixed some timing
constants for the WASM examples.
2 years ago
Georgi Gerganov
c536ff4005
minor : add comment for using "generate_karaoke.sh"
2 years ago
Georgi Gerganov
cb70b07db5
livestream.sh : simple tool to transcribe audio livestreams ( #185 )
2 years ago
Georgi Gerganov
3c390ffe38
stream.wasm : add web-based real-time transcription ( #112 )
2 years ago
Georgi Gerganov
be16dfa038
whisper.wasm : do not block page while processing ( close #86 )
2 years ago
Georgi Gerganov
0f619b52ce
main : add stereo-channel-based diarization ( #64 )
Not tested - I don't have stereo dialog audio
2 years ago
Georgi Gerganov
1246dd023e
command : add demonstration video
2 years ago
Georgi Gerganov
0be27bbd92
command : fix build + fix README + add bold printing
2 years ago
Georgi Gerganov
bc88eb13c6
examples : add "command" tool ( #171 )
2 years ago
Georgi Gerganov
b8ce25dec1
refactoring : more readable code
2 years ago
Georgi Gerganov
e4805d9601
wasm : refactor wasm example + reuse fetch mechanism
2 years ago
Georgi Gerganov
ff36415a86
talk.wasm : update video link + some minor fixes
2 years ago
Georgi Gerganov
025ff465b6
Update README.md
Use a less cringy video to demo talk.wasm lol
2 years ago
Georgi Gerganov
abce28ea99
talk.wasm : move to https://whisper.ggerganov.com/talk
This way, we can share the same models across different WASM examples
and not have to download them for each page
2 years ago
Georgi Gerganov
454b91de16
main : fix dangling pointer when using stdin for input ( #65 )
2 years ago
Georgi Gerganov
d7024cf9dc
main, stream : remove --verbose flag ( #178 )
2 years ago
Georgi Gerganov
37422ed733
talk.wasm : add audio pre-processing + bump memory
2 years ago
Georgi Gerganov
be3b720f96
talk.wasm : refactoring + update README.md
2 years ago
Georgi Gerganov
49706a658a
minor : update a few prints + fix buttons in whisper.wasm
2 years ago
Georgi Gerganov
e5dcdabbb8
unicode : fix character replacement (thanks to @tamo)
2 years ago
Georgi Gerganov
dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm)
2 years ago
Georgi Gerganov
326573de9a
talk.wasm : final touches
2 years ago
Georgi Gerganov
9aea96f774
talk.wasm : polishing + adding many AI personalities
2 years ago
Georgi Gerganov
385236d1d3
stream : "-kc" now enables context keeping from previous segment ( #90 )
By default, the context keeping is disabled
2 years ago
M. Eren Akbiyik
63ae03b8e0
Prompt previous tokens for streaming ( #163 )
* feat: prompt previous tokens for streaming
I used a vector pointer instead of vector itself because it gave weird errors, and why not
* convert vector to use with C api
* feat: remove old refs, check for prompt size
* feat: use better way of getting the pointer
2 years ago
Georgi Gerganov
78116f8eda
talk.wasm : update README.md
2 years ago
Georgi Gerganov
a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly ( #155 )
* talk : initial real-time transcription in the browser
* talk : polishing the UI
* talk : ready for beta testing
* talk.wasm : rename example
2 years ago
Georgi Gerganov
f2df9bd768
stream : add "max_tokens" cli arg
Controls the max tokens per segment for the stream example
2 years ago
Georgi Gerganov
fb8d77f760
stream : add "audio_ctx" parameter
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2 years ago
Georgi Gerganov
62b5ff875c
stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using partial encoder context
2 years ago
Georgi Gerganov
d351771a4b
stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2 years ago
Georgi Gerganov
c058aaf22e
stream : partial encoder experiments
2 years ago
Georgi Gerganov
83c742f1a7
whisper : add option to speed up the audio tempo by x2
Uses a Phase Vocoder to speed up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.
I think this can find application in real-time transcription, i.e. the
"stream" example.
2 years ago
Alan
7519eabf65
Adds support for stdin wav input
2 years ago
Georgi Gerganov
c30bffc8a5
ref #22 : add "duration" option
Can be used to partially process a recording
2 years ago
Georgi Gerganov
c71363f14c
examples : add simple script for generating Karaoke video
2 years ago
Georgi Gerganov
d42cf6d0df
Update README.md
2 years ago
Georgi Gerganov
ef47d77492
main : fix generated bash script
2 years ago
Georgi Gerganov
d5afebd37c
whisper : token-level timestamp refactoring ( #49 , #120 )
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitle types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters.
2 years ago
Georgi Gerganov
6fb98370ba
main : add some comments for the word-level timestamp algorithm
2 years ago
Georgi Gerganov
0729da9a3b
main : fix some edge cases for word-level timestamps
2 years ago
Georgi Gerganov
5dc74e3aff
Update README.md
2 years ago
Georgi Gerganov
ac8ef34039
Update README.md
2 years ago
Georgi Gerganov
dc12994603
Update README.md
2 years ago
Georgi Gerganov
57fb46f307
main : add option for word-level timestamps (very experimental)
2 years ago
Georgi Gerganov
5a9e4260a6
stream : add "--capture" option to select capture device (ref #10 )
2 years ago
Georgi Gerganov
12fb303d9d
whisper.wasm : update system info print
2 years ago
Georgi Gerganov
2827cbbbe8
main : merge parallel example in main
2 years ago
Georgi Gerganov
0b2dc3c82c
parallel : working
2 years ago
Georgi Gerganov
85d6e1e1e7
main : fix sampling time + add max_context parameter
2 years ago
Georgi Gerganov
72e9cdd6bf
parallel : adding tool for parallel transformer inference
2 years ago
Georgi Gerganov
b89f8960ca
Update README.md
2 years ago
Georgi Gerganov
6f82320b05
Create README.md
2 years ago
Georgi Gerganov
2298310dd8
whisper.nvim : add helper script for the Neovim integration
2 years ago
Georgi Gerganov
8347a7bb6a
stream : few updates to make it compatible for Vim usage ( #99 )
2 years ago
Georgi Gerganov
ebb01b9e33
Print system info at start of program
2 years ago
Georgi Gerganov
2400660f3f
Print system info in main
2 years ago
Georgi Gerganov
a6c786d5dc
Update README.md
2 years ago
Georgi Gerganov
91dcf5f35b
Update README.md
2 years ago
Georgi Gerganov
113a4f06d8
Update README.md
2 years ago
Georgi Gerganov
47e78b7288
Update README.md
2 years ago
Georgi Gerganov
34bb3ab0cf
ggml : add system info functions
2 years ago
Georgi Gerganov
c6710efde2
refactoring : move main + stream in examples + other stuff
2 years ago
Georgi Gerganov
d4f94ce427
Update README.md
2 years ago
Georgi Gerganov
a52ee08c1e
objc : polishing the sample application
2 years ago
Georgi Gerganov
b41f4a90eb
Create README.md
2 years ago
Georgi Gerganov
bb1ee266d2
ios : whisper.objc example
2 years ago
Georgi Gerganov
3e69a6071d
Update README.md
2 years ago
Georgi Gerganov
f4aa01c2f8
Update README.md
2 years ago
Georgi Gerganov
6b45e37b2b
Update README.md and finalize the whisper.wasm example
2 years ago
Georgi Gerganov
491ecd7056
wip : polishing WASM example
2 years ago
Georgi Gerganov
e905c6f827
wip : initial WASM port
Works, but it is very slow because no SIMD is used.
For example, jfk.wav is processed in ~23 seconds using the "tiny.en" model
2 years ago