keyehzy
9e5f3ddc16
Allow for Twitch.tv live transcription
...
We rely on the streamlink library to give us a stream, then proceed as in
the radio livestream example.
2 years ago
Georgi Gerganov
47afb93c3c
yt-wsp.sh : improve usage instructions
2 years ago
Georgi Gerganov
575c53dc41
yt-wsp.sh : fix usage instruction + comment
2 years ago
Georgi Gerganov
faa85f9840
livestream.sh : remove obsolete comment
2 years ago
Georgi Gerganov
9fe7306f4b
models : add the new "large" model release by OpenAI
...
The old "large" model is now renamed "large-v1".
If you have been using it, make sure to rename it and download the new
"large" model for best results.
2 years ago
Georgi Gerganov
57e0e6b700
livestream : handle ffmpeg errors gracefully and stabilize transcript
2 years ago
Georgi Gerganov
4f7363077f
livestream : minor changes
2 years ago
semiformal-net
093c840dee
livestream : fix losing words across audio chunk (#195)
...
* improve livestream script
* Update examples/livestream.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Paul Edwards <paul.edwards@semiformal.net>
2 years ago
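The commit message above does not say how the fix works. One common way to avoid dropping words at chunk boundaries is to make consecutive chunks overlap slightly, so a word cut off at the end of one chunk is heard in full at the start of the next. The sketch below illustrates that general idea only; the function name and parameters are hypothetical and are not taken from livestream.sh.

```python
def chunk_windows(total_s, chunk_s, overlap_s):
    """Return (start, end) windows covering [0, total_s] with overlap.

    Hypothetical helper: each window is chunk_s long and starts
    (chunk_s - overlap_s) after the previous one.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    if total_s <= 0:
        return []

    windows = []
    start = 0
    step = chunk_s - overlap_s
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break  # the last window already reaches the end of the audio
        start += step
    return windows


# 25 s of audio, 10 s chunks, 2 s of overlap between neighbours
print(chunk_windows(25, 10, 2))
```

A larger overlap makes boundary words safer but means more audio is transcribed twice, so the duplicated text has to be reconciled afterwards.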
Georgi Gerganov
4698dcdb52
whisper : add mechanism for aborting the whisper_full() computation
2 years ago
Georgi Gerganov
164df0d447
whisper.objc : fix context + broken readme links
2 years ago
Georgi Gerganov
e266cb0723
whisper.objc : add real-time processing (#97)
...
Similar to the "stream" app
2 years ago
Georgi Gerganov
c207eed431
whisper.objc : fix build warnings
2 years ago
Georgi Gerganov
a425365b82
yt-wsp.sh : script to easily transcribe VODs
...
Thanks to @DaniruKun
ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818
Usage:
cd whisper.cpp
make
./examples/yt-wsp.sh <video-url>
2 years ago
Georgi Gerganov
68ecadbbc9
command.wasm : add voice assistant example for the Web (#171)
...
Same as the command-line tool "command", but runs in the browser.
Also adds the helper script "extra/deploy-wasm.sh" and fixes some timing
constants for the WASM examples.
2 years ago
Georgi Gerganov
c536ff4005
minor : add comment for using "generate_karaoke.sh"
2 years ago
Georgi Gerganov
cb70b07db5
livestream.sh : simple tool to transcribe audio livestreams (#185)
2 years ago
Georgi Gerganov
3c390ffe38
stream.wasm : add web-based real-time transcription (#112)
2 years ago
Georgi Gerganov
be16dfa038
whisper.wasm : do not block page while processing (close #86)
2 years ago
Georgi Gerganov
0f619b52ce
main : add stereo-channel-based diarization (#64)
...
Not tested - I don't have stereo dialog audio
2 years ago
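Stereo-channel-based diarization assumes each speaker is recorded mostly on one channel, so the louder channel within a segment indicates who is talking. The sketch below shows that idea in isolation; the function name, the labels and the 1.1 margin are illustrative assumptions, not the code from this commit.

```python
def guess_speaker(left, right, margin=1.1):
    """Label a segment by comparing the energy of the two channels.

    left/right are the samples of one transcribed segment. The margin
    (hypothetical value) avoids a guess when both channels are similar.
    """
    energy_left = sum(abs(x) for x in left)
    energy_right = sum(abs(x) for x in right)

    if energy_left > margin * energy_right:
        return "0"  # speaker on the left channel
    if energy_right > margin * energy_left:
        return "1"  # speaker on the right channel
    return "?"      # too close to call


print(guess_speaker([0.5, -0.4, 0.6], [0.01, 0.0, -0.02]))
```

This only separates two speakers, and only when the recording keeps them on separate channels.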
Georgi Gerganov
1246dd023e
command : add demonstration video
2 years ago
Georgi Gerganov
0be27bbd92
command : fix build + fix README + add bold printing
2 years ago
Georgi Gerganov
bc88eb13c6
examples : add "command" tool (#171)
2 years ago
Georgi Gerganov
b8ce25dec1
refactoring : more readable code
2 years ago
Georgi Gerganov
e4805d9601
wasm : refactor wasm example + reuse fetch mechanism
2 years ago
Georgi Gerganov
ff36415a86
talk.wasm : update video link + some minor fixes
2 years ago
Georgi Gerganov
025ff465b6
Update README.md
...
Use a less cringy video to demo talk.wasm lol
2 years ago
Georgi Gerganov
abce28ea99
talk.wasm : move to https://whisper.ggerganov.com/talk
...
This way, we can share the same models across different WASM examples
and not have to download them for each page
2 years ago
Georgi Gerganov
454b91de16
main : fix dangling pointer when using stdin for input (#65)
2 years ago
Georgi Gerganov
d7024cf9dc
main, stream : remove --verbose flag (#178)
2 years ago
Georgi Gerganov
37422ed733
talk.wasm : add audio pre-processing + bump memory
2 years ago
Georgi Gerganov
be3b720f96
talk.wasm : refactoring + update README.md
2 years ago
Georgi Gerganov
49706a658a
minor : update a few prints + fix buttons in whisper.wasm
2 years ago
Georgi Gerganov
e5dcdabbb8
unicode : fix character replacement (thanks to @tamo)
2 years ago
Georgi Gerganov
dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm)
2 years ago
Georgi Gerganov
326573de9a
talk.wasm : final touches
2 years ago
Georgi Gerganov
9aea96f774
talk.wasm : polishing + adding many AI personalities
2 years ago
Georgi Gerganov
385236d1d3
stream : "-kc" now enables context keeping from previous segment (#90)
...
By default, the context keeping is disabled
2 years ago
M. Eren Akbiyik
63ae03b8e0
Prompt previous tokens for streaming (#163)
...
* feat: prompt previous tokens for streaming
I used a pointer to the vector instead of the vector itself because the latter gave weird errors, and why not
* convert vector to use with C api
* feat: remove old refs, check for prompt size
* feat: use better way of getting the pointer
2 years ago
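The idea in the commit above, feeding the tokens of earlier segments back in as a prompt while checking the prompt size, can be sketched as follows. This is a minimal illustration, not the code from the commit: `build_prompt`, `stream_transcribe`, the `transcribe` callback and the default budget of 224 tokens are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def build_prompt(prev_tokens: List[int], max_prompt_tokens: int) -> List[int]:
    """Keep only the most recent tokens that fit in the prompt budget."""
    if max_prompt_tokens <= 0:
        return []
    return prev_tokens[-max_prompt_tokens:]


def stream_transcribe(
    chunks: Sequence[object],
    transcribe: Callable[[object, List[int]], List[int]],
    max_prompt_tokens: int = 224,  # assumed budget, adjust to the model
) -> List[List[int]]:
    """Transcribe chunks in order, prompting each with earlier tokens."""
    prompt: List[int] = []
    results: List[List[int]] = []
    for chunk in chunks:
        tokens = transcribe(chunk, prompt)
        results.append(tokens)
        # Carry the newest tokens forward, trimmed to the budget.
        prompt = build_prompt(prompt + tokens, max_prompt_tokens)
    return results


print(build_prompt([1, 2, 3, 4, 5], 3))
```

Trimming from the front keeps the text closest to the current audio, which is the context most likely to help with the next segment.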
Georgi Gerganov
78116f8eda
talk.wasm : update README.md
2 years ago
Georgi Gerganov
a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly (#155)
...
* talk : initial real-time transcription in the browser
* talk : polishing the UI
* talk : ready for beta testing
* talk.wasm : rename example
2 years ago
Georgi Gerganov
f2df9bd768
stream : add "max_tokens" cli arg
...
Controls the max tokens per segment for the stream example
2 years ago
Georgi Gerganov
fb8d77f760
stream : add "audio_ctx" parameter
...
Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.
The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2 years ago
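The "about 10s" and "about 3 times" figures in the commit above can be checked with simple arithmetic. The sketch assumes the full encoder context is 1500 positions covering a 30 s window; that value is not stated in the commit message, and the function name is hypothetical.

```python
FULL_AUDIO_CTX = 1500   # assumed full encoder context size
FULL_WINDOW_S = 30.0    # seconds of audio covered by the full context


def audio_span_seconds(audio_ctx: int) -> float:
    """Seconds of audio covered by a reduced encoder context."""
    if not 0 < audio_ctx <= FULL_AUDIO_CTX:
        raise ValueError("audio_ctx must be in 1..FULL_AUDIO_CTX")
    return FULL_WINDOW_S * audio_ctx / FULL_AUDIO_CTX


span = audio_span_seconds(512)
ratio = FULL_AUDIO_CTX / 512
print(f"{span:.2f} s of audio, context reduced {ratio:.2f}x")
```

Under these assumptions a context of 512 covers roughly a third of the full window, which matches the numbers quoted in the commit message.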
Georgi Gerganov
62b5ff875c
stream : add "max_tokens" parameter
...
Used to limit the number of tokens in a segment.
Useful for combating word repetition when using a partial encoder context
2 years ago
Georgi Gerganov
d351771a4b
stream : add "single_segment" option
...
Force the entire audio chunk to be transcribed into a single segment
2 years ago
Georgi Gerganov
c058aaf22e
stream : partial encoder experiments
2 years ago
Georgi Gerganov
83c742f1a7
whisper : add option to speed up the audio tempo by x2
...
Uses a Phase Vocoder to speed up the audio tempo by scaling down
the frequencies in the frequency domain.
This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.
I think this can find application for real-time transcription - i.e. the
"stream" example.
2 years ago
Alan
7519eabf65
Adds support for stdin wav input
2 years ago
Georgi Gerganov
c30bffc8a5
ref #22 : add "duration" option
...
Can be used to partially process a recording
2 years ago
Georgi Gerganov
c71363f14c
examples : add simple script for generating Karaoke video
2 years ago
Georgi Gerganov
d42cf6d0df
Update README.md
2 years ago