whisper.cpp

Commit Graph

Author	SHA1	Message	Date
keyehzy	9e5f3ddc16	Allow for Twitch.tv live transcription. We rely on the streamlink library to give us a stream, then we proceed similarly to the radio livestream example.	3 years ago
Georgi Gerganov	47afb93c3c	yt-wsp.sh : improve usage instructions	3 years ago
Georgi Gerganov	575c53dc41	yt-wsp.sh : fix usage instruction + comment	3 years ago
Georgi Gerganov	faa85f9840	livestream.sh : remove obsolete comment	3 years ago
Georgi Gerganov	9fe7306f4b	models : add the new "large" model release by OpenAI. The old "large" model is now renamed "large-v1". If you have been using it, make sure to rename it and download the new "large" model for best results.	3 years ago
Georgi Gerganov	57e0e6b700	livestream : handle ffmpeg errors gracefully and stabilize transcript	3 years ago
Georgi Gerganov	4f7363077f	livestream : minor changes	3 years ago
semiformal-net	093c840dee	livestream : fix losing words across audio chunk (#195) * improve livestream script * Update examples/livestream.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Paul Edwards <paul.edwards@semiformal.net>	3 years ago
Georgi Gerganov	4698dcdb52	whisper : add mechanism for aborting the whisper_full() computation	3 years ago
Georgi Gerganov	164df0d447	whisper.objc : fix context + broken readme links	3 years ago
Georgi Gerganov	e266cb0723	whisper.objc : add real-time processing (#97). Similar to the "stream" app	3 years ago
Georgi Gerganov	c207eed431	whisper.objc : fix build warnings	3 years ago
Georgi Gerganov	a425365b82	yt-wsp.sh : script to easily transcribe VODs. Thanks to @DaniruKun, ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818 Usage: cd whisper.cpp; make; ./examples/yt-wsp.sh <video-url>	3 years ago
Georgi Gerganov	68ecadbbc9	command.wasm : add voice assistant example for the Web (#171). Same as the command-line tool "command", but runs in the browser. Also added helper script "extra/deploy-wasm.sh" and fixed some timing constants for the WASM examples.	3 years ago
Georgi Gerganov	c536ff4005	minor : add comment for using "generate_karaoke.sh"	3 years ago
Georgi Gerganov	cb70b07db5	livestream.sh : simple tool to transcribe audio livestreams (#185)	3 years ago
Georgi Gerganov	3c390ffe38	stream.wasm : add web-based real-time transcription (#112)	3 years ago
Georgi Gerganov	be16dfa038	whisper.wasm : do not block page while processing (close #86)	3 years ago
Georgi Gerganov	0f619b52ce	main : add stereo-channel-based diarization (#64). Not tested - I don't have stereo dialog audio	3 years ago
Georgi Gerganov	1246dd023e	command : add demonstration video	3 years ago
Georgi Gerganov	0be27bbd92	command : fix build + fix README + add bold printing	3 years ago
Georgi Gerganov	bc88eb13c6	examples : add "command" tool (#171)	3 years ago
Georgi Gerganov	b8ce25dec1	refactoring : more readable code	3 years ago
Georgi Gerganov	e4805d9601	wasm : refactor wasm example + reuse fetch mechanism	3 years ago
Georgi Gerganov	ff36415a86	talk.wasm : update video link + some minor fixes	3 years ago
Georgi Gerganov	025ff465b6	Update README.md: use a less cringy video to demo talk.wasm lol	3 years ago
Georgi Gerganov	abce28ea99	talk.wasm : move to https://whisper.ggerganov.com/talk (this way, we can share the same models across different WASM examples and not have to download them for each page)	3 years ago
Georgi Gerganov	454b91de16	main : fix dangling pointer when using stdin for input (#65)	3 years ago
Georgi Gerganov	d7024cf9dc	main, stream : remove --verbose flag (#178)	3 years ago
Georgi Gerganov	37422ed733	talk.wasm : add audio pre-processing + bump memory	3 years ago
Georgi Gerganov	be3b720f96	talk.wasm : refactoring + update README.md	3 years ago
Georgi Gerganov	49706a658a	minor : update a few prints + fix buttons in whisper.wasm	3 years ago
Georgi Gerganov	e5dcdabbb8	unicode : fix character replacement (thanks to @tamo)	3 years ago
Georgi Gerganov	dad109c3f1	close #109 : add fetching of the model over HTTP (whisper.wasm)	3 years ago
Georgi Gerganov	326573de9a	talk.wasm : final touches	3 years ago
Georgi Gerganov	9aea96f774	talk.wasm : polishing + adding many AI personalities	3 years ago
Georgi Gerganov	385236d1d3	stream : "-kc" now enables context keeping from previous segment (#90). By default, the context keeping is disabled	3 years ago
M. Eren Akbiyik	63ae03b8e0	Prompt previous tokens for streaming (#163) * feat: prompt previous tokens for streaming. I used a vector pointer instead of vector itself because it gave weird errors, and why not * convert vector to use with C api * feat: remove old refs, check for prompt size * feat: use better way of getting the pointer	3 years ago
Georgi Gerganov	78116f8eda	talk.wasm : update README.md	3 years ago
Georgi Gerganov	a4dfbeecf9	talk.wasm : GPT-2 meets Whisper in WebAssembly (#155) * talk : initial real-time transcription in the browser * talk : polishing the UI * talk : ready for beta testing * talk.wasm : rename example	3 years ago
Georgi Gerganov	f2df9bd768	stream : add "max_tokens" cli arg. Controls the max tokens per segment for the stream example	3 years ago
Georgi Gerganov	fb8d77f760	stream : add "audio_ctx" parameter. Used to overwrite the audio context size of the Encoder. For example, setting "audio_ctx = 512" will make it run about 3 times faster, processing about 10s of audio, instead of 30s. The transcription quality drops, but this can be used for real-time streaming purposes where performance is important.	3 years ago
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter. Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	3 years ago
Georgi Gerganov	d351771a4b	stream : add "single_segment" option. Force the entire audio chunk to be transcribed into a single segment	3 years ago
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	3 years ago
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2. Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	3 years ago
Alan	7519eabf65	Adds support for stdin wav input	3 years ago
Georgi Gerganov	c30bffc8a5	ref #22 : add "duration" option. Can be used to partially process a recording	3 years ago
Georgi Gerganov	c71363f14c	examples : add simple script for generating Karaoke video	3 years ago
Georgi Gerganov	d42cf6d0df	Update README.md	3 years ago

85 Commits (9e5f3ddc166ab9354abe12498ef54bb49a30bbe6)