{"id":10750,"date":"2023-09-29T18:00:00","date_gmt":"2023-09-29T18:00:00","guid":{"rendered":"https:\/\/businessyield.com\/tech\/?p=10750"},"modified":"2023-09-27T20:33:55","modified_gmt":"2023-09-27T20:33:55","slug":"openai-whisper-how-does-openai-whisper-work","status":"publish","type":"post","link":"https:\/\/businessyield.com\/tech\/technology\/openai-whisper-how-does-openai-whisper-work\/","title":{"rendered":"OpenAI Whisper: How Does OpenAI Whisper Work?","gt_translate_keys":[{"key":"rendered","format":"text"}]},"content":{"rendered":"\n
OpenAI recently launched the Whisper API, a hosted version of the open-source Whisper speech-to-text model, to coincide with the release of the ChatGPT API.
Priced at $0.006 per minute, Whisper is an automatic speech recognition system that OpenAI claims enables "robust" transcription in multiple languages as well as translation from those languages into English. It accepts files in a variety of formats, including M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM.
Countless organizations have developed highly capable speech recognition systems, which sit at the core of software and services from tech giants like Google, Amazon, and Meta. However, what sets Whisper apart is that it was trained on 680,000 hours of multilingual and "multitask" data collected from the web, which leads to improved recognition of unique accents, background noise, and technical jargon.
Whisper is an automatic speech recognition model trained on 680,000 hours of multilingual data collected from the web. According to OpenAI, the model is robust to accents, background noise, and technical language. It supports transcription in 99 different languages and translation from those languages into English.
Whisper comes in five model sizes: tiny, base, small, medium, and large. The table on OpenAI's GitHub page compares them. According to OpenAI, four of the models also have English-only versions, denoted with a .en suffix. The English-only models perform better at the tiny.en and base.en sizes, but the differences become less significant for the small.en and medium.en models.

The Whisper models are trained for speech recognition and translation tasks. They can transcribe speech audio into text in the language it is spoken (automatic speech recognition) as well as translate it into English (speech translation). Whisper is an encoder-decoder model, trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
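As a rough illustration, here is a minimal sketch of how the multilingual and English-only checkpoints are selected by name, assuming the open-source openai-whisper Python package is installed (pip install -U openai-whisper); the model size chosen here is just an example.

import whisper

# Multilingual checkpoint: transcribes many languages and can translate into English
multilingual_model = whisper.load_model("base")

# English-only checkpoint (".en" suffix): often more accurate on English audio
# at the tiny/base sizes
english_model = whisper.load_model("base.en")

print(multilingual_model.is_multilingual)  # True
print(english_model.is_multilingual)       # False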
What are good use cases for transcription?

Transcription is the process of converting spoken language into text. In the past, it was done manually; now, there are AI-powered tools like Whisper that can accurately understand spoken language. With a basic knowledge of the Python language, you can integrate the OpenAI Whisper API into your application.

The Whisper API is part of openai/openai-python, which allows you to access various OpenAI services and models.
How does Whisper work?

Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.

In simpler words, OpenAI Whisper is built on the transformer architecture, stacking encoder blocks and decoder blocks with the attention mechanism propagating information between them.

The model takes the audio recording, splits it into 30-second chunks, and processes them one by one. For each 30-second chunk, it encodes the audio using the encoder section and saves the position of each word said. It then leverages this encoded information to work out what was said using the decoder.

The decoder predicts what we call tokens from all this information, which is basically each word that is said. It then repeats the process for the next word, using all the same information as well as the previously predicted word, helping it guess the next one that makes the most sense.

OpenAI trained Whisper's audio model in a similar way to GPT-3: with data available on the internet. This makes it a large and general audio model, and it also makes the model far more robust than others. In fact, according to OpenAI, Whisper approaches human-level robustness because it was trained on such a diverse set of data, ranging from clips and TED talks to podcasts, interviews, and more.

All of these represent real-world-like data, with some of them transcribed by machine learning models rather than by humans.
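The pipeline described above can be seen in code. The following sketch uses the lower-level API of the open-source whisper package, closely following the example in OpenAI's GitHub README; the file name and model size are placeholders.

import whisper

model = whisper.load_model("base")

# Load the audio and pad/trim it to fit a 30-second window
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Convert the waveform into a log-Mel spectrogram for the encoder
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the spoken language from the encoded audio
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode: the decoder predicts text tokens from the encoded audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)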
How to use OpenAI Whisper

The speech-to-text API provides two endpoints, transcriptions and translations, based on OpenAI's state-of-the-art open-source large-v2 Whisper model. They can be used to:

- Transcribe audio into whatever language the audio is in.
- Translate and transcribe the audio into English.
File uploads are currently limited to 25 MB, and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.

Quickstart
Transcriptions

The transcriptions API takes as input the audio file you want to transcribe and the desired output file format for the transcription. It currently supports multiple input and output file formats.
By default, the response type will be json with the raw text included:

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger."
  ....
}

To set additional parameters in a request, you can add more --form lines with the relevant options. For example, if you want to set the output format as text, you would add the following line:

...
--form file=@openai.mp3 \
--form model=whisper-1 \
--form response_format=text
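If you prefer Python to curl, the same request can be sketched with the openai Python package. This assumes the pre-1.0 version of the library (current at the time of writing) and an OPENAI_API_KEY set in the environment; openai.mp3 is the same placeholder file name used above.

import openai

# Open the audio file in binary mode and send it to the transcriptions endpoint
with open("openai.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe(
        model="whisper-1",
        file=audio_file,
        response_format="text",  # return plain text instead of the default json
    )

print(transcript)

Newer (1.x) versions of the library expose the same endpoint through client.audio.transcriptions.create(...).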
Translations

The translations API takes as input an audio file in any of the supported languages and transcribes (translating, if necessary) the audio into English. This differs from the /transcriptions endpoint, since the output is not in the original input language and is instead translated into English text.

In this example, the input audio was German, and the output text looks like this:

Hello, my name is Wolfgang and I come from Germany. Where are you heading today?
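A matching Python sketch for the translations endpoint, under the same assumptions as above (pre-1.0 openai package, placeholder file name):

import openai

# Send a non-English recording to the translations endpoint; the response is English text
with open("german.mp3", "rb") as audio_file:
    translation = openai.Audio.translate(model="whisper-1", file=audio_file)

print(translation["text"])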
Features of OpenAI Whisper
Languages

The OpenAI Whisper API supports the following languages for transcriptions and translations:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

The breakdown of the word error rate (WER) on the FLEURS dataset using the large-v2 model is presented in the figure below, categorized by language. The smaller the WER, the better the transcription accuracy.
[Figure: word error rate by language for the large-v2 model on FLEURS]

File formats
The Whisper API supports the following file formats: mp3, mp4, mpeg, mpga, m4a, wav, and webm. Currently, the upload file size is limited to 25 MB. If you have larger files, you can break them down into smaller chunks using pydub.
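As a small illustration of those constraints, here is a sketch that checks a file's extension and size before uploading; the limit and extension list simply restate the values above, and the helper name is made up for this example.

import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # the API's 25 MB upload limit
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}

def check_uploadable(path: str) -> None:
    """Raise if the file is in an unsupported format or over the size limit."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {extension}")
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        raise ValueError("File is over 25 MB; split it (e.g. with pydub) or compress it first.")

check_uploadable("openai.mp3")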
Command-line usage

The following command will transcribe speech in audio files using the medium model:

whisper audio.flac audio.mp3 audio.wav --model medium

The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

whisper japanese.wav --language Japanese

Adding --task translate will translate the speech into English:

whisper japanese.wav --language Japanese --task translate

Run the following to view all available options:

whisper --help

See tokenizer.py in the Whisper repository for the list of all available languages.
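The command-line tool is a thin wrapper around the open-source Python package, so the same commands can be sketched in Python (model size and file name as in the commands above):

import whisper

model = whisper.load_model("medium")

# Equivalent of: whisper japanese.wav --language Japanese
result = model.transcribe("japanese.wav", language="Japanese")
print(result["text"])

# Equivalent of: whisper japanese.wav --language Japanese --task translate
translated = model.transcribe("japanese.wav", language="Japanese", task="translate")
print(translated["text"])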
Longer inputs

By default, the Whisper API only supports files that are less than 25 MB. If you have an audio file that is longer than that, you will need to break it up into chunks of 25 MB or less or use a compressed audio format. To get the best performance, avoid breaking the audio up mid-sentence, as this may cause some context to be lost.

One way to handle this is to use the PyDub open-source Python package to split the audio:

from pydub import AudioSegment

# Load the source recording
song = AudioSegment.from_mp3("good_morning.mp3")

# PyDub handles time in milliseconds
ten_minutes = 10 * 60 * 1000

# Slice out the first ten minutes and export it as its own MP3
first_10_minutes = song[:ten_minutes]
first_10_minutes.export("good_morning_10.mp3", format="mp3")

Prompting

You can use a prompt to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt, so it is more likely to use capitalization and punctuation if the prompt does too.

However, the current prompting system is much more limited than other language models and only provides limited control over the generated output.
Here are some examples of how prompting can help in different scenarios:
- Prompts can help correct specific words or acronyms that the model often misrecognizes in the audio, such as product names.
- To preserve the context of a file that was split into segments, you can prompt the model with the transcript of the preceding segment.
- A prompt written with punctuation encourages the model to include punctuation in the transcript.
- The model may leave out common filler words; if you want them kept, use a prompt that contains them.
- The prompt can nudge the model toward a particular writing style, for example a specific script for languages that can be written in more than one way.
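A minimal sketch of passing a prompt through the API, again assuming the pre-1.0 openai package; the file name and prompt text are made up for illustration:

import openai

with open("product_demo.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe(
        model="whisper-1",
        file=audio_file,
        # The model tends to follow the prompt's spelling, punctuation, and rare terms
        prompt="OpenAI, ChatGPT, GPT-4, and the Whisper API.",
    )

print(transcript["text"])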
In conclusion

Having such a general model isn't very powerful in itself, as it will be beaten at most tasks by smaller, more specific models adapted to the task at hand. But it has other benefits. You can take this kind of pre-trained model and fine-tune it on your task, meaning you retrain part of it, or the entire thing, with your own data. This technique has been shown to produce much better models than training from scratch on your data alone.

Another benefit is that OpenAI open-sourced Whisper's code and model weights rather than offering only an API. This means you can use Whisper as a pre-trained foundation architecture to build upon and create more powerful models for yourself.