
The Future of Total Transcription

Total Transcription is an idea that leverages OpenAI Whisper speech-to-text technology to transcribe all of your podcast audio for use in your own personal, ML-modeled library.

By Kyle M. Bondo

Once upon a time, transcription was a luxury most podcasters ignored.

The craft of converting audio into the written word, while a desirable accessibility feature, used to be both difficult and expensive. If you wanted a transcript for your podcast, the simplest option was your own good old-fashioned human brain power.

You, as the podcaster, would simply listen to your own audio and then write down what you heard by hand. This method was accurate but extremely time-intensive.

To solve the time problem of creating a transcript, companies emerged that specialized in transcription services. These companies would (for a small fee) transcribe your audio and return a finished transcript within a few hours. All you had to do was upload your audio, go get a cup of coffee, and wait.

Nice!

Unfortunately, while convenient, the small per-minute fee could add up quickly. Soon, your transcription costs exceeded your hosting costs, making this option too expensive for everyone but large studios. Additionally, using these companies introduced a new problem. Because labor was cheaper overseas, many of these companies relied on transcribers who were non-native English speakers.

The end result was transcripts riddled with strange grammar errors, misspellings, and typos. Heaven forbid you had someone speaking on your podcast with a Texan accent!

Enter the software developers! They would solve all these problems with code, or so they claimed. The early attempts at automated transcription produced mediocre results. Yes, they eliminated the strange grammar errors, misspellings, and typos the non-native English speakers made, but in exchange, they created a whole new list of strange grammar errors, misspellings, and typos.

In some instances, the software was worse than the non-native English speakers and twice as expensive. Ultimately, a hybrid approach evolved: at first, a human corrected the automated transcript; later, tools let you, the podcaster, correct the automated output yourself. While useful, this too was time-consuming and came with a price.

Then came OpenAI Whisper. Like most innovations, OpenAI Whisper, a machine learning model for speech recognition and transcription released in 2022, built on earlier attempts to solve the speech-to-text problem.

However, OpenAI Whisper could do more than accurately transcribe speech with minimal spelling and grammar errors. It could transcribe audio in dozens of languages and translate speech in many non-English languages into English text. Additionally, it could do all of this quickly and for pennies.

Overnight, OpenAI Whisper single-handedly destroyed the expensive, white-glove transcription service industry. It was a market disruption on the same scale as the car replacing the horse. For podcasters, though, it marked the dawn of a new, untapped capability: Audio Context Search!

For years, podcast audio has been a black box: billions of hours of audio locked away in millions of podcast episodes. Unless an episode had a good title, description, or webpage, its content was effectively invisible. That changed when OpenAI Whisper, and technologies like it, cracked the door open to a treasure trove of forgotten content.

You see, the idea goes something like this: You have a podcast that has NEVER been transcribed. You get access to an OpenAI Whisper tool. You point it at ALL of your audio episodes, and it accurately transcribes ALL of them. You then point a machine learning (ML) model (which everyone is calling artificial intelligence these days, even if it's not) at all of your transcripts to index each and every word. Boom!

A complete personal ML-modeled library of your podcast for the price of a cup of coffee!
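The workflow above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's setup: it assumes the open-source openai-whisper package (which also needs ffmpeg installed) and a hypothetical folder of .mp3 episodes, and it swaps in a plain keyword index (an inverted index mapping each word to the episodes that contain it) in place of an ML model. The names transcribe_folder, tokenize, build_index, and search are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path


def transcribe_folder(folder: str, model_name: str = "base") -> dict[str, str]:
    """Transcribe every .mp3 in `folder` with the open-source Whisper package.

    Assumes `pip install openai-whisper` and ffmpeg are installed. The import
    is deferred so the indexing functions below work without Whisper.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    transcripts = {}
    for audio_path in sorted(Path(folder).glob("*.mp3")):
        result = model.transcribe(str(audio_path))
        transcripts[audio_path.stem] = result["text"]  # e.g. "ep001" -> text
    return transcripts


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters, digits, and apostrophes as words.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: each word maps to the set of episodes that contain it.
    index = defaultdict(set)
    for episode, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(episode)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    # Return the episodes that contain every word of the query, sorted by name.
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


if __name__ == "__main__":
    # Sample transcripts stand in for transcribe_folder("episodes/") output.
    sample = {
        "ep001": "Welcome to the show. Today we talk about coffee.",
        "ep002": "Coffee and cowboy hats.",
    }
    index = build_index(sample)
    print(search(index, "coffee"))         # ['ep001', 'ep002']
    print(search(index, "cowboy coffee"))  # ['ep002']
```

Keeping transcription and indexing separate means you only pay the transcription cost once; the index can be rebuilt or swapped for a smarter model later without touching the audio again.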

Now, think about the possibilities of this concept for a moment. Once you've completed this transcription task and produced your own personal ML model, you have in your possession a treasure map to your own show!

With total transcription, you have just cataloged every word you have ever said, and every word your guests have ever said, on every topic you've ever covered on your podcast. You can now ask YOUR PODCAST questions, such as "What have I talked about the most?" or "What topic related to my show have I not talked about yet?" This total transcription power can also be leveraged to ask your ML model questions you might not be ready for, such as "What should I talk about next?"
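A question like "What have I talked about the most?" can be roughly approximated without any ML at all, simply by counting words. The sketch below is an assumption-laden stand-in, not the ML approach described here: it counts the most frequent words across all transcripts after dropping a small, hand-picked stop-word list, so it surfaces frequent words rather than true topics. The top_terms function and STOP_WORDS list are hypothetical.

```python
import re
from collections import Counter

# A small, hand-picked stop-word list (an assumption; fuller lists exist in
# libraries such as NLTK or spaCy).
STOP_WORDS = {
    "a", "an", "and", "the", "to", "of", "in", "on", "is", "it", "that",
    "this", "we", "i", "you", "so", "about", "for", "with", "was", "be",
}


def top_terms(transcripts: dict[str, str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-words across all transcripts."""
    counts = Counter()
    for text in transcripts.values():
        words = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = {
        "ep001": "Coffee, coffee, and more coffee.",
        "ep002": "Cowboy hats and coffee.",
    }
    print(top_terms(sample, n=2))  # [('coffee', 4), ('more', 1)]
```

A model-based approach would group related words ("espresso", "latte", "coffee") into a single topic; raw counts like these only hint at what such a model might report.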

It could even lead to questions that you have yet to think up.

This is the future of podcasting: Total Transcription, using OpenAI Whisper transcripts to inform and improve your own personal ML model. Imagine what podcasting will look like in five years when every podcaster has this capability. You will be able to find episodes about a single topic across multiple podcasts you've NEVER heard of, and, if we reach a point of total transcription, you will be able to find those episodes regardless of when they were recorded.

In other words, you could discover voices and thoughts within any audio that was recorded since the beginning of podcasting itself.
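Searching across many shows, regardless of when an episode was recorded, comes down to keeping the show name and publish date alongside each transcript. Below is a minimal, hypothetical sketch, assuming transcripts from several podcasts have already been collected into Episode records (an illustrative structure, not an existing API). The find_topic function uses a naive case-insensitive substring match and returns results oldest first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Episode:
    show: str         # podcast name
    title: str        # episode title
    published: date   # release date
    transcript: str   # full transcript text (e.g. from Whisper)


def find_topic(episodes: list[Episode], topic: str) -> list[Episode]:
    """Return episodes from any show whose transcript mentions `topic`,
    oldest first. Uses a naive case-insensitive substring match."""
    needle = topic.lower()
    hits = [ep for ep in episodes if needle in ep.transcript.lower()]
    return sorted(hits, key=lambda ep: ep.published)


if __name__ == "__main__":
    catalog = [
        Episode("Show A", "Brewing Basics", date(2021, 5, 1), "Let's talk pour-over coffee."),
        Episode("Show B", "Early Days", date(2005, 3, 10), "Coffee kept us recording all night."),
        Episode("Show C", "Saddle Up", date(2015, 8, 20), "Nothing but horses today."),
    ]
    for ep in find_topic(catalog, "coffee"):
        print(ep.published, ep.show, ep.title)
```

Sorting by publish date rather than by show is what lets a single query reach back through every catalog at once.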

Imagine what you could learn with all that knowledge.

🤠☕

Oncetold is a podcast education and technology company. We turn new podcasters into yarn weavers, big dreamers, and true believers. Start telling your story at oncetold.us.

Find more podcasting wisdom on Not Easily Squished or at noteasilysquished.com.
