The Podcaster’s Guide to Transcribing Audio

Join the Party Podcast
Published in Bello Collective
11 min read · Mar 19, 2018

Via #WoCinTech on Flickr

Let’s say you’re watching a movie at home, maybe Fast and the Furious 18. And, halfway through, your roommate decides to make cupcakes for the two of you. So they break out their KitchenAid mixer, and you know from experience that it can get pretty loud, so you turn on subtitles for the movie. Right when Vin Diesel simultaneously drives two Jeep Wranglers down a back alley in Brussels, a message pops up: “Subtitles Not Found.” How would you feel? Would you stop watching? Would you tell your friends not to watch either?

This is how a sizable share of the podcast listenership feels when they cue up a show that lacks transcripts. It forces potential listeners into a judgment call: do I try to make do with the audio and miss out on some or most of the story, or do I stop listening and find a show that does have a transcript?

We’re here to make sure that this never happens to your show! Here is how and why to create great transcripts.

Who are transcripts for?

More people than you realize.

  • Transcripts are essential for the d/Deaf and Hard of Hearing. Similar to subtitles in a movie, transcripts open up the storytelling, reporting and authenticity of podcasting and radio for those who cannot access audio. Making your show as accessible as possible is the right thing to do. But if you need another incentive to put in the work, consider this point by Miri Josephs in her accessibility presentation at PodCon: “d/Deaf people will listen to something transcribed just because it’s accessible, even if it isn’t something that particularly appeals to them.” Transcripts are a win-win for podcasters interested in growing their audiences — aka all podcasters!
  • Plenty of English language learners, and people who know English as a second (or third or sixth) language, would love to listen to your English-language podcast, but they could use some anchors as they listen. Reading along with the audio enables these listeners to access and enjoy your show.
  • The same goes for people with auditory processing issues who still want to participate in the podcast movement but need text to help them understand what they’re hearing. As listener Katie G, who identifies as listening impaired, wrote to us, “[I]f I miss something that’s said or the sound of something doesn’t translate into something I can identify, I can read it and be like, ‘Oh. Greg just slammed the door behind him. Oh, that poor boy has some anger-management issues.’”
  • Transcripts help a great deal with search engine optimization (SEO). Episode titles and descriptions are hardly enough space to summarize everything you cover in a given episode, but putting the transcript of the show on your website makes your audio searchable. In particular, entrepreneurial podcasts make great use of transcripts and timestamped show notes.
  • Transcripts don’t just open up your show to a whole new group of listeners; they are also a useful tool for podcast creators to have on hand. Can’t remember what episode you mentioned those candy-cane striped socks in? Forgot what book you recommended in your last review episode? A quick search of your cache of transcripts gives you the answer in seconds.

Transcripts are not for paying supporters on Patreon. Releasing drafts, script notes, director’s commentaries, or actors’ marked-up sides are all wonderful ways to bring your Patrons behind the scenes on the creation of your show, but transcripts must be free. Accessibility should never depend on a listener’s income.

How You Get It Done

You’re convinced that transcripts are worth doing for your show. Great! Now how do you do it? Whether you have an abundance of manpower, cash flow, or transcription contacts, you have options.

Software

There are more tools than ever for creating transcripts automatically from audio files. Until we reach the singularity where AI conquers our co-host’s Long Island accent, the transcripts that computers create will be imperfect and require human copyediting. Quality improves with fewer speakers and better microphones, so these services work best for shows with one or two speakers and minimal crosstalk. With four players and many character voices, Join the Party episodes transcribed with Trint were about 50% accurate, and we still needed to spend 2–3 minutes per minute of audio copyediting and rewriting the transcripts to a readable level. Trint came in right in the middle on price, at around $40 per month (more and less expensive packages are available). Other automated choices include Descript, Sonix and Temi, which cost between $0.07 and $0.15 per minute of audio. YouTube is a free option: export your episode as a video, upload it to YouTube as unlisted or private, and use YouTube’s automatic caption service to download a free auto-transcribed version.
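If you go the YouTube route, the downloaded captions arrive as short timed fragments rather than readable paragraphs. Here is a minimal Python sketch of flattening them into plain text for copyediting. It assumes you were able to save the captions in SubRip (.srt) format; the function name `srt_to_text` and the sample captions are made up for illustration and are not part of any tool mentioned above.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> ")


def srt_to_text(srt):
    """Drop cue numbers and timing lines; join the caption text with spaces."""
    kept = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines and lines[0].isdigit():      # cue number
            lines = lines[1:]
        if lines and TIMING.match(lines[0]):  # timing line
            lines = lines[1:]
        kept.extend(lines)
    return " ".join(kept)


# Hypothetical sample in the shape of an auto-caption download.
sample = """1
00:00:00,000 --> 00:00:02,500
let's roll for initiative

2
00:00:02,500 --> 00:00:04,000
what do y'all got
"""

print(srt_to_text(sample))
```

The result is one unpunctuated run of text with no speaker names, so it is only a starting point for the human copyedit described above.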

Humans

For slightly more money, you can hire people to transcribe your episodes or to edit your automated transcripts. Pairing your software with a human editor will cost whatever the software does, plus $0.50 to $0.80 per minute of audio for your transcript editor. It might be less expensive to hire a professional transcriber to create one from scratch, as they generally charge between $1 and $2 per minute of audio. Transcription companies that employ many transcribers may offer cheaper rates ($0.60 to $1.50 per minute) with moderate accuracy and limited or no revisions. Rev, GoTranscript and Scribie are transcription companies; Upwork is a place to hire freelancers; and your local radio email list or Facebook group may have suggestions for individuals to hire.
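To compare these options for your own show, it can help to multiply the per-minute rates by your episode length. The sketch below does that arithmetic in Python using only the price ranges quoted in this section. The function name `estimate_costs` and the option labels are illustrative, and flat subscriptions such as Trint’s monthly plan are left out because they do not scale per minute.

```python
# Per-minute price ranges in US cents, copied from the figures quoted above.
# Whole cents avoid rounding drift when the software and editor rates are added.
RATES_IN_CENTS = {
    "automated software": (7, 15),
    "software + human editor": (7 + 50, 15 + 80),
    "professional transcriber": (100, 200),
    "transcription company": (60, 150),
}


def estimate_costs(episode_minutes):
    """Return {option: (low, high)} in dollars for one episode."""
    return {
        option: (low * episode_minutes / 100, high * episode_minutes / 100)
        for option, (low, high) in RATES_IN_CENTS.items()
    }


# Example: a 60-minute episode.
for option, (low, high) in estimate_costs(60).items():
    print(f"{option}: ${low:.2f} to ${high:.2f}")
```

For a 60-minute episode this puts software plus an editor at $34.20 to $57.00 and a professional transcriber at $60 to $120. A monthly subscription fee would shift the comparison, especially if you publish only a few episodes a month.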

Do It Yourself

DIY will always be the cheapest choice. Transcribing by hand can take anywhere from two to five times the length of your episode and requires a pretty high level of attention. Some people slow down the episode so they can type as they listen, while others start and stop the episode to type what they just heard. YouTube also offers a pretty decent interface for typing your own transcript, which pauses the video while you’re typing. See if one of your collaborators has a media or journalism background; they may have transcription experience. While time-consuming, transcribing can also play a double role in your workflow. For example, the editor could hand a rough cut off to one of the hosts, who does a quality-check listen while transcribing.

Volunteers

Sometimes fans take matters into their own hands when a podcast they love does not offer transcripts. TAZ Transcribed is a large-scale example of fans coordinating crowd-sourced transcription. Organized by some tenacious mods, fans transcribed 75+ hour-long episodes of The Adventure Zone, a scored and sound-designed improvised fiction podcast. While we applaud these fans’ efforts, it’s still the duty of podcast creators to make transcripts available.

The Style Guide

Once you decide how you will create your transcripts, start a style guide for your show to help your team stay consistent between episodes. This document is where you’ll keep track of how you format the transcript, spell names, structure recurring segments, describe repeat music cues or sound effects, and more. When your formatting is consistent, readers can spend their time following your story instead of trying to parse a discordant document.

If you write scripts for your show, you should use these as your jumping-off point into a full transcript. You can copy/paste pre-written voiceover into a blank document or adapt a copy of your full script. There’s no need to reformat the entire thing — just make sure the principles from the guide below are present. Be sure to break long speeches into readable paragraphs, distinguish character names from their lines, identify sound effects, describe music, and remove stage directions or notes for actors that aren’t audible in your finished product.

Names

We use the same construction to identify speakers throughout every transcript. We begin with a speaker’s name (in bold text), then a colon, and then the dialogue. When we change speakers or describe a sound or music cue (more on that below), we insert a single line break.

Eric: Let’s roll for initiative. What do y’all got?

Amanda: I got a 17.

Michael: 15!

Brandon: Uh… I rolled a two.

Characters

We also insert a single line break when our players switch characters. For podcasts like ours that have people playing multiple characters, parentheses are a useful way to denote that a player is using a character voice without forcing the transcript reader to have to memorize what actor plays what character.

Amanda: Inara is going to walk up to the barmaid and say,

Amanda (as Inara): So, what’s cookin’, ma’am?

New Characters and Unidentified Voices

When a character is introduced but not yet named, we use descriptors instead of a name. This allows transcript readers to learn information at the same pace as those listening to the audio version of the episode.

Eric: You walk up to a gnome woman.

Amanda (as Inara): Hi there.

Eric (as Gnome Woman): Hello! I’m Rudy!

Michael (as Johnny): Pleasure to meet you.

Eric (as Rudy): What are you doing around these parts?

Nonfiction podcasts should also follow this convention when introducing new voices, describing speakers by their voices…

Roman Mars: I bet you’ve never noticed the landscaping on the New Jersey Turnpike.

Woman with New Jersey accent: No one does!

Roman: Beth Jones does, though.

Beth Jones: I guess it’s what I’ve devoted my life to…

…or by information provided in the narration.

Ira Glass: Something is rotten in the state of Iowa. Just ask the last family farmer in Des Moines.

Farmer: It’s like nothing we’ve ever seen before.

Ira: Meet John, the great-great-grandson of pioneer farmers.

John: We’ve been here since 1820…

Sound Effects & Music

We try to describe sounds in a detailed way, using [brackets] and single line breaks to separate them from dialogue. This helps transcript readers get a sense of the mood we establish with tools like silence, music, and sound effects. Take a look at how we describe music…

Eric: You reach the tavern after 30 minutes of walking.

Brandon: We go in!

[Jaunty, jazz-like tavern music begins playing]

Michael: I go straight to the bar.

…sound effects…

Amanda: I sneak behind the guards while they are distracted.

Eric: Roll for stealth.

[Dice rolls]

Amanda: [Sighs] That’s a five.

…and recognizable songs.

Eric: Welcome aboard the Downeaster Alexa!

Amanda: [Hums “The Downeaster ‘Alexa’” by Billy Joel]

Interruptions

Interruptions happen a lot in an improvised podcast. We usually transcribe these verbatim...

Eric: Wait, all of you can’t just —

Michael: Too late! We are leaving!

Amanda: Yeah! We’ve had enough of —

Brandon: It’s just too much for us to deal with.

…but if things get too confusing or indecipherable, we’ll summarize.

Eric: Who wants to go first?

[Enthusiastic yelling from all players]

Speech Quirks

Sometimes our players stutter or restart sentences. That’s okay! It adds to the charm of the show, and sometimes grows into an inside joke.

Michael: I… I-I-I don’t know what to say.

Michael (as Johnny): I cast Presti… Prestidigit… Prestidigitation!
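If your team keeps episodes as structured notes, or wants to clean up transcripts in bulk, the conventions in this style guide are regular enough to apply with a short script. The following Python sketch is one possible way to do that, not the tooling Join the Party uses; the `Line` and `Cue` classes and the `render` function are hypothetical names. Bold speaker names are left to whatever publishing tool you use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    """One spoken line. `character` is set when a player uses a character voice."""
    speaker: str
    text: str
    character: Optional[str] = None


@dataclass
class Cue:
    """A standalone description of music or a sound effect."""
    description: str


def render(items):
    """Format items per the style guide: 'Name: dialogue', 'Name (as Character): dialogue',
    and '[bracketed]' sound or music cues, each separated by a blank line."""
    parts = []
    for item in items:
        if isinstance(item, Cue):
            parts.append(f"[{item.description}]")
        else:
            label = item.speaker
            if item.character:
                label += f" (as {item.character})"
            parts.append(f"{label}: {item.text}")
    return "\n\n".join(parts)


episode = [
    Line("Eric", "You reach the tavern after 30 minutes of walking."),
    Cue("Jaunty, jazz-like tavern music begins playing"),
    Line("Amanda", "Hi there.", character="Inara"),
]
print(render(episode))
```

Keeping the speaker, character, and cue as separate fields means a spelling or formatting decision in your style guide only has to change in one place.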

Via Musicoomph.com

Where to Post Your Transcripts

Join the Party and many other shows post transcripts on their website. For every episode, we create a blog post on our website with an embedded player from our podcast hosting company, the podcast description (including credits) and the transcript. We include a link to that post in our episode description, as well as a prominent link in the header of our website. Using a blog style format makes navigation easy for anyone catching up on multiple episodes in one sitting.

If you don’t have a full website for your podcast, you should. In the meantime, you can use a free Tumblr blog. Post your transcripts there and make bit.ly links to each post to include in your episode description (bit.ly/myshow1, bit.ly/myshow2, etc.). Once you create a website, you can post your episodes and transcripts there, then update each bit.ly link to point to its new page, so the links in your old episode descriptions keep working.

We don’t recommend including your transcript in your actual episode description. While we’re not aware of any hard limits on episode description length, exceptionally long episode descriptions may break podcatchers at a certain point. Not to mention they’re a lot to download, and podcatchers on small phone screens are not designed for a great deal of reading. Having your transcripts available on the web, not just on a phone, is a must.

Bonus: Audio Mixing for Accessibility

Podcasters ready to go the extra mile can even consider publishing versions of each episode optimized for accessibility.

  • Exporting your episode with mono audio is a great first step. Listeners can use accessibility settings on their phones to mimic this effect, but that can alter your levels from their intended volumes.
  • Removing or re-mixing sound effects (SFX) and music will also improve the intelligibility of your dialogue. While removing them altogether is the quickest solution, you can also side-chain your dialogue to your SFX and music. When someone begins speaking, compress the music and SFX heavily to push the dialogue forward in your mix. Films and movie trailers do this all the time, making background music quieter when characters start speaking.
  • Mixing your episode with a limited dynamic range is another huge help for d/Deaf or Hard of Hearing listeners. A limiter or compressor makes this quicker, but trained ears will be able to tell when a compressor is at work. For best results, re-mix your episode by hand, averaging -18 LUFS on a loudness meter with an LRA of 10 or under.
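For readers who like to see the idea in code, here is a toy Python sketch of the first two techniques: folding stereo into mono, and ducking music under dialogue. It works on plain lists of float samples between -1.0 and 1.0 and is only meant to show the logic; a real mix would use your audio editor’s mono export and a side-chained compressor with proper attack and release controls. All function names and parameter values (`threshold`, `ducked_gain`, `block`, `smoothing`) are assumptions chosen for illustration.

```python
def downmix_to_mono(left, right):
    """Average the left and right channels sample by sample."""
    if len(left) != len(right):
        raise ValueError("channels must be the same length")
    return [(l + r) / 2 for l, r in zip(left, right)]


def duck_music(music, dialogue, threshold=0.1, ducked_gain=0.25,
               block=4, smoothing=0.5):
    """Turn the music down while the dialogue track is active.

    The dialogue is inspected in blocks of `block` samples (tiny here for
    readability; real audio would use a few milliseconds' worth). If its
    peak in a block exceeds `threshold`, the music gain heads toward
    `ducked_gain`; otherwise it heads back toward 1.0. `smoothing`
    (greater than 0, up to 1) sets how quickly the gain moves, so the
    change is a ramp rather than a hard jump.
    """
    if len(music) != len(dialogue):
        raise ValueError("tracks must be the same length")
    ducked = []
    gain = 1.0
    for start in range(0, len(music), block):
        peak = max(abs(sample) for sample in dialogue[start:start + block])
        target = ducked_gain if peak > threshold else 1.0
        for sample in music[start:start + block]:
            gain += (target - gain) * smoothing
            ducked.append(sample * gain)
    return ducked


# Hypothetical tracks: constant music, dialogue that starts halfway through.
music = [1.0] * 8
dialogue = [0.0] * 4 + [0.5] * 4
print(duck_music(music, dialogue))
```

A lower `ducked_gain` pushes the dialogue further forward in the mix, and a smaller `smoothing` value makes the change more gradual.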

Make these accessible versions of each episode available in your main feed, in a separate RSS feed that you link to in episode descriptions and your website, or directly embedded on your (mobile-friendly!) website.

Ready to Do This?

Transcripts make your podcast better. They attract new listeners to your show and help existing listeners appreciate the full depth of your story. Remember: accessibility is a right, not a privilege. Investing a little bit of time, money and/or effort is the least we can do to make our art accessible to all podcast fans.

Special thanks to Miri Josephs, Ely Fernández-Collins, Wil Williams, Ma’ayan Plaut, Nicolle Siegart, Mischa Stanton, Brandon Grugle, Katie G., and Michael Fische for their help with this post.

The Bello Collective is a publication + newsletter about podcasts and the audio industry. Our goal is to bring together writers, journalists, and other voices who share a passion for the world of audio storytelling.

Join the Party is a collaborative storytelling and roleplaying podcast. That means four friends create a story together, chapter by chapter.