3 Ways I Used AI-Generated Voice Audio Without Compromising Human Connection

Joe Oberster

With AI tools so easily accessible today, you have probably noticed that AI-generated audio and video that mimic humans can make you feel a bit... uneasy. Even Disney’s world-sized budget won’t convince me that I’m looking at a young Carrie Fisher at the end of Rogue One: A Star Wars Story.

That’s because AI has not yet been able to capture our humanity.

Our humanity is that connection you feel between two humans communicating. The glint of life in our eyes. The subtle flexes of the 42 individual muscles in our faces. The emotion hidden in a human voice. Until the day comes when we’ve figured out how to translate these hints of humanity into fully AI-generated audio and video (and that day will certainly come), it’s still a good idea to avoid this kind of AI-generated content if your goal is to connect emotionally with the viewer.

So how did I use AI-generated audio while retaining those hints of humanity? I took advantage of a particular AI service after I had already created videos with real-life people. Let’s talk about how that happened. Strap in, folks. We’re diving deep. I’ll even throw in a pro tip if you make it all the way through.

Save Time and Money Using AI-Generated Audio in a Video Context

I am human, and like AI, I am not perfect. That is exactly why I found myself relying on AI to help me fix something I messed up. Geoff, our company president, had a message for the attendees of a small business awards presentation. Our sales team decided, rightfully so, that video was the best medium for that message, so I set out to create it. We wrote a script, picked a location, and set up a camera, microphone, lighting and a teleprompter to keep Geoff on script. We recorded the video, I tore down the set and headed into the edit bay. To my absolute horror, I quickly discovered that we had used an earlier draft of the script that was missing an important line from Geoff.

What was I going to do? Geoff had left the building and was leaving for a family vacation the next day. Even if that weren’t the case, I would have had to spend another 3 hours lugging equipment across the building, setting everything up again and asking our company president to please come back wearing the exact same outfit to read a single sentence. Sure, he would have been happy to do that (he’s pretty easygoing), but my small mistake was going to cause a not-so-small hassle.

An idea struck me. The week prior, I had used AI voice-generation software to create voice overs for some smaller videos. The voices built into the software were pretty good for what they were, but definitely not great. They sorely lacked human emotion. However, the software also has a feature that creates a custom AI voiceprint from existing audio recordings. I figured it was worth a shot. I exported audio files from Geoff’s existing footage and fed them into the AI voice generator. It wasn’t perfect at first, but with some tweaking, I was happy with the result. Once I had an AI-generated version of Geoff’s voice, I pasted the missing line into the generator and dropped the result into its place in the project timeline. Since I wasn’t going to generate an AI video version of Geoff to go along with AI Geoff’s voice, I covered the new audio with b-roll and exported the finished product.

For fun, I decided not to tell anyone what I’d done quite yet. The video was approved, published and viewed by the event attendees. No one inside or outside of TKG said a thing. Success! A single sentence was slipped in and nothing was lost. The viewers of this video were still able to connect with the real Geoff. If you would like to hear the result, you can watch that video here.

AI-Generated Audio Band-Aids

The next use I found for AI-generated voice is as a band-aid: a patch, almost an audio content-aware fill of sorts. Not every video production lets you keep your talent around until their delivery is absolutely perfect. You’ll often come away from a shoot or interview with hurried sentences jumbled together, mispronounced words and grammatical errors. I ran into one such instance recently after shooting a brand video for a client. Rather than using a professionally recorded voice over, we decided to use an interview with the owner of the company to drive the story. While editing, we realized that the owner had hurried through an important answer and accidentally used the wrong word. There was no way to reshoot the interview. I always like to make people look and sound their absolute best in the videos we produce, so I decided to do my best to remedy the mistake.

In our editing software, DaVinci Resolve, I exported the audio from the full interview to create an AI voiceprint of the interviewee. I also used the AI transcription built into Resolve to quickly generate a transcript of the interview. I copied the full sentence that contained the error, pasted it into the voice generator, corrected the grammar and generated the new sentence.

Pro Tip: You may wonder why I generated a full sentence instead of just the word I needed. AI-generated voices can be finicky. Believe it or not, the cadence and intonation the software gives a word can depend heavily on the surrounding words, the punctuation and even the length of the sentence. To get something I deemed natural enough to use, I needed the context of at least a full sentence. Until AI voice generators are as coachable as human voice over artists, this is how things will be. It is also a major reason why there is still no direct replacement for a seasoned voice over artist and a thoughtful audio producer to deliver your message in the best way possible.

After some fiddling with the settings, the result sounded astonishingly like our interviewee’s voice. I dissected the generated sentence, transplanted the corrected word, “is,” into our client’s actual spoken audio, and a few adjustments later I had a seamless fix. Just like with Geoff’s video, AI was not used to create a new voice that lacked the human element, but to mimic a real person convincingly enough to help me show our client in the best light possible. See if you can pick out my band-aid during your first watch of that video here.

Taking AI-Generated Audio and Efficiency to the Next Level

I was flying high after those last two problem-solving adventures. What else could I use this amazing AI audio service for? The next project on my plate was an audio recording for TKG using our Creative Director, Molly, as talent. Our task was to have Molly voice the messages for our automated phone and voicemail system to match our new branding. Yes, folks. Even your automated phone system should match your company brand! Scripts were written, I set up an audio recording booth, we recorded, and I edited and exported audio files for our phone system. Between the two of us, recording and editing took about 4 to 5 hours. There were extensive reads and re-reads during our session, voice coaching and everything else you would do in a typical voice recording session.

Once this project was finished, just for fun, I used the finished recordings to create an AI Molly in the AI voice service. Thanks to the meticulous effort we put into the original recording, with intentional pacing, vocal intonation and cadence, Molly’s AI-generated voiceprint was simply impressive. My mind immediately went into brainstorming mode. How could we take advantage of this discovery? I could use Molly’s AI-generated voice to create internal TKG videos without taking up her time with script rehearsals and voice recordings. I wouldn’t have to get out our audio recording equipment for smaller TKG tasks that need vocal support. We could create seasonal automated phone messages on a whim, whereas before we would normally update them only every 2 to 3 years. I can generate AI audio for TKG ads, podcast intros or almost anything else in mere minutes, work that would normally take hours of extensive effort. Molly may never have to record anything herself again, because of the effort we put in to capture the parts of Molly’s voice that make her Molly. This saves both Molly and me time, and it expands what we can accomplish if we decide to invest in such endeavors.

What’s Next for AI-Generated Audio at TKG?

Let me tell you, folks. These three examples got me straight jazzed about AI-generated audio. I’ve got plans. There are a multitude of reasons we might want to use AI Molly audio at TKG, and each could call for a different delivery. That still requires some thoughtful work on the human end. As of now, there is no “emotion drop-down menu” in AI voice generators, so it’s important to match what you feed the software with what you want out of it. To properly move forward with AI Molly taking over for Real Molly, we will need extensive recording sessions with Real Molly. The recordings will vary in pace, tone and excitement, and I will build a library of Molly voiceprints to fit the variety of situations that might call for a voice over. These tools are groundbreaking, but there is still human legwork to be done to truly retain the human element of voice while we find exciting new ways to fix our audio mistakes and increase efficiency in audio and video production. AI Molly might be the new voice of TKG, but it carries all of Molly’s humanity because it was created from her own unique personality through experienced and intentional human direction.

Let TKG Handle Your Next Video or Audio Production!

I am having a blast finding new and exciting ways to utilize incredible new AI tools to help our clients connect with their audiences. If you’d like to know more about how we can help you and your company with audio and video production, send us a message!