Using OpenAI and ElevenLabs APIs to Generate Compelling Voiceover Content: A Step-by-Step Guide

Text-to-speech technology now lets businesses and individuals turn written text into natural-sounding audio. In this blog post, we’ll walk you through how to use OpenAI’s GPT-3 language model together with ElevenLabs’ Text-to-Speech (TTS) API to generate compelling voiceover content.

Step 1: Setting Up Your Environment

First things first, you’ll need to make sure you have Python installed on your system. You can download it from the official Python website if you don’t have it yet. Once Python is set up, you’ll need to install the necessary libraries.

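You can install the OpenAI and ElevenLabs Python libraries using pip. The code in this post uses the module-level functions (openai.Completion.create, set_api_key, clone, generate) from the releases of both libraries before version 1.0, so pin the versions accordingly:

pip install "openai<1" "elevenlabs<1"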

Now that we have everything set up, let’s get started!

Step 2: Generating Text with OpenAI

We’ll start by using OpenAI’s GPT-3 model to generate some text. Before you can make API calls, you’ll need to sign up on the OpenAI website and get your API key.

Once you have your key, assign it to openai.api_key so the library can authenticate your requests:

import openai

openai.api_key = 'your-api-key'
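Hard-coding a key in a script makes it easy to commit it to version control by accident. As a minimal sketch, assuming the key is stored in an environment variable (the helper name load_api_key is hypothetical and not part of either library), you could read it at startup instead:

```python
import os


def load_api_key(var_name: str) -> str:
    """Return the API key stored in the environment variable var_name."""
    # Treat a missing variable and a blank value the same way.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With this in place, the assignment above could become openai.api_key = load_api_key("OPENAI_API_KEY"). The same helper would work for the ElevenLabs key in Step 3, under whatever variable name you choose.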

Now you can generate some text using the openai.Completion.create function:

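text = "Hello, how are you?"  # English text to translate (example value)

response = openai.Completion.create(
  engine="text-davinci-002",  # older GPT-3 model; if it is no longer available, substitute a current completions model
  prompt="Translate the following English text to French: '{}'".format(text),
  max_tokens=60  # upper bound on the length of the generated reply
)

The above code asks the model to translate the English sentence stored in text into French; format() substitutes that sentence for the {} placeholder in the prompt. You can replace the prompt with any instruction you like, such as a request for a short voiceover script.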
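Since the goal of this guide is voiceover content, here is one way a script-writing prompt could be assembled. This is only a sketch: the function name build_voiceover_prompt, the wording of the instruction, and the default limit of 80 words are all assumptions made for illustration, not anything prescribed by OpenAI.

```python
def build_voiceover_prompt(topic: str, max_words: int = 80) -> str:
    """Build a completion prompt asking for a short voiceover script."""
    topic = " ".join(topic.split())  # collapse stray whitespace and newlines
    if not topic:
        raise ValueError("topic must not be empty")
    return (
        f"Write a voiceover script of at most {max_words} words "
        f"about the following topic: '{topic}'"
    )


prompt = build_voiceover_prompt("  our new   coffee subscription ")
print(prompt)
```

The returned string can be passed as the prompt argument of openai.Completion.create. Keep in mind that max_tokens limits tokens rather than words, so it may need to be raised for longer scripts.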

Step 3: Setting Up ElevenLabs API

Now that we have our text, we need to turn it into speech. That’s where ElevenLabs comes in.

First, get your ElevenLabs API key from the ElevenLabs website. Then register it with the library:

from elevenlabs import set_api_key

set_api_key("<your-elevenlabs-api-key>")

Step 4: Adding a New Voice

Before we can generate speech, we need a voice. ElevenLabs allows you to add your own voices. Here’s how you can do it:

from elevenlabs import clone

voice = clone(
    name="Voice Name",
    description="A description of the voice",
    files=["./sample1.mp3", "./sample2.mp3"],
)

This code creates a new voice from the provided MP3 files. Be sure to replace Voice Name with a name for your voice and A description of the voice with a fitting description, and point files at your own audio samples.
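A mistyped sample path is an easy mistake to make here, so it can help to check the files locally before sending anything to the API. The following is a minimal sketch using only the standard library; missing_samples is a hypothetical helper, not an ElevenLabs function.

```python
from pathlib import Path


def missing_samples(paths):
    """Return the sample paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


samples = ["./sample1.mp3", "./sample2.mp3"]
missing = missing_samples(samples)
if missing:
    print("Missing sample files:", ", ".join(missing))
```

If the list comes back empty, the same samples list can be passed as the files argument of clone.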

Step 5: Generating Speech

Now that we have our voice, we can generate some speech:

from elevenlabs import generate

# Retrieve the generated text from OpenAI's GPT-3 API response
generated_text = response.choices[0].text.strip()

# Generate speech from the text using the created voice
audio = generate(text=generated_text, voice=voice)

In this code, generated_text is the text that was generated by OpenAI’s GPT-3 in Step 2. We then use that text to generate speech using the voice we created in Step 4 with ElevenLabs’ API.
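To reuse the result later, you will probably want to write it to disk. The sketch below assumes that audio holds the complete clip as a bytes object (rather than a stream) and that the data is MP3-encoded; save_audio and the output path are hypothetical names chosen for this example.

```python
from pathlib import Path


def save_audio(audio: bytes, path: str) -> Path:
    """Write raw audio bytes to path and return the resulting Path."""
    if not audio:
        raise ValueError("audio is empty; nothing to save")
    target = Path(path)
    # Create the output folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(audio)
    return target
```

For example, save_audio(audio, "output/voiceover.mp3") would store the speech generated above in an output folder next to your script.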

And that’s it! You’ve now successfully used OpenAI’s GPT-3 and ElevenLabs’ TTS APIs to generate voiceover content from text created by a language model. You can now use this content in your applications, or just have some fun generating different voices and texts!

This blog post was created with help from ChatGPT Pro.