Free Python Text To Speech With Coqui AI (Simple Example)

So you have heard of “paid text-to-speech AI services” and wonder if there are free alternatives? The answer is yes. Several of those paid services are actually built on open-source models, just fine-tuned. In this guide, let us walk through installing Coqui TTS, one such free AI text-to-speech library – Read on!

 

 


DOWNLOAD & NOTES

Here is the download link to the example code, so you don’t have to copy-paste everything.

 

EXAMPLE CODE DOWNLOAD

Source code on GitHub Gist

Just click on “download zip” or do a git clone. I have released it under the MIT license, so feel free to build on top of it or use it in your own project.

 


 

 

WORKING WITH COQUI TTS

Brace yourselves, getting Coqui TTS to work involves fighting with digital dragons. There is no “one-click installer”, and it took a senior web developer an entire day to figure some things out. Here is the stuff I did; hope it saves you from searching all around the Internet.

 

PART 1) INSTALLATION

Behold, the dreaded “trial by installation”. At the time of writing, I am using Windows 11. Some things may be different for Linux and Mac users, but here’s what you need:

  • Install Python if you have not already done so.
    • As stated on the Coqui GitHub page, Python 3.8 to 3.10 should work fine.
    • If you have an incompatible version of Python, do your own research on “use different Python version with virtualenv”.
  • Install the Microsoft C++ Build Tools.
    • Pick “Visual Studio Community”, then select the “Desktop development with C++” workload.
    • Not sure if .NET tools are required, but also install those if you want to be safe.
  • Download and install espeak; some text-to-speech models use this.
    • espeak-ng – Scroll down and expand “assets”.
    • Linux users – apt-get install espeak

 

PART 2) PROJECT SETUP

  • Create your own project folder. E.G. D:\COQUI
  • Open the command line, navigate to the project folder – cd D:\COQUI
  • Create a virtual environment to not mess up your other projects.
    • virtualenv venv
    • For Windows – venv\Scripts\activate
    • For Linux/Mac – source venv/bin/activate
  • Download Coqui – pip install tts

The end. We are good to go.
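
For a quick sanity check that the install worked (with the virtual environment still active), this one-liner should run without errors:

python -c "from TTS.api import TTS; print('TTS OK')"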

 

 

PART 3) RUNNING COQUI TTS

There are 3 ways to run Coqui TTS.

 

3A) IN THE COMMAND LINE

This is the easiest way. Just run tts --text "YOUR TEXT" --out_path PATH/SPEECH.WAV in the command line to perform the text-to-speech conversion. There are several other parameters as well; run tts -h to list all of them.
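
For example, to list all the available models and pick a specific one instead of the default (the model name here is the same VITS model used in part 3C below):

tts --list_models
tts --text "Hello world" --model_name "tts_models/en/ljspeech/vits" --out_path speech.wav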

 

3B) IN THE BROWSER

Run tts-server in the command line. It should automatically download and set a few things up, eventually showing something like Running on http://[::1]:5002. Open your browser and access http://localhost:5002 for a very simple graphical interface… Not very useful, but it works.
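
While the server is running, you can also call it from your own scripts. A minimal sketch, assuming the server exposes the /api/tts endpoint (it did at the time of writing) and that you have the requests package installed (pip install requests):

# fetch generated speech from the local tts-server
import requests
res = requests.get("http://localhost:5002/api/tts", params={"text": "Hello from the server!"})
with open("server-speech.wav", "wb") as f:
  f.write(res.content)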

 

 

3C) SIMPLE PYTHON SCRIPT

1-demo.py
# (A) LOAD THE MODULE
from TTS.api import TTS
# (B) CREATE TTS OBJECT - DOWNLOADS THE MODEL ON FIRST RUN
tts = TTS(model_name="tts_models/en/ljspeech/vits", progress_bar=True, gpu=True)
# (C) CONVERT TEXT TO SPEECH, SAVE AS WAV FILE
tts.tts_to_file("YES! Text to speech works like magic.", file_path="OUTPUT.wav")

This is pretty much lifted from the example code on the Coqui GitHub page. Just import the TTS module, create a new object, and call tts_to_file().
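
Two small variations you may find handy – listing the available models, and using a multi-speaker model. This is only a sketch, assuming the same version of the TTS API; the VCTK model and the p274 voice are the same ones used in the “text file to narration” script below:

# list all available pre-trained models
from TTS.api import TTS
print(TTS.list_models())

# multi-speaker model - pick a voice with the speaker parameter
tts = TTS(model_name="tts_models/en/vctk/vits", progress_bar=True, gpu=True)
tts.tts_to_file("Same text, different voice.", speaker="p274", file_path="OUTPUT-VCTK.wav")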

 

PART 4) INSTALLATION PAINS

4A) NUMPY & NUMBA

Congratulations if the above worked for you without any issues. But if you get Failed to initialize NumPy: module compiled against API version, it is a version mismatch issue. A pip install --upgrade numpy numba to the latest versions did the trick for me.

 

4B) GRAPHIC DRIVER PAINS

Take note that the above example is set to gpu=True. Coqui will run without a GPU, but it is a lot faster with one. I am not sure about support for AMD graphics cards, but if you run into trouble with Nvidia cards (a quick GPU check follows this list):

  • Get the latest Nvidia driver for your video card.
  • Rebuild your PyTorch.
    • At the time of writing, this will install PyTorch with CUDA support for Windows – pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --force-reinstall.
    • For other operating systems, follow the install instructions on the PyTorch website.
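
To confirm that PyTorch can actually see your GPU after the reinstall, here is a quick check (run it inside the same virtual environment):

# check if pytorch detects the gpu
import torch
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # your card, e.g. "NVIDIA GeForce ..."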

 

 

MORE COQUI TTS

Ignore this section if you are happy with the above. This is a small “custom Coqui TTS” script that I wrote to perform text-to-speech on an entire text file.

 

TEXT FILE TO NARRATION

INIT COQUI TTS

2-convert.py
# (A) LOAD MODULES
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer
import os
 
# (B) SETTINGS
PATH_BASE = os.path.dirname(__file__)
SET_TXT = os.path.join(PATH_BASE, "narrate.txt")
SET_SAVE = os.path.join(PATH_BASE, "output.wav")
SET_MODEL = "tts_models/en/vctk/vits"
SET_SPEAKER = "p274"

Credits to this example I found at Ulife. The first part of the script should be self-explanatory… Load the TTS library, then define the settings – the text file to narrate, where to save the output, the model to use, and the speaker voice.

 

MODEL MANAGER & SYNTHESIZER

2-convert.py
# (C) MODEL MANAGER
manager = ModelManager(
  models_file = PATH_BASE + "/venv/Lib/site-packages/TTS/.models.json",
  output_prefix = PATH_BASE,
  progress_bar = True
)
model_path, config_path, model_item = manager.download_model(SET_MODEL)
if model_item["default_vocoder"] is None:
  voc_path = None
  voc_config_path = None
else:
  voc_path, voc_config_path, _ = manager.download_model(model_item["default_vocoder"])
 
# (D) SYNTHESIZER
syn = Synthesizer( 
  tts_checkpoint = model_path,
  tts_config_path = config_path,
  vocoder_checkpoint = voc_path,
  vocoder_config = voc_config_path,
  use_cuda = True
)

If you study TTS.api, this is exactly what it does – create a new ModelManager and Synthesizer. So why are we doing it manually here? To properly set the download path of the models into your own project folder (instead of some random user/AppData/Roaming/I-dunno-where/TTS/tts_models/ folder).

 

 

OUTPUT

2-convert.py
# (E) OUTPUT
# read the text file, synthesize, and save as a wave file
with open(SET_TXT, "r") as txt:
  output = syn.tts(
    text = txt.read(),
    speaker_name = SET_SPEAKER
  )
syn.save_wav(output, SET_SAVE)

Lastly, read narrate.txt, run it through the synthesizer, and save the generated audio to output.wav.

P.S. If you want, you can do a “wav to mp3” conversion here – a small sketch follows, but do your own research on the best tool for your project.
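
For example, with the pydub library – take note, pydub and FFmpeg are not part of the above script; pip install pydub and install FFmpeg yourself if you want to go down this path:

# convert the generated wav to mp3 (requires pydub + ffmpeg)
from pydub import AudioSegment
AudioSegment.from_wav("output.wav").export("output.mp3", format="mp3")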

 

EXTRAS

That’s all for the tutorial, and here is a small section on some extras and links that may be useful to you.

 

LINKS & REFERENCES

  • Coqui TTS on GitHub – https://github.com/coqui-ai/TTS
  • espeak-ng releases – https://github.com/espeak-ng/espeak-ng/releases
  • PyTorch installation guide – https://pytorch.org/get-started/locally/

 

THE END

Thank you for reading, and we have come to the end. I hope this guide has helped you get free text-to-speech up and running, and if you have anything to share about it, please feel free to comment below. Good luck and happy coding!