So you have heard of “paid text to speech AI services”, and wonder if there are free alternatives? The answer is yes. Several of those paid services are actually based on open-source models, just fine-tuned. Let us walk through installing Coqui TTS, one such “free AI text to speech” library in this guide – Read on!
DOWNLOAD & NOTES
Here is the download link to the example code, so you don’t have to copy-paste everything.
EXAMPLE CODE DOWNLOAD
Just click on “download zip” or do a git clone. I have released it under the MIT license, so feel free to build on top of it or use it in your own project.
WORKING WITH COQUI TTS
Brace yourselves, getting Coqui TTS to work involves fighting with digital dragons. There is no “one-click installer”, and it took a senior web developer an entire day to figure some things out. Here is the stuff I did, hope it saves you from searching all around the Internet.
PART 1) INSTALLATION
Behold, the dreaded “trial by installation”. At the time of writing, I am using Windows 11. Some things may be different for Linux and Mac users, but here’s what you need:
- Install Python if you have not already done so.
  - As stated on the Coqui GitHub page, Python 3.8 to 3.10 should work fine.
  - If you have an incompatible version of Python, do your own research on “use different Python version with virtualenv”.
- Install the Microsoft C++ Build Tools.
  - Pick “Visual Studio Community, Desktop Environment For C++”.
  - Not sure if .NET tools are required, but also install those if you want to be safe.
- Download and install espeak, some text-to-speech models use this.
  - espeak-ng: scroll down and expand “assets”.
  - Linux users: apt-get install espeak
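If you want to double-check the prerequisites before moving on, here is a minimal sketch that does it from Python. The function names are my own, and the version range is simply the "3.8 to 3.10" mentioned above, so adjust it if the Coqui page says otherwise.

```python
import shutil
import sys

# Version range taken from the list above (Python 3.8 to 3.10).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 10)


def python_version_ok(version=None):
    """Return True if the (major, minor) version is inside the supported range."""
    major_minor = tuple(version or sys.version_info[:2])[:2]
    return MIN_VERSION <= major_minor <= MAX_VERSION


def find_espeak():
    """Return the path of the first espeak executable found on PATH, else None."""
    for name in ("espeak-ng", "espeak"):
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("espeak found at:", find_espeak())
```

If espeak shows up as None right after installing it on Windows, the install folder is probably not on the PATH yet.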
PART 2) PROJECT SETUP
- Create your own project folder. E.g. D:\COQUI
- Open the command line and navigate to the project folder: cd D:\COQUI
- Create and activate a virtual environment to not mess up your other projects: virtualenv venv
  - For Windows: venv\Scripts\activate
  - For Linux/Mac: source venv/bin/activate
- Download Coqui: pip install tts
The end. We are good to go.
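Not sure if the virtual environment is really active, or if the install went into the right place? A small sketch like this can tell you. The helper names are hypothetical; it only looks for the TTS package without importing it, so it runs fast.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when Python runs inside a venv/virtualenv."""
    # Older virtualenv releases set sys.real_prefix instead of base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def tts_installed():
    """True if the TTS package can be located (it is not imported here)."""
    return importlib.util.find_spec("TTS") is not None


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("Coqui TTS installed:", tts_installed())
```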
PART 3) RUNNING COQUI TTS
There are 3 ways to run Coqui TTS.
3A) IN THE COMMAND LINE
This is the easiest way. Just run tts --text "YOUR TEXT" --out_path PATH/SPEECH.WAV in the command line to perform the text-to-speech conversion. There are several other parameters as well, run tts -h to show all of them.
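If you would rather trigger the same command from a script (say, to convert a batch of sentences), here is a rough sketch using subprocess. It only uses the two flags shown above; build_tts_command and run_tts are names I made up, and it assumes the tts command is reachable from the active virtual environment.

```python
import subprocess


def build_tts_command(text, out_path):
    """Build the argument list for the tts command shown above."""
    return ["tts", "--text", text, "--out_path", out_path]


def run_tts(text, out_path):
    """Run the tts command; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_tts_command(text, out_path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(build_tts_command("YOUR TEXT", "speech.wav"))
```

Passing the arguments as a list means you don't have to worry about escaping quotes in the text.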
3B) IN THE BROWSER
Run tts-server in the command line. It should automatically download and set some things up, and eventually show something like Running on http://[::1]:5002. Open your browser and access http://localhost:5002 for the very simple graphical interface… Not very useful, but it works.
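The server can also be called from a script instead of the browser. Take this as a sketch under an assumption: as far as I can tell, the demo page fetches its audio from an /api/tts endpoint with a text parameter, but check your installed version before relying on it. The function names are my own.

```python
import urllib.parse
import urllib.request

SERVER = "http://localhost:5002"  # address shown by tts-server above


def build_tts_url(text, server=SERVER):
    """Build a request URL for the /api/tts endpoint (assumed, see note above)."""
    return server + "/api/tts?" + urllib.parse.urlencode({"text": text})


def fetch_speech(text, out_path, server=SERVER):
    """Ask a running tts-server for speech and save the returned audio bytes."""
    with urllib.request.urlopen(build_tts_url(text, server)) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)


if __name__ == "__main__":
    # Only prints the URL; call fetch_speech() once tts-server is running.
    print(build_tts_url("hello world"))
```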
3C) SIMPLE PYTHON SCRIPT
from TTS.api import TTS
tts = TTS(model_name="tts_models/en/ljspeech/vits", progress_bar=True, gpu=True)
tts.tts_to_file("YES! Text to speech works like magic.", file_path="OUTPUT.wav")
This is pretty much a “rip-off” of the example code on the Coqui GitHub page. Just import the TTS module, instantiate a new object, and call tts_to_file().
PART 4) INSTALLATION PAINS
4A) NUMPY & NUMBA
Congratulations if the above worked for you without any issues. But if you get Failed to initialize NumPy: module compiled against API version, it’s a “version mismatch” issue. Running pip install --upgrade numpy numba to bring both to the latest versions did the magic.
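When chasing version mismatches, it helps to see what is actually installed in the virtual environment. Here is a small sketch using the standard importlib.metadata module; the default package list is just my guess at the usual suspects.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "numba", "torch", "TTS")):
    """Map each package name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, "->", version or "not installed")
```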
4B) GRAPHIC DRIVER PAINS
Take note that the above example is set to gpu=True. Coqui will run without a GPU, but it’s a lot faster with one. Not so sure about the support for AMD graphics cards, but if you run into trouble with Nvidia cards:
- Get the latest Nvidia driver for your video card.
- Rebuild your PyTorch.
  - At the time of writing, this will install PyTorch with CUDA support for Windows: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --force-reinstall
  - For other OS, please follow up with the PyTorch link above.
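To find out if PyTorch can actually see your graphics card, a quick check like this may help. It is only a sketch; cuda_available is a hypothetical helper name, and it falls back to False when PyTorch is missing or fails to load.

```python
def cuda_available():
    """True only if PyTorch loads and reports a usable CUDA device."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is not installed, or its native libraries failed to load.
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA available:", cuda_available())
    # Possible use in the earlier script, instead of hardcoding gpu=True:
    # tts = TTS(model_name="tts_models/en/ljspeech/vits", gpu=cuda_available())
```

This way the same script still runs on a machine without an Nvidia card, just slower.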
MORE COQUI TTS
Ignore this section if you are happy with the above. This is a small “custom Coqui TTS” script that I wrote, to perform text-to-speech on a text file.
TEXT FILE TO NARRATION
INIT COQUI TTS
# (A) LOAD MODULES
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer
import os
# (B) SETTINGS
PATH_BASE = os.path.dirname(__file__)
SET_TXT = os.path.join(PATH_BASE, "narrate.txt")
SET_SAVE = os.path.join(PATH_BASE, "output.wav")
SET_MODEL = "tts_models/en/vctk/vits"
SET_SPEAKER = "p274"
Credits to this example I found at Ulife. The first part of the script should be self-explanatory… Load the TTS library, and define the settings.
MODEL MANAGER & SYNTHESIZER
# (C) MODEL MANAGER
manager = ModelManager(
    models_file = PATH_BASE + "/venv/Lib/site-packages/TTS/.models.json",
    output_prefix = PATH_BASE,
    progress_bar = True
)
model_path, config_path, model_item = manager.download_model(SET_MODEL)
if model_item["default_vocoder"] is None:
    voc_path = None
    voc_config_path = None
else:
    voc_path, voc_config_path, _ = manager.download_model(model_item["default_vocoder"])
# (D) SYNTHESIZER
syn = Synthesizer(
    tts_checkpoint = model_path,
    tts_config_path = config_path,
    vocoder_checkpoint = voc_path,
    vocoder_config = voc_config_path,
    use_cuda = True
)
If you study TTS.api, this is exactly what it does: create a new Model Manager and Synthesizer. Just why are we manually doing it here? To properly set the download path of models into your project folder (not to some random/user/APPdata/roaming/Idunnowhere/TTS/tts_models/).
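One catch with the script above is that the models_file path is hardcoded to the Windows virtualenv layout (venv/Lib/site-packages). As a possible alternative, this sketch asks Python where the TTS package lives and builds the path from there. find_models_file is a hypothetical helper, and it assumes .models.json sits directly inside the package folder, as in the path used above.

```python
import importlib.util
import os


def find_models_file():
    """Return the expected path of TTS/.models.json, or None if TTS is missing."""
    spec = importlib.util.find_spec("TTS")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, ".models.json")


if __name__ == "__main__":
    print("Models file:", find_models_file())
```

The returned path could then be passed as models_file when creating the Model Manager, which should also work on Linux and Mac.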
OUTPUT
# (E) OUTPUT
# Read the whole text file, closing it properly afterwards
with open(SET_TXT, "r") as f:
    text = f.read()
output = syn.tts(
    text = text,
    speaker_name = SET_SPEAKER
)
syn.save_wav(output, SET_SAVE)
Lastly, read narrate.txt, run it through the synthesizer, and save the generated audio to output.wav.
P.S. If you want, you can do “wav to mp3” conversion here. Follow up with your own research.
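As a starting point for that research, here is one possible way to do the conversion by calling ffmpeg. This is an assumption on my part: ffmpeg is a separate tool that is not part of Coqui, it has to be installed and on the PATH, and the function names are made up for illustration.

```python
import shutil
import subprocess


def build_ffmpeg_command(wav_path, mp3_path):
    """Build an ffmpeg command that converts a WAV file to MP3."""
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", wav_path, mp3_path]


def wav_to_mp3(wav_path, mp3_path):
    """Convert WAV to MP3; raises RuntimeError if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(wav_path, mp3_path), check=True)


if __name__ == "__main__":
    # Only prints the command; call wav_to_mp3() once output.wav exists.
    print(build_ffmpeg_command("output.wav", "output.mp3"))
```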
EXTRAS
That’s all for the tutorial, and here is a small section on some extras and links that may be useful to you.
LINKS & REFERENCES
- Coqui TTS – GitHub
- Coqui TTS Documentation
THE END
Thank you for reading, and we have come to the end. I hope that it has helped you to better understand Coqui TTS, and if you want to share anything about this guide, please feel free to comment below. Good luck and happy coding!