Introduction to MeloTTS

MeloTTS is a high-quality, multilingual text-to-speech (TTS) library developed by MyShell.ai. It is designed to be easy to use and fast enough for real-time inference on a CPU. Whether you’re a developer, a content creator, or simply interested in speech technology, MeloTTS covers the common use cases described below.

Supported Languages

MeloTTS is multilingual and offers several English accents. The supported languages are:

  • English (American) 
  • English (British) 
  • English (Indian) 
  • English (Australian) 
  • English (Default) 
  • Spanish 
  • French 
  • Chinese (mixed with English) 
  • Japanese 
  • Korean 

Key Features

MeloTTS isn’t just about language support. It includes several standout features:

  • Bilingual Capability: The Chinese model can handle mixed Chinese and English input.
  • Efficiency: Designed for real-time inference even on CPUs.
  • Accessibility: No installation needed for quick tests, thanks to an unofficial live demo hosted on Hugging Face Spaces.
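
"Real-time" is commonly quantified as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1.0 means speech is generated faster than it plays back. The sketch below shows one way to measure it. The helper names `wav_duration_seconds` and `real_time_factor` are hypothetical and not part of MeloTTS, and the code assumes the output is a PCM WAV file that Python's standard `wave` module can read.

```python
import time
import wave


def wav_duration_seconds(path):
    """Return the playback length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def real_time_factor(synthesize, output_path):
    """Time `synthesize()` and divide by the duration of the audio it wrote.

    `synthesize` is any zero-argument callable that writes a WAV file to
    `output_path` (hypothetical helper; not a MeloTTS function).
    """
    start = time.perf_counter()
    synthesize()
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(output_path)
```

In practice, `synthesize` could be a small lambda wrapping a `tts_to_file` call like those shown in the Local Installation section. The first call may include model warm-up time, so averaging over several runs gives a fairer figure.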

Usage Scenarios

Without Installation

For those eager to test MeloTTS without setup hassles, an online demo is available. This demo provides a user-friendly interface to experience the capabilities of MeloTTS directly from your browser.

MyShell Integration

MyShell hosts a vast library of TTS models, offering more options beyond MeloTTS. Users can explore these models to find the one that best fits their requirements.

Local Installation

For those who prefer local deployment, MeloTTS can be used from Python once the package is installed. The following snippet synthesizes one English sentence in each of the available accents:

 

from melo.api import TTS

# Speed is adjustable
speed = 1.0

# CPU is sufficient for real-time inference.
# You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
device = 'auto' # Will automatically use GPU if available

# English 
text = "Did you ever hear a folk tale about a giant turtle?"
model = TTS(language='EN', device=device)
speaker_ids = model.hps.data.spk2id

# American accent
output_path = 'en-us.wav'
model.tts_to_file(text, speaker_ids['EN-US'], output_path, speed=speed)

# British accent
output_path = 'en-br.wav'
model.tts_to_file(text, speaker_ids['EN-BR'], output_path, speed=speed)

# Indian accent
output_path = 'en-india.wav'
model.tts_to_file(text, speaker_ids['EN_INDIA'], output_path, speed=speed)

# Australian accent
output_path = 'en-au.wav'
model.tts_to_file(text, speaker_ids['EN-AU'], output_path, speed=speed)

# Default accent
output_path = 'en-default.wav'
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)
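
The five calls above differ only in the speaker key and the output file name, so they can be folded into a loop. The following is a minimal sketch, not part of the MeloTTS API: `synthesize_accents` and `ENGLISH_ACCENTS` are hypothetical names, and the function only assumes that `model` exposes `hps.data.spk2id` and `tts_to_file` as used in the snippet above.

```python
from pathlib import Path


def synthesize_accents(model, text, accents, out_dir=".", speed=1.0):
    """Write one WAV file per accent and return the paths that were written.

    `model` is expected to behave like the `TTS` object above: it exposes
    `hps.data.spk2id` and `tts_to_file(text, speaker_id, output_path, speed=...)`.
    `accents` maps a speaker key (e.g. 'EN-US') to an output file name.
    """
    speaker_ids = model.hps.data.spk2id
    out_dir = Path(out_dir)
    written = []
    for speaker_key, file_name in accents.items():
        output_path = str(out_dir / file_name)
        model.tts_to_file(text, speaker_ids[speaker_key], output_path, speed=speed)
        written.append(output_path)
    return written


# Speaker keys and file names taken from the snippet above.
ENGLISH_ACCENTS = {
    "EN-US": "en-us.wav",
    "EN-BR": "en-br.wav",
    "EN_INDIA": "en-india.wav",
    "EN-AU": "en-au.wav",
    "EN-Default": "en-default.wav",
}
```

With the `model` and `text` defined earlier, calling `synthesize_accents(model, text, ENGLISH_ACCENTS, speed=speed)` should produce the same five files as the individual calls.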

Join the Community

Open Source AI Grant

We are committed to supporting the open-source community through our AI Grant, offering resources like GPU time, funding, and collaboration opportunities with leading research labs. If you’re working on an open-source AI project and need support, please reach out to Zengyi Qin.

Contributing

Contributions to the MeloTTS GitHub repository are highly encouraged. Whether it’s adding new features, fixing bugs, or improving documentation, your input is welcome. Special thanks to @fakerybakery for their contributions to the Web UI and CLI.

License and Acknowledgements

MeloTTS is released under the MIT License, allowing free use for both commercial and non-commercial purposes. We also acknowledge the foundational work of projects such as TTS, VITS, VITS2, and Bert-VITS2, which have significantly influenced the development of MeloTTS.