Introduction to MeloTTS
MeloTTS is a high-quality, multilingual text-to-speech (TTS) library developed by MyShell.ai. It supports several languages and English accents, runs in real time on a CPU, and can be tried in the browser or installed locally.
Supported Languages
MeloTTS supports the following languages and English accents:
- English (American)
- English (British)
- English (Indian)
- English (Australian)
- English (Default)
- Spanish
- French
- Chinese (mixed with English)
- Japanese
- Korean
Key Features
Beyond language coverage, MeloTTS offers the following:
- Bilingual Capability: The Chinese model can handle mixed Chinese and English input.
- Efficiency: Designed for real-time inference even on CPUs.
- Accessibility: No installation needed for quick tests, thanks to an unofficial live demo hosted on Hugging Face Spaces.
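One common way to check a real-time claim on your own hardware is the real-time factor: synthesis time divided by the length of the audio produced, where a value below 1.0 means faster than real time. The sketch below is illustrative and not part of MeloTTS; the helper names (`wav_duration_seconds`, `real_time_factor`, `timed_synthesis`) are hypothetical, and it assumes the output is a PCM WAV file that Python's standard `wave` module can read.

```python
import time
import wave


def wav_duration_seconds(path):
    """Return the playback length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio length; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def timed_synthesis(synthesize, output_path):
    """Call synthesize(output_path), then return (elapsed_seconds, real_time_factor).

    `synthesize` is any callable that writes a WAV file to the given path.
    """
    start = time.perf_counter()
    synthesize(output_path)
    elapsed = time.perf_counter() - start
    return elapsed, real_time_factor(elapsed, wav_duration_seconds(output_path))


# Hypothetical usage with a loaded MeloTTS model (see Local Installation):
# elapsed, rtf = timed_synthesis(
#     lambda path: model.tts_to_file(text, speaker_ids['EN-US'], path, speed=1.0),
#     'en-us.wav',
# )
```

The first call to a model usually includes one-time loading and warm-up costs, so timing a second or third call tends to give a more representative figure.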
Usage Scenarios
Without Installation
To try MeloTTS without any setup, use the unofficial online demo on Hugging Face Spaces. It runs entirely in the browser.
MyShell Integration
MyShell hosts a vast library of TTS models, offering more options beyond MeloTTS. Users can explore these models to find the one that best fits their requirements.
Local Installation
Once MeloTTS is installed locally, it can be used from Python. The following snippet synthesizes one English sentence in each of the five English accents and writes each result to a WAV file:
from melo.api import TTS
# Speed is adjustable
speed = 1.0
# CPU is sufficient for real-time inference.
# You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
device = 'auto' # Will automatically use GPU if available
# English
text = "Did you ever hear a folk tale about a giant turtle?"
model = TTS(language='EN', device=device)
speaker_ids = model.hps.data.spk2id
# American accent
output_path = 'en-us.wav'
model.tts_to_file(text, speaker_ids['EN-US'], output_path, speed=speed)
# British accent
output_path = 'en-br.wav'
model.tts_to_file(text, speaker_ids['EN-BR'], output_path, speed=speed)
# Indian accent
output_path = 'en-india.wav'
model.tts_to_file(text, speaker_ids['EN_INDIA'], output_path, speed=speed)
# Australian accent
output_path = 'en-au.wav'
model.tts_to_file(text, speaker_ids['EN-AU'], output_path, speed=speed)
# Default accent
output_path = 'en-default.wav'
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)
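The five near-identical calls above can also be written as a loop. The following is a minimal sketch, not part of the MeloTTS API: the helper `synthesize_accents` and the `ACCENT_OUTPUTS` mapping are hypothetical names, and the helper assumes only that `model` exposes `hps.data.spk2id` and `tts_to_file` as used in the snippet above.

```python
import os

# Speaker keys and output file names taken from the snippet above.
ACCENT_OUTPUTS = {
    "EN-US": "en-us.wav",
    "EN-BR": "en-br.wav",
    "EN_INDIA": "en-india.wav",
    "EN-AU": "en-au.wav",
    "EN-Default": "en-default.wav",
}


def synthesize_accents(model, text, output_dir=".", speed=1.0, outputs=ACCENT_OUTPUTS):
    """Write one audio file per accent and return the list of paths written.

    `model` is expected to behave like the TTS object in the snippet above.
    """
    speaker_ids = model.hps.data.spk2id
    written = []
    for speaker_key, filename in outputs.items():
        output_path = os.path.join(output_dir, filename)
        model.tts_to_file(text, speaker_ids[speaker_key], output_path, speed=speed)
        written.append(output_path)
    return written


# Hypothetical usage, assuming MeloTTS is installed:
# from melo.api import TTS
# model = TTS(language='EN', device='auto')
# synthesize_accents(model, "Did you ever hear a folk tale about a giant turtle?")
```

Keeping the accent-to-file mapping in one dictionary means that adding or removing an accent is a one-line change.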
Join the Community
Open Source AI Grant
We are committed to supporting the open-source community through our AI Grant, offering resources like GPU time, funding, and collaboration opportunities with leading research labs. If you’re working on an open-source AI project and need support, please reach out to Zengyi Qin.
Contributing
Contributions to the MeloTTS GitHub repository are highly encouraged. Whether it’s adding new features, fixing bugs, or improving documentation, your input is welcome. Special thanks to @fakerybakery for their contributions to the Web UI and CLI.
License and Acknowledgements
MeloTTS is released under the MIT License, allowing free use for both commercial and non-commercial purposes. We also acknowledge the foundational work of projects like TTS, VITS, VITS2, and Bert-VITS2, which have significantly influenced the development of MeloTTS.