Microsoft Research

VALL-E X

Name: VALL-E X
Brand: Microsoft Research
Rating: 8.9 (1 review)

Zero-shot neural codec language model for TTS

Voice cloning from short samples
Emotional expressiveness
High-fidelity synthesis
Codec-based approach

Today's score

89.0

Where it ranks today

Audio

Best for / Not great for

Best for

Quick voice prototyping
Personalized audio messages
Adding emotional depth to TTS
Research into neural audio codecs

Not great for

Generating long, continuous speech without artifacts
Real-time, low-latency applications without optimization
Direct music generation
Easy integration for non-researchers

Why it ranks here

VALL-E X advances zero-shot TTS: it clones a voice from a short sample and reproduces nuanced expression, which makes it a key research model for realistic speech synthesis.

30-day trend

Score breakdown

Search trends: 88

Benchmarks: 90

Developer buzz: 91

News mentions: 90

Pricing

API: $0.00 input · $0.00 output per 1M tokens · Consumer: $0.00/month

Pricing plans

Popular

Research Release

Explore the cutting edge of TTS

Free

Open-source code
Pre-trained models
Requires technical expertise
Focus on research

Snapshot: 2026-05-25