VALL-E – text-to-speech (TTS) system – Artificial Intelligence Institute

Post published: July 16, 2024
Post category: News

Microsoft has reported a major advance in AI speech generation with its VALL-E 2 text-to-speech (TTS) system. According to the company, VALL-E 2 has reached human parity, meaning listeners rate its generated speech as being on par with recordings of real people. The system needs only a few seconds of sample audio to learn and replicate a speaker’s voice.

In Microsoft’s evaluations on the LibriSpeech and VCTK speech datasets, VALL-E 2’s output matched or even surpassed human speech in quality. The system introduces two techniques, “Repetition Aware Sampling” and “Grouped Code Modeling,” which help it handle complex sentences and repetitive phrases naturally and produce smooth, realistic speech.

Although it has shared audio samples, Microsoft has decided not to release VALL-E 2 to the public at this time, citing concerns about potential misuse such as voice spoofing. This cautious approach reflects the broader industry’s recognition of the ethical implications of voice technology, as exemplified by OpenAI’s restrictions on its own voice technology. While VALL-E 2 represents a significant breakthrough, it remains a research project for the time being.
