
OpenVoice currently ranks second-to-last in the Huggingface TTS arena leaderboard, well below alternatives like styletts2 and xtts2:

https://huggingface.co/spaces/TTS-AGI/TTS-Arena

(Click the leaderboard tab at the top to see rankings)



Having gone through almost ten rounds of the TTS Arena, XTTS2 has tons of artifacts that instantly make it sound non-human. OpenVoice doesn't.

It wouldn't surprise me if people recognize different algorithms and purposefully promote them over others, or alter the page source with a userscript to see the algorithm before listening and click the one they're trying to promote. Looking at the leaderboard, it's obvious there's manipulation going on, because Metavoice is highly ranked but generates absolutely terrible speech with extremely unnatural pauses.
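For context on why a handful of targeted votes can matter: arena-style leaderboards typically use an Elo-like rating, where each pairwise vote nudges the winner up and the loser down. Here's a minimal sketch of that update rule (the actual TTS Arena scoring may differ in K-factor and details):

```python
# Minimal Elo-style rating update, as used by arena-style leaderboards.

def expected(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_winner, r_loser, k=32):
    """Return new (winner, loser) ratings after one vote for the winner."""
    e = expected(r_winner, r_loser)
    return r_winner + k * (1 - e), r_loser - k * (1 - e)

# Two models start equal; ten one-sided votes (e.g. from someone who can
# identify the model before voting) open a large rating gap.
a, b = 1000.0, 1000.0
for _ in range(10):
    a, b = update(a, b)
print(round(a), round(b))
```

With only a few hundred total votes per model, a coordinated trickle like this is enough to visibly reorder the board.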

Elevenlabs was scarily natural sounding and high quality; the best of the ones I listened to so far. Pheme's speech sounds really natural overall, but has terrible sound quality, which is probably why it isn't ranked higher. If Pheme's audio quality were better, it'd probably match Elevenlabs.


I would like to see the new VoiceCraft model on that list eventually (weights released yesterday, discussion at [1]).

[1] https://news.ycombinator.com/item?id=39865340


I haven't tried OpenVoice, but I did try WhisperSpeech and it does the same thing. You can optionally pass in a file with a reference voice, and the TTS uses it to clone that voice.

https://github.com/collabora/whisperspeech

I found it kind of creepy hearing it in my own voice. I also tried it with a friend of mine who has a French Canadian accent, and strangely the output didn't carry his accent.
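Passing the reference voice looks roughly like this, assuming the `Pipeline` API shown in the WhisperSpeech README (exact method names and defaults may differ by version, and `reference.wav` is a hypothetical file):

```python
# Hedged sketch of WhisperSpeech voice cloning; requires the whisperspeech
# package and a PyTorch install, and downloads model weights on first run.
try:
    from whisperspeech.pipeline import Pipeline

    pipe = Pipeline()
    # Clone the voice in reference.wav (hypothetical file) and speak new text:
    pipe.generate_to_file("cloned.wav", "Hello in a borrowed voice.",
                          speaker="reference.wav")
    status = "generated"
except Exception:
    # whisperspeech and its model weights may not be available here
    status = "unavailable"
print(status)
```

Omitting the `speaker` argument falls back to a default voice.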


Is there a benchmark for the compute these models need? Curious whether anyone is building (or has built) a Zoom filter or mobile app where I speak English and the listener hears Chinese.


The HF TTS Arena asks whether the text-to-speech sounds human-like. That's somewhat different from voice cloning: a model might produce audio that is less human-like, but still sounds closer to the target voice.


As someone who has used the arena maybe ~3 times, the subpar voice quality in the linked demo immediately stood out to me.


I'd like to see Deepgram Aura on here.



