Do you happen to have some samples handy of the types of text you typically read? At least a few pages from a few different sources. Try to find samples that represent the spectrum of content you read.
I may be able to set you up with an open source solution using Bark Audio, but it’s impossible to know without poking at the Bark model and seeing if I can find a spot where it works and you start getting samples that really sound like it understands. (For example, if you use an English Bark voice with a foreign text prompt, even though the Bark TTS model knows the language, the English voice won’t be able to speak it, or will have a horrific accent. That’s because Bark is kind of sort of modeling ‘person-asked-to-speak-language-they-don’t-know’ in a way. Sort of like how GPT might do that if you changed language mid-conversation. Well, pre-RLHF GPT.)
I don’t want to make any promises: I have terrible focus, I don’t frequent this site often, and I give it a 50% chance that I forget about this comment entirely until I suddenly remember I posted it three months from now. Also, while the Bark voices are wonderful (they sound like they understand what they are saying), the Bark audio quality (distortion, static) is not. You can stack another model on top to fix it, but it is annoying.
BUT it just so happens that the most recent source of my lack of focus, to some degree, has been poking at TTS stuff just for fun. Pure amateur hour over here. But the new models are so good they make a lot of stuff easy. And I just happened to see this comment after not visiting this site for weeks.
The best https://play.ht/ voices are maybe comparable, though, if you just want a quick solution. I do actually prefer Bark, if you can ignore the audio quality, but it’s super unreliable and fiddly.
Thanks for the offer!
I’m trying to read through a lot of LW and astral codex posts right now. Here are some samples:
https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
https://astralcodexten.substack.com/p/janus-simulators
https://www.lesswrong.com/posts/uyBeAN5jPEATMqKkX/lies-told-to-children-1
https://carado.moe/values-complex-not-objective.html
(if you meant audio as well, then for example, the sequences, LW curated podcast, and astral codex ten podcast all have lots of audio of associated text)
I think I’d be able to ignore things like static. I’ve listened to some decades-old recordings before with no problem.
If you think you’ll forget to check this site, we could continue on a platform you use more often. My email is kuiranya (at) proton.me, I could give you my discord (for example) from there.
I’m looking into https://play.ht/ as well :)