Listening to your sets of samples, the collapse in quality does seem drastic.
This is a little puzzling because there’s nothing about RLHF mode collapse per se which ought to trigger repetition within the song. Your complaint ought to be that the tracks all now sound similar to each other and blandly mealy-mouthed, not that each track sounds similar to itself… That should actually be strongly discouraged by the user ratings, as repetition is a simple pattern which ought to be easy for listeners to downvote and for reward models to detect & punish.
This makes me wonder if they screwed up in some way; for example, a way in which RLHF could produce this pathological degradation to repeating loops would be if they took a shortcut & applied RLHF only on a small window like 10s (rather than the entire music track). This would lead the model to greedily optimize only for a short snippet, and then mode collapse onto just repeating that snippet indefinitely (but uniquely so per track/prompt, and not necessarily collapsing all similarly-prompted tracks onto the same track/snippet). User ratings penalizing repetition would then have no effect, because once the rating filters down to the chopped-up-10s-snippet dataset, the repetition has disappeared, leaving the model baffled as to why one track was preferred over another when they are (at the snippet level) equally good. (Which could then screw up the model further by adding tons of noise to the already-impoverished feedback signal.)
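To make the windowing hypothesis concrete, here is a toy sketch in Python. It is purely illustrative and not a claim about how any actual service trains: the tracks are made-up lists of integer tokens (one token per second of audio), and `repetition_score` and `chop` are hypothetical helpers invented for this example. It only shows that a repetition signal which is obvious at the track level can vanish entirely once the track is cut into 10s windows.

```python
from typing import List


def repetition_score(tokens: List[int]) -> float:
    """Fraction of tokens that duplicate an earlier token (0.0 = nothing repeats)."""
    if not tokens:
        return 0.0
    return 1.0 - len(set(tokens)) / len(tokens)


def chop(tokens: List[int], window: int) -> List[List[int]]:
    """Split a track into consecutive, non-overlapping windows."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


# Hypothetical 60s tracks, one token per second (assumption for illustration).
looping_track = [i % 10 for i in range(60)]  # the same 10s loop played 6 times
varied_track = list(range(60))               # 60s with no repeated material

# At the whole-track level, the loop is easy to detect and penalize.
full_loop = repetition_score(looping_track)
full_varied = repetition_score(varied_track)

# After chopping into 10s windows, every snippet looks equally non-repetitive.
snippets_loop = [repetition_score(s) for s in chop(looping_track, 10)]
snippets_varied = [repetition_score(s) for s in chop(varied_track, 10)]

print(f"full track:   loop={full_loop:.3f}  varied={full_varied:.3f}")
print(f"10s snippets: loop={max(snippets_loop):.3f}  varied={max(snippets_varied):.3f}")
```

In this toy setup a listener's downvote for the looping track carries no information at the snippet level, since both tracks' snippets score identically; that is the "baffled model" scenario described above.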
I made a small correction to some of my prompts on SoundCloud etc. Of course I don’t write prompts like “The Cat Was...” the way SoundCloud converts them for display, but sometimes the first letter really is capitalized, so still check the comments section for the exact prompt. As of August 26 I have added the exact prompts (fully, or almost fully), showing where capital letters are and aren’t. I do have them all on file correctly, and I tested with the same prompts.
Any updates on this? For example, I notice that the new music services like Suno & Udio seem to betray a bit of mode collapse and noticeable same-yness, but they certainly do not degenerate into within-song repetition the way these did.