the gears to ascension comments on Mysteries of mode collapse

the gears to ascension 11 Nov 2022 6:20 UTC
0 points
This suggests that RLHF models may begin by acquiring disparate attractors which eventually merge into a global attractor as the policy is increasingly optimized against the reward model.
aaaaaaa