This suggests that RLHF models may begin by acquiring disparate attractors which eventually merge into a global attractor as the policy is increasingly optimized against the reward model.
aaaaaaa
aaaaaaa