(While I appreciate many of the investigations in this paper and think it’s good to improve our understanding, I don’t think they let us tell what’s up with risk.) This could be the subject of a much longer post and may get discussed in the comments.
Do you mean they don’t tell us what’s up with the difference in risks of the measured techniques, or that they don’t tell us much about AI risk in general? (I’d at least benefit from learning more about your views here)
Yes, I mean that those measurements don’t really speak directly to the question of whether you’d be safer using RLHF or imitation learning.