I don’t understand why Chollet thinks the smart child and the mediocre child are doing categorically different things. Why can’t the mediocre child be GPT-4 and the smart child GPT-6? The analogies Chollet and others draw in an effort to explain away the success of deep learning seem to me sufficient to explain what the human brain does as well, and it’s not clear that a different category of mind will or can ever exist (I’m not claiming that it can’t; I’m just saying that Chollet’s distinction is not evidenced).
Chollet points to real shortcomings of modern deep learning systems, but these are often exacerbated by factors not directly relevant to problem-solving ability, such as tokenization, so I often weigh them less heavily than I think he does.
I don’t think the point of RLHF was ever value alignment, and I doubt this is what Paul Christiano and others intended RLHF to solve. RLHF might be useful in worlds without capabilities and deception discontinuities (plausibly ours), because in those worlds we are less worried about sudden autonomous replication and adaptation (ARA) and more interested in getting useful behavior from models before we go out with a whimper.
This theory of change isn’t perfect. There is an argument that RLHF was net-negative, and that debate has already been had.
My point is that you are assessing RLHF using your own model of AI risk, so the disagreement here might actually be unrelated to RLHF, and might dissolve if you and the RLHF progenitors shared a common position on AI risk.