> Yudkowsky got almost everything else incorrect about how superhuman AIs would work,
I think this statement is incredibly overconfident, because literally nobody knows how superhuman AI would work.
And, I think, this is the general shape of the problem: an incredible number of people got overindexed on how LLMs worked in 2022-2023 and drew conclusions which seem plausible, but are not as probable as these people think.
Okay, I talked more about which conclusions we can draw from LLMs that actually generalize to superhuman AI here, so go check that out:
https://www.lesswrong.com/posts/tDkYdyJSqe3DddtK4/alexander-gietelink-oldenziel-s-shortform#mPaBbsfpwgdvoK2Z2
> The really short summary is human values are less complicated and more dependent on data than people thought, and we can specify our values rather easily without it going drastically wrong:
This is not a property of LLMs, but of us.
Is that supposed to be a link?
I rewrote the comment to put the link immediately below the first sentence.
The link is at the very bottom of the comment.