It’s amusing that you’re calling it “deeply unsurprising” at the same time that https://www.lesswrong.com/posts/MnrQMLuEg5wZ7f4bn/matthew-barnett-s-shortform?commentId=n4j2qmhnj9zBKigvX is being hotly debated, and not a few people in AI have been claiming that alignment is largely solved and was a pseudo-problem at best.
And I will note that the claim “AI [or LLMs specifically] won’t be deceptive or evil, and only would be if someone made them so” is extremely widely held, and always has been, even in relatively sophisticated tech circles. Just look at any HN discussion of any of the LLM deception papers.
Perhaps I should have said “it’s deeply unsurprising if you actually stop and think about how base models are trained”? :-)
We’re training LLMs on a vast range of human output (and fiction), and not all humans (or fictional characters) are fine upstanding citizens. In the argument you link to, one side is basically pointing out that “LLMs are roughly as good at basic moral decisions as most humans”. Personally, I wouldn’t trust the vast majority of humans with absolute power of the sort an ASI would have: power corrupts, especially absolute power. The success criterion for aligning an ASI isn’t “as moral as a typical human” (look at the track record of most autocrats); it’s more like “as moral as an angel”.
For a more detailed argument on this, see my posts Evolution and Ethics and Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?, and for some background also Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor.