Perhaps I should have said “it’s deeply unsurprising if you actually stop and think about how base models are trained”? :-)
We’re training LLMs on a vast range of human output (and fiction). Not all humans (or fictional characters) are fine upstanding citizens. In the argument you link to, one side is basically pointing out that “LLMs are roughly as good at basic moral decisions as most humans”. Personally, I wouldn’t trust almost any human with absolute power of the sort an ASI would have: power corrupts, and absolute power especially so. The success criterion for aligning an ASI isn’t “as moral as a typical human” (look at the track record of most autocrats); it’s more like “as moral as an angel”.
For a more detailed argument on this, see my posts Evolution and Ethics and Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?, and, for some background, Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor.