Rohin’s opinion: I really liked the experiment demonstrating misalignment, as it seems to accurately capture the aspects we expect to see in existentially risky misaligned AI systems: they will “know” how to do the thing we want; they simply won’t be “motivated” to actually do it.
Nic jokes:
My counter joke (in EAI) was:
(GPT-3 is an agent-predicting agent.)