On the flip side, as gwern pointed out in his Clippy short story, it’s possible for a “neutral” GPT-like system to discover agency and deception in its training data and act on those patterns without any explicit instruction from its human supervisor. The actions of a tool-AI programmed with a more “obvious” explicit utility function are easier to predict, in some ways, than the actions of something like ChatGPT, where the actions it makes visible to you may be only a subset (and a deliberately, deceptively chosen subset) of all the actions it is actually taking.