fiso64 comments on AGI Safety FAQ / all-dumb-questions-allowed thread

fiso64 24 Jun 2022 17:11 UTC
1 point
Here’s a non-obvious way it could fail. I don’t expect researchers to make this kind of mistake, but if this reasoning is correct, public access to such an AI is definitely not a good idea.
Also, consider a text predictor that is trying to roleplay as an unaligned superintelligence. This could be triggered without the user’s knowledge, for example by accidentally creating a conversation that the AI relates to a story about a rogue SI. In that case it may start to output manipulative replies, suggest blueprints for agentic AIs, and maybe even cause the user to run an obfuscated version of the program from the linked post. The AI doesn’t need to be an agent for any of this to happen (though it would clearly be much more likely if it were one).
I don’t think that any of those failure modes (including the model developing some sort of internal agent to better predict text) are very likely to happen in a controlled environment. However, as others have mentioned, agent AIs are simply more powerful, so we’re going to build them too.