I agree that it's far too dangerous to just give a command and have the agent go off and do something big based on its own interpretation. Any failures are too many.
I’ve written about this in “Internal independent review for language model agent alignment.”
The argument there is that we’ll want lots of redundant checks for capabilities and mundane safety, as well as for existential risk.
I think this will apply to mundane requests like “start a business selling cute outfits on eBay” as well. You don’t want the agent to take actions on your behalf that don’t do what you meant: spending the money you gave it in stupid ways, irritating people in your name, etc. So adding checks before executing is helpful for mundane safety too. You’ll probably want human involvement in any complex plan; you don’t even want to spend a bunch of money on LLM calls exploring fundamentally misdirected plans.
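
To make “checks before executing” a bit more concrete, here’s a rough sketch of the kind of gating I have in mind. Every name and check in it is hypothetical, just an illustration of redundant pre-execution review, not anything from the linked post:

```python
# A minimal sketch of redundant pre-execution checks on an agent's proposed action.
# All names here are hypothetical illustrations, not an API from any real framework.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProposedAction:
    description: str       # what the agent intends to do
    estimated_cost: float  # money the action would spend, in dollars
    irreversible: bool     # whether the action could be undone if it turns out wrong


def within_budget(action: ProposedAction, budget: float = 50.0) -> bool:
    """Mundane-safety check: block actions that overspend the allotted budget."""
    return action.estimated_cost <= budget


def human_approved(action: ProposedAction) -> bool:
    """Escalation check: irreversible actions wait for a person rather than auto-executing.
    Stubbed here to always defer; a real system would prompt the user."""
    return not action.irreversible


def passes_all_checks(action: ProposedAction,
                      checks: List[Callable[[ProposedAction], bool]]) -> bool:
    """Redundant gating: every independent check must pass before execution."""
    return all(check(action) for check in checks)


if __name__ == "__main__":
    action = ProposedAction("List 200 cute outfits on eBay",
                            estimated_cost=120.0, irreversible=True)
    if passes_all_checks(action, [within_budget, human_approved]):
        print("Executing:", action.description)
    else:
        print("Blocked before execution:", action.description)
```

The point is just that several cheap, independent checks (budget, reversibility, human sign-off) sit between the plan and the world, so mundane mistakes get caught before any money is spent or any action is taken, and the same gating structure can host the checks aimed at bigger risks.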