Current AutoGPT is simply too incompetent to effectively pursue a goal. Other similar systems are more competent (the two Minecraft LLM agent systems are the most impressive), but nobody has let them run ad infinitum to test their Goodharting. I’d assume they’d show it. Goodhart will apply increasingly as those systems actually pursue goals.
AutoGPT isn’t a company, it’s a little open-source project. Any companies working on agents aren’t publicizing their work so far.
I do suspect that actively improving things like AutoGPT is a good route to addressing x-risk because of their advantages for alignment. But I’m not sure enough to start advocating it.
I agree that things like AutoGPT are an ideal architecture for something exactly like retarget the search. I’ve noted that same similarity in “Steering subsystems: capabilities, agency, and alignment” and a stronger similarity in an upcoming post. In “Internal independent review for language model agent alignment” I note the alignment advantages you list, and a couple of others.
They raised $12M: https://twitter.com/Auto_GPT/status/1713009267194974333
You could be right that they haven’t incorporated as a company. I wasn’t able to find information about that.
Wow, interesting. They say it will be the largest open-source project in history. I have no idea how an open-source project raises $12M, but they did.