Seth Herd comments on Current AIs Provide Nearly No Data Relevant to AGI Alignment

Seth Herd 17 Dec 2023 20:42 UTC
2 points
0
I agree that things like AutoGPT are an ideal architecture for something exactly like retarget the search. I’ve noted that same similarity in “Steering subsystems: capabilities, agency, and alignment” and a stronger similarity in an upcoming post. In “Internal independent review for language model agent alignment” I note the alignment advantages you list, and a couple of others.

Current AutoGPT is simply too incompetent to effectively pursue a goal. Other similar systems are more competent (the two Minecraft LLM agent systems are the most impressive), but nobody has let them run ad infinitum to test their Goodharting. I’d assume they’d show it. Goodhart will apply increasingly as those systems actually pursue goals.

AutoGPT isn’t a company, it’s a little open-source project. Any companies working on agents aren’t publicizing their work so far.

I do suspect that actively improving things like AutoGPT is a good route to addressing x-risk because of their advantages for alignment. But I’m not sure enough to start advocating it.
- Ebenezer Dukakis 17 Dec 2023 21:26 UTC
  1 point
  0
  Parent
  
  AutoGPT isn’t a company, it’s a little open-source project. Any companies working on agents aren’t publicizing their work so far.
  
  They raised $12M: https://twitter.com/Auto_GPT/status/1713009267194974333
  
  You could be right that they haven’t incorporated as a company. I wasn’t able to find information about that.
  - Seth Herd 17 Dec 2023 21:52 UTC
    2 points
    0
    Parent
    Wow, interesting. They say it will be the largest open-source project in history. I have no idea how an open-source project raises $12M, but they did.