Charbel-Raphaël comments on We might be dropping the ball on Autonomous Replication and Adaptation.

Charbel-Raphaël 1 Jun 2024 0:38 UTC
LW: 4 AF: 3
−2
AF
Thanks for this comment, but I think this might be a bit overconfident.
constantly fighting off the mitigations that humans are using to try to detect them and shut them down.
Yes, I have no doubt that if humans implement some kind of defense, this will slow down ARA a lot. But:
- 1) It’s not even clear people are going to try to react in the first place. As I say, most AI development is positive. If you implement regulations to fight bad ARA, you are also hindering the whole ecosystem. It’s not clear to me that we are going to do something about open source. You need a big warning shot beforehand and this is not really clear to me that this happens before a catastrophic level. It’s clear they’re going to react to some kind of ARAs (like chaosgpt), but there might be some ARAs they won’t react to at all.
- 2) it’s not clear this defense (say for example Know Your Customer for providers) is going to be sufficiently effective to completely clean the whole mess. if the AI is able to hide successfully on laptops + cooperate with some humans, this is going to be really hard to shut it down. We have to live with this endemic virus. The only way around this is cleaning the virus with some sort of pivotal act, but I really don’t like that.
While doing all that, in order to stay relevant, they’ll need to recursively self-improve at the same rate at which leading AI labs are making progress, but with far fewer computational resources.
“at the same rate” not necessarily. If we don’t solve alignment and we implement a pause on AI development in labs, the ARA AI may still continue to develop. The real crux is how much time the ARA AI needs to evolve into something scary.
Superintelligences could do all of this, and ARA of superintelligences would be pretty terrible. But for models in the broad human or slightly-superhuman ballpark, ARA seems overrated, compared with threat models that involve subverting key human institutions.
We don’t learn much here. From my side, I think that superintelligence is not going to be neglected, and big labs are taking this seriously already. I’m still not clear on ARA.
Remember, while the ARA models are trying to survive, there will be millions of other (potentially misaligned) models being deployed deliberately by humans, including on very sensitive tasks (like recursive self-improvement). These seem much more concerning.
This is not the central point. The central point is:
- At some point, ARA is unshutdownable unless you try hard with a pivotal cleaning act. We may be stuck with a ChaosGPT forever, which is not existential, but pretty annoying. People are going to die.
- the ARA evolves over time. Maybe this evolution is very slow, maybe fast. Maybe it plateaus, maybe it does not plateau. I don’t know
- This may take an indefinite number of years, but this can be a problem
the “natural selection favors AIs over humans” argument is a fairly weak one; you can find some comments I’ve made about this by searching my twitter.
I’m pretty surprised by this. I’ve tried to google and not found anything.
Overall, I think this still deserves more research
- Richard_Ngo 3 Jun 2024 19:21 UTC
  LW: 4 AF: 3
  0
  AF Parent
  1) It’s not even clear people are going to try to react in the first place.
  I think this just depends a lot on how large-scale they are. If they are using millions of dollars of compute, and are effectively large-scale criminal organizations, then there are many different avenues by which they might get detected and suppressed.
  If we don’t solve alignment and we implement a pause on AI development in labs, the ARA AI may still continue to develop.
  A world which can pause AI development is one which can also easily throttle ARA AIs.
  The central point is:
  At some point, ARA is unshutdownable unless you try hard with a pivotal cleaning act. We may be stuck with a ChaosGPT forever, which is not existential, but pretty annoying. People are going to die.
  the ARA evolves over time. Maybe this evolution is very slow, maybe fast. Maybe it plateaus, maybe it does not plateau. I don’t know
  This may take an indefinite number of years, but this can be a problem
  This seems like a weak central point. “Pretty annoying” and some people dying is just incredibly small compared with the benefits of AI. And “it might be a problem in an indefinite number of years” doesn’t justify the strength of the claims you’re making in this post, like “we are approaching a point of no return” and “without a treaty, we are screwed”.
  An extended analogy: suppose the US and China both think it might be possible to invent a new weapon far more destructive than nuclear weapons, and they’re both worried that the other side will invent it first. Worrying about ARAs feels like worrying about North Korea’s weapons program. It could be a problem in some possible worlds, but it is always going to be much smaller, it will increasingly be left behind as the others progress, and if there’s enough political will to solve the main problem (US and China racing) then you can also easily solve the side problem (e.g. by China putting pressure on North Korea to stop).
  you can find some comments I’ve made about this by searching my twitter
  Link here, and there are other comments in the same thread. Was on my laptop, which has twitter blocked, so couldn’t link it myself before.
  - ryan_greenblatt 3 Jun 2024 19:45 UTC
    LW: 4 AF: 3
    0
    AF Parent
    
    A world which can pause AI development is one which can also easily throttle ARA AIs.
    
    I push back on this somewhat in a discussion thread here. (As a pointer to people reading through.)
    
    Overall, I think this is likely to be true (maybe 60% likely), but not enough that we should feel totally fine about the situation.
  - Charbel-Raphaël 5 Jun 2024 8:13 UTC
    LW: 2 AF: 1
    0
    AF Parent
    doesn’t justify the strength of the claims you’re making in this post, like “we are approaching a point of no return” and “without a treaty, we are screwed”.
    I agree that’s a bit too much, but it seems to me that we’re not at all on the way to stopping open source development, and that we need to stop it at some point; maybe you think ARA is a bit early, but I think we need a red line before AI becomes human-level, and ARA is one of the last arbitrary red lines before everything accelerates.
    But I still think no return to loss of control because it might be very hard to stop ARA agent still seems pretty fair to me.
    Link here, and there are other comments in the same thread. Was on my laptop, which has twitter blocked, so couldn’t link it myself before.
    I agree with your comment on twitter that evolutionary forces are very slow compared to deliberate design, but that is not way I wanted to convey (that’s my fault). I think an ARA agent would not only depend on evolutionary forces, but also on the whole open source community finding new ways to quantify, prune, distill, and run the model in a distributed way in a practical way. I think the main driver this “evolution” would be the open source community & libraries who will want to create good “ARA”, and huge economic incentive will make agent AIs more and more common and easy in the future.