Matthew Barnett comments on Matthew Barnett’s Shortform

Matthew Barnett 21 Mar 2024 3:40 UTC
2 points
0
AI models are routinely merged by direct weight manipulation today. Beyond that, two models can be “merged” by training a new model using combined compute, algorithms, data, and fine-tuning.
In my original comment, by “merging” I meant something more like “merging two agents into a single agent that pursues a combination of both agents’ values”, i.e., value handshakes. I am pretty skeptical that the form of merging discussed in the linked article robustly achieves this agentic form of merging.
In other words, I consider this counter-argument to be based on a linguistic ambiguity rather than replying to what I actually meant, and I’ll try to use more concrete language in the future to clarify what I’m talking about.
How do you know a solution to this problem exists? What if there is no such solution once we hand over control to AIs, i.e., the only solution is to keep humans in charge (e.g. by pausing AI) until we figure out a safer path forward?
I don’t know whether the solution to the problem I described exists, but it seems fairly robustly true that if a problem is not imminent, nor clearly inevitable, then we can probably better solve it by deferring to smarter agents in the future with more information.
Let me put this another way. I take you to be saying something like:
- In the absence of a solution to a hypothetical problem X (which we do not even know will happen), it is better to halt and give ourselves more time to solve it.
Whereas I think the following intuition is stronger:
- In the absence of a solution to a hypothetical problem X (which we do not even know will happen), it is better to try to become more intelligent to solve it.
These intuitions can trade off against each other. Sometimes problem X is something that’s made worse by getting more intelligent, in which case we might prefer more time. For example, in this case, you probably think that the intelligence of AIs is inherently contributing to the problem. That said, in context, my sympathies run more in the reverse direction. If the alleged “problem” is that there might be a centralized agent in the future that can dominate the entire world, I’d intuitively reason that installing vast centralized regulatory controls over the entire world to pause AI is plausibly not actually helping to decentralize power in the way we’d prefer.
These are of course vague and loose arguments, and I can definitely see counter-considerations, but from my perspective this problem is not really the type where we should expect “try to get more time” to be a robustly useful strategy.
- Wei Dai 22 Mar 2024 4:01 UTC
  2 points
  0
  Parent
  
  In other words, I consider this counter-argument to be based on a linguistic ambiguity rather than replying to what I actually meant, and I’ll try to use more concrete language in the future to clarify what I’m talking about.
  
  If I try to interpret “Current AIs are not able to “merge” with each other.” with your clarified meaning in mind, I think I still want to argue with it, i.e., why is this meaningful evidence for how easy value handshakes will be for future agentic AIs?
  
  In the absence of a solution to a hypothetical problem X (which we do not even know will happen), it is better to try to become more intelligent to solve it.
  
  But it matters how we get more intelligent. For example, if I had to choose now, I’d want to increase the intelligence of biological humans (as I previously suggested) while holding off on AI. I want more time in part for people to think through the problem of which method of gaining intelligence is safest, and in part for us to execute that method safely without undue time pressure.
  
  If the alleged “problem” is that there might be a centralized agent in the future that can dominate the entire world, I’d intuitively reason that installing vast centralized regulatory controls over the entire world to pause AI is plausibly not actually helping to decentralize power in the way we’d prefer.
  
  I wouldn’t describe “the problem” that way, because in my mind there’s a roughly equal chance that the future will turn out badly after proceeding in a decentralized way (see 13-25 in The Main Sources of AI Risk? for some ideas of how), and that instituting some kind of Singleton turns out to be the only way or one of the best ways to prevent that bad outcome.