After commenting back and forth with you some more, I think it would probably be a pretty good idea to decompose your arguments into a bunch of specific more narrow posts. Otherwise, I think it’s somewhat hard to engage with.
Thanks, that’s reasonable advice.
Idk what the right decomposition is, but minimally, it seems like you could write a post like “The AIs running in a given AI lab will likely have very different long run aims and won’t/can’t cooperate with each other importantly more than they cooperate with humans.”
FWIW I explicitly reject the claim that AIs “won’t/can’t cooperate with each other importantly more than they cooperate with humans”. I view this as a frequent misunderstanding of my views (along with people who have broadly similar views on this topic, such as Robin Hanson). I’d say instead that:
“Ability to coordinate” is continuous, and will likely increase incrementally over time
Different AIs will likely have different abilities to coordinate with each other
Some AIs will eventually be much better at coordination amongst each other than humans can coordinate amongst each other
However, I don’t think this happens automatically as a result of AIs getting more intelligent than humans
The moment during which we hand over control of the world to AIs will likely occur at a point when the ability for AIs to coordinate is somewhere only modestly above human-level (and very far below perfect).
As a result, humans don’t need to solve the problem of “What if a set of AIs form a unified coalition because they can flawlessly coordinate?” since that problem won’t happen while humans are still in charge
Systems of laws, peaceable compromise and trade emerge relatively robustly in cases in which there are agents of varying levels of power, with separate values, and they need mechanisms to facilitate the satisfaction of their separate values
One reason for this is that working within a system of law is routinely more efficient than going to war with other people, even if you are very powerful
The existence of a subset of agents that can coordinate better amongst themselves than they can with other agents doesn’t necessarily undermine the legal system in a major way, at least in the sense of causing the system to fall apart in a coup or revolution
Thanks for the clarification and sorry about misunderstanding. It sounds to me like your take is more like “people (on LW? in various threat modeling work?) often overestimate the extent to which AIs (at the critical times) will be a relatively unified collective in various ways”. I think I agree with this take as stated FWIW and maybe just disagree on emphasis and quantity.
Why is it physically possible for these AI systems to communicate with each other at all? When we design control systems, originally we just wired the controller directly to the machine being controlled.
Genuinely critical infrastructure maintains this property virtually, using firewalls and VPN gateways: the panel in the control room (often written in C++ using Qt) can only ever send messages to “local” destinations on a local network, bridged across the internet.
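As a rough illustration of that local-only policy (a sketch of my own; the function names and addresses are illustrative, not from any real deployment), the firewall rule amounts to something like:

```python
import ipaddress

# Allow traffic only to private (RFC 1918) addresses, i.e. "local"
# destinations on the plant network; anything else is refused.
def is_local_destination(addr: str) -> bool:
    return ipaddress.ip_address(addr).is_private

# A control-room panel using this check would refuse to send
# anywhere non-local, regardless of what the software asks for.
def send_allowed(dest_addr: str) -> bool:
    return is_local_destination(dest_addr)
```

So `send_allowed("192.168.1.10")` passes while `send_allowed("8.8.8.8")` does not; the point is that the constraint lives in the network layer, not in the goodwill of the software behind the panel.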
The machine itself is typically driven by local PLCs, and the reason such a crude, slow interpreted programming language is used for them is that it’s reliable.
These setups have flaws, yes, but sealing off the holes is an actionable set of tasks: force AI models to communicate with each other only through rigid schemas, cache internet reference sources locally, and other similar things, so that most AI models in use, especially the strongest ones, can only communicate with temporary instances of other models while doing a task.
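A minimal sketch of what “rigid schema” enforcement could look like (the field names and status vocabulary here are hypothetical, invented for illustration): every inter-model message must match an exact whitelisted shape, so arbitrary free-form negotiation can’t ride along with legitimate task traffic.

```python
# Exact set of permitted fields and their types; any deviation is rejected.
ALLOWED_FIELDS = {
    "task_id": str,     # opaque identifier assigned by the orchestrator
    "status": str,      # must come from a small closed vocabulary
    "result_ref": str,  # pointer into a locally cached artifact store
}
ALLOWED_STATUS = {"pending", "done", "failed"}

def validate_message(msg: dict) -> bool:
    """Reject anything that isn't exactly the whitelisted shape."""
    if set(msg) != set(ALLOWED_FIELDS):
        return False  # no extra fields, no missing fields
    for field, expected_type in ALLOWED_FIELDS.items():
        if not isinstance(msg[field], expected_type):
            return False
    if msg["status"] not in ALLOWED_STATUS:
        return False
    return True
```

Note that even here the free-form `result_ref` string is a residual side channel; a real design would also have to constrain string contents and length, which is exactly the kind of hole-sealing work described above.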
After the task is done, we should be clearing state.
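The state-clearing discipline can be sketched as running each call in a throwaway instance whose per-task memory is wiped when the task completes (the class and inference stub here are hypothetical placeholders, not any real serving API):

```python
class EphemeralInstance:
    """A model instance that exists only for the duration of one task."""
    def __init__(self, model_weights):
        self.weights = model_weights  # read-only, shared across instances
        self.scratch = {}             # per-task state only

    def run(self, task_input):
        self.scratch["input"] = task_input
        # Model inference would happen here; stubbed for illustration.
        return f"result-for-{task_input}"

def run_task(model_weights, task_input):
    instance = EphemeralInstance(model_weights)
    try:
        return instance.run(task_input)
    finally:
        instance.scratch.clear()  # wipe per-task state
        del instance              # nothing survives the task
```

Because no mutable state outlives a single task, there is no substrate on which information could accumulate across interactions, which is the mechanism the argument below relies on.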
It’s hard to engage with the idea of “hypothetical” ASI systems when it would be very stupid to build them this way. You can accomplish almost any practical task using the above, and the increased reliability will make it more efficient, not less.
It seems like that’s the first mistake. If absolutely no bits of information can be used to negotiate between AI systems (ensured by denying them long-term memory, so they cannot accumulate steganographic leakage over time, and by rigid schemas), this whole crisis is averted...