After commenting back and forth with you some more, I think it would probably be a pretty good idea to decompose your arguments into a bunch of specific more narrow posts. Otherwise, I think it’s somewhat hard to engage with. Ideally, these would be done with the decomposition which is most natural to your target audience, but that might be too hard.
Idk what the right decomposition is, but minimally, it seems like you could write a post like “The AIs running in a given AI lab will likely have very different long run aims and won’t/can’t cooperate with each other importantly more than they cooperate with humans.” I think this might be the main disagreement between us. (The main counterarguments to engage with are “probably all the AIs will be forks off of one main training run, it’s plausible this results in unified values” and also “the AI creation process between two AI instances will look way more similar than the creation process between AIs and humans” and also “there’s a chance that AIs will have an easier time cooperating with and making deals with each other than they will making deals with humans”.)
After commenting back and forth with you some more, I think it would probably be a pretty good idea to decompose your arguments into a bunch of specific more narrow posts. Otherwise, I think it’s somewhat hard to engage with.
Thanks, that’s reasonable advice.
Idk what the right decomposition is, but minimally, it seems like you could write a post like “The AIs running in a given AI lab will likely have very different long run aims and won’t/can’t cooperate with each other importantly more than they cooperate with humans.”
FWIW I explicitly reject the claim that AIs “won’t/can’t cooperate with each other importantly more than they cooperate with humans”. I view this as a frequent misunderstanding of my views (along with people who have broadly similar views on this topic, such as Robin Hanson). I’d say instead that:
“Ability to coordinate” is continuous, and will likely increase incrementally over time
Different AIs will likely have different abilities to coordinate with each other
Some AIs will eventually be much better at coordination amongst each other than humans can coordinate amongst each other
However, I don’t think this happens automatically as a result of AIs getting more intelligent than humans
The moment during which we hand over control of the world to AIs will likely occur at a point when the ability for AIs to coordinate is somewhere only modestly above human-level (and very far below perfect).
As a result, humans don’t need to solve the problem of “What if a set of AIs form a unified coalition because they can flawlessly coordinate?” since that problem won’t happen while humans are still in charge
Systems of law, peaceable compromise, and trade emerge relatively robustly in situations where agents of varying levels of power hold separate values and need mechanisms to facilitate the satisfaction of those values
One reason for this is that working within a system of law is routinely more efficient than going to war with other people, even if you are very powerful
The existence of a subset of agents that can coordinate better amongst themselves than they can with other agents doesn’t necessarily undermine the legal system in a major way, at least in the sense of causing the system to fall apart in a coup or revolution
Thanks for the clarification and sorry about misunderstanding. It sounds to me like your take is more like “people (on LW? in various threat modeling work?) often overestimate the extent to which AIs (at the critical times) will be a relatively unified collective in various ways”. I think I agree with this take as stated FWIW and maybe just disagree on emphasis and quantity.
Why is it physically possible for these AI systems to communicate with each other at all? When we design control systems, originally we just wired the controller directly to the machine being controlled.
Genuinely critical infrastructure uses firewalls and VPN gateways to maintain this property virtually: the panel in the control room (often written in C++ using Qt) can only ever send messages to “local” destinations on a local network, bridged across the internet.
The machine itself is often driven by local PLCs, and the reason such a crude and slow interpreted programming language is used is that it’s reliable.
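To make the “local destinations only” property concrete, here is a minimal sketch in Python (the subnet values and the send_message wrapper are invented for illustration, not taken from any real control stack) of the kind of check such a gateway enforces:

```python
import ipaddress

# Hypothetical example: subnets considered "local" for this control network.
LOCAL_SUBNETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def destination_allowed(dest_ip: str) -> bool:
    """Return True only if dest_ip falls inside an approved local subnet."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in LOCAL_SUBNETS)

def send_message(dest_ip: str, payload: bytes) -> None:
    # Refuse outright rather than routing to an arbitrary destination.
    if not destination_allowed(dest_ip):
        raise PermissionError(f"non-local destination rejected: {dest_ip}")
    ...  # hand off to the actual transport here
```

The point is simply that the whitelist is enforced at the transport layer, so the code running on the control panel never gets to choose an arbitrary destination.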
Those mechanisms have flaws, yes, but it’s an actionable set of tasks to seal off the holes: force AI models to communicate with each other using rigid schemas, cache internet reference sources locally, and take other similar measures so that most AI models in use, especially the strongest ones, can only communicate with temporary instances of other models while doing a task.
After the task is done, the instance’s state should be cleared.
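As one way to picture the “rigid schema plus no persistent state” idea, here is a minimal sketch (Python, with invented field names and a hypothetical EphemeralWorker class) of enforcing a fixed message format between instances and clearing state once a task finishes:

```python
import json

# Hypothetical rigid schema: the only fields one model instance may send to another.
ALLOWED_FIELDS = {"task_id": str, "step": int, "result": float}

def validate_message(raw: bytes) -> dict:
    """Parse a message and reject anything outside the fixed schema."""
    msg = json.loads(raw)
    if set(msg) != set(ALLOWED_FIELDS):
        raise ValueError(f"unexpected fields: {set(msg) ^ set(ALLOWED_FIELDS)}")
    for field, expected_type in ALLOWED_FIELDS.items():
        if not isinstance(msg[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    return msg

class EphemeralWorker:
    """A model instance whose working state is discarded when the task ends."""
    def __init__(self) -> None:
        self.scratch: dict = {}

    def run_task(self, raw_message: bytes) -> None:
        msg = validate_message(raw_message)       # only schema-conforming input gets through
        self.scratch["current"] = msg              # no free-form text fields to hide data in
        # ... do the task ...
        self.scratch.clear()                       # nothing persists across tasks
```

Because the schema has no free-form text fields and the scratch state is dropped after every task, there is very little surface through which instances could accumulate a covert channel over time.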
It’s hard to engage with the idea of “hypothetical” ASI systems when it would be very stupid to build them this way. You can accomplish almost any practical task using the above, and the increased reliability will make it more efficient, not less.
It seems like that’s the first mistake. If absolutely no bits of information can be used to negotiate between AI systems (ensured by rigid schemas and by denying the models long-term memory, so they cannot accumulate steganographic leakage over time), this whole crisis is averted...