I don’t fully grasp why world-model divergence is inherently so problematic, unless there is some theorem saying that robust coordination is only possible with full synchronization. Is there something preventing alignment among agents with significantly divergent world models?
I don’t actually expect ontology divergence to be that much of an issue, but at this point ontology divergence is a very poorly understood problem in general, and I think it’s at least plausible that it could be a fundamental barrier to coordination. The story is conditioning on the world where it does turn out to be a major barrier.
It would potentially be problematic for the sorts of reasons sketched out in The Pointers Problem. Roughly speaking, if the pointers problem turns out to be fundamentally intractable, then that means that the things humans want (and probably the things minds want more generally) only make sense at all in a very specific world-model, and won’t really correspond to anything in the ontologies of other minds. That makes it hard to delegate, since other minds have an inherently limited understanding of what we’re even asking for, and we need to exchange a very large amount of information to clarify enough to get good results.
In practice, this would probably look like needing more and more shared background knowledge in order to delegate a task, as the task complexity increases. In order to be a major barrier even for an AI, the scaling would have to be very bad (i.e. amount of shared background increases very rapidly with task complexity), and breaking down complex tasks into simpler tasks would have to be prohibitively expensive (which does seem realistic for complex tasks in practice).
I don’t think this scenario is actually true (see the Natural Abstraction Hypothesis for the opposite), but I do think it’s at least plausible.
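To make the scaling intuition concrete, here is a minimal toy sketch (my own illustration, not anything from the posts linked above). It models an ontology as layers of concept definitions and counts how many definitions a delegator would have to transmit before a delegate who only shares the primitive layer could interpret a task. The function names and parameter values here are invented for illustration; the contrast between the two regimes is the point, not the exact numbers.

```python
import random

def build_ontology(depth, branching, width, seed=0):
    """Toy ontology: each non-primitive concept is defined in terms of
    `branching` concepts from the layer below; layer-0 concepts are
    primitives assumed to already be shared with the delegate."""
    rng = random.Random(seed)
    layers = [[(0, i) for i in range(width)]]          # primitive layer
    defs = {}
    for d in range(1, depth + 1):
        layer = [(d, i) for i in range(width)]
        for c in layer:
            defs[c] = rng.sample(layers[d - 1], branching)
        layers.append(layer)
    return layers, defs

def background_to_send(task, defs, shared):
    """Count the definitions the delegator must transmit: the transitive
    closure of the task's concepts, minus anything already shared
    (primitives cost nothing)."""
    to_send, stack = set(), list(task)
    while stack:
        c = stack.pop()
        if c in shared or c in to_send or c not in defs:
            continue
        to_send.add(c)
        stack.extend(defs[c])
    return len(to_send)

for width, label in [(30, "heavy concept reuse"), (10_000, "idiosyncratic concepts")]:
    layers, defs = build_ontology(depth=6, branching=3, width=width)
    shared = set(layers[0])                            # only primitives are shared
    costs = [background_to_send(layers[-1][:k], defs, shared) for k in (1, 2, 4, 8)]
    print(f"{label}: definitions to transmit for tasks of size 1/2/4/8 -> {costs}")
```

In the heavy-reuse regime (roughly the Natural-Abstraction-style world), the cost saturates because lower-level concepts are shared across tasks; in the idiosyncratic regime it grows roughly geometrically with concept depth and linearly with the number of top-level concepts in the task, which is the “very bad scaling” case above.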
Got it. It’s more of an assumption than something known to be difficult. Personally, I suspect that it’s not a fundamental barrier, given how good humans are at chunking concepts into layers of abstraction that can be communicated much more easily than carefully comparing entire models of the world.
Yeah, this is one where it seems like, as long as the delegator and the task engine (i.e. manager and worker) are both rational, it works fine.
The problems show up in two ways: when what the organization itself is incentivized by is misaligned with the needs of the host society, or when incomplete bookkeeping at some layer, corruption, or indifference creates inefficiencies.

For example, prisons and courts are incentivized to have as many criminals needing sentencing and punishment as possible, while the host society would benefit from less actual crime and fewer members having to suffer through punishment.

But judged internally, a court system creating lots and lots of meaningless hearings (meaningless in that they are rigged to a known outcome, or to a random outcome that doesn’t depend on the inputs, and are thus a waste of everyone’s time), or a prison keeping lots of people barely alive through efficient frugality, is correct by these institutions’ own goals.