UDT doesn’t do multistage commitments; it has a single all-powerful “past” version that looks into all possible futures before pronouncing a global policy that all of them would then follow. This policy is not a collection of commitments in any reasonable informal sense: it’s literally every detail of the behavior of the agent’s future versions in response to all possible observations, and in the case of logical updatelessness, also in response to all possible observations of computational facts. (UDT for the idealized past version defines a single master model; future versions are just passively running inference from the contexts of their particular situations.)
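To make that picture concrete, here’s a minimal toy sketch in Python of the “single master policy” view. The observation set, action set, and payoff table are made-up assumptions; the point is only the shape of the computation: the idealized past version picks one policy covering every possible observation, and future versions merely look up their prescribed action.

```python
from itertools import product

# Toy sketch of the "single master policy" picture of UDT (illustrative only;
# the observations, actions, and payoffs below are made-up assumptions).

OBSERVATIONS = ["heads", "tails"]   # everything a future version might see
ACTIONS = ["left", "right"]

# Stand-in for a world model: how well a given (observation, action) pair
# works out, summed over the situations the agent might find itself in.
PAYOFF = {
    ("heads", "left"): 1.0, ("heads", "right"): 0.0,
    ("tails", "left"): 0.0, ("tails", "right"): 2.0,
}

def utility(policy):
    # Score a complete policy across all possible observations at once.
    return sum(PAYOFF[(obs, policy[obs])] for obs in OBSERVATIONS)

def choose_master_policy():
    # The idealized "past" version pronounces one global policy covering
    # every possible observation, before any of them is actually made.
    candidates = [dict(zip(OBSERVATIONS, acts))
                  for acts in product(ACTIONS, repeat=len(OBSERVATIONS))]
    return max(candidates, key=utility)

MASTER_POLICY = choose_master_policy()

def future_version(observation):
    # A future version doesn't deliberate; it just reads off its action.
    return MASTER_POLICY[observation]

print(future_version("heads"))  # -> "left"
print(future_version("tails"))  # -> "right"
```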
The convergent idea for acausal coordination between systems A and B seems to be constructing a shared subagent C whose instances exist as part of both A and B (after A and B both successfully construct the same C, not before), so that C can then act within them in the style of FDT, though really it’s mostly about C thinking of the effects of its behavior in terms of “I am an algorithm” rather than “I am a physical object”. (For UDT, the shared subagent C is the idealized common past version of its different possible future versions A and B. This assumes that A and B already have a lot in common, so maybe C is instead Buddha.)
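A minimal sketch of that shared-subagent picture, with hypothetical agents A and B and payoffs invented purely for illustration: both hosts end up constructing the same C, so C reasons as an algorithm whose single output is instantiated in both places at once.

```python
# Toy sketch of the "shared subagent" idea (names A, B, C and the payoff
# numbers are illustrative assumptions, not anything from a real system).

def shared_subagent_C(payoff_if_everyone_cooperates, payoff_if_everyone_defects):
    # "I am an algorithm": whatever this code returns is also what its other
    # instance returns inside the other host, so C compares whole outcomes of
    # "everyone who runs me does X" rather than conditioning on the other
    # agent's move separately.
    if payoff_if_everyone_cooperates > payoff_if_everyone_defects:
        return "cooperate"
    return "defect"

def agent_A():
    # A delegates the coordination-relevant decision to its instance of C.
    return shared_subagent_C(payoff_if_everyone_cooperates=3,
                             payoff_if_everyone_defects=1)

def agent_B():
    # B independently constructed the same C, so it runs the same code.
    return shared_subagent_C(payoff_if_everyone_cooperates=3,
                             payoff_if_everyone_defects=1)

assert agent_A() == agent_B() == "cooperate"
```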
The bulk of the blind alleys seem to be about granting subagents various superpowers, instead of focusing on managing the fallout of making them small and bounded (but possibly more plentiful). I think this is where investigations into logical updatelessness go wrong. The problem does need solving, but not by treating some fact as unknown globally, or even as unknown at certain logical times. Instead, a fact can remain unknown to some small subagent, to be observed by it at some point or computed by another subagent. Values are also knowledge, so sufficiently small subagents shouldn’t by default even know the full values of the larger system, and should be prepared to learn more about them. This is a consideration that doesn’t even depend on there initially being multiple big agents with different values.
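A small sketch of what a bounded subagent like that might look like; everything here (the class, its methods, the example options) is a hypothetical illustration. The subagent starts out knowing neither all relevant facts nor the full values of its parent, treats both as learnable, and defers rather than guessing when its partial picture of the values doesn’t cover the options in front of it.

```python
# Toy sketch of a small, bounded subagent that doesn't initially know all
# relevant facts or the parent agent's full values (illustrative assumptions).

class SmallSubagent:
    def __init__(self):
        self.known_facts = {}    # facts it has observed or had computed for it
        self.known_values = {}   # partial picture of the parent's values

    def observe_fact(self, key, value):
        # A fact can stay unknown until observed, or be supplied by another subagent.
        self.known_facts[key] = value

    def learn_value(self, outcome, utility):
        # Values are also knowledge, learned incrementally.
        self.known_values[outcome] = utility

    def choose(self, options):
        # Only rank options whose value it actually knows; anything else is
        # deferred back to the parent rather than guessed at.
        scored = {o: self.known_values[o] for o in options if o in self.known_values}
        if not scored:
            return ("defer", None)
        return ("act", max(scored, key=scored.get))

sub = SmallSubagent()
print(sub.choose(["build", "wait"]))   # ('defer', None): values still unknown
sub.learn_value("wait", 1.0)
sub.learn_value("build", 2.0)
print(sub.choose(["build", "wait"]))   # ('act', 'build')
```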
Another point is that coordination doesn’t necessarily require constructing exactly the same shared subagent, or at least not “exactly the same” in a straightforward sense, as the results on coordination in PD illustrate. The role of subagents in this case is that A can create a subagent CA, while B creates a subagent CB. Even where A and B remain intractable for each other, CA and CB can be much smaller and, by construction, prepared to coordinate with each other from within A and B. (It seems natural for the big agents to treat such subagents as something like their copies of an assurance contract, signed through a commitment to give them influence over the big agent’s thinking or behavior. And letting contracts be agents in their own right gives a lot of flexibility in the coordination they can arrange.)
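A toy sketch of the CA/CB arrangement in the spirit of the PD coordination results, with the class, contract label, and inspection mechanism all invented for illustration rather than taken from any particular published construction: the two subagents aren’t identical, but each is built to cooperate exactly when its counterpart verifiably carries the same contract, and to fall back to its host’s default otherwise.

```python
# Toy sketch of two non-identical subagents coordinating via a shared
# "assurance contract" (all names and the mechanism are illustrative).

CONTRACT_ID = "PD-assurance-v1"   # hypothetical shared contract label

class Subagent:
    def __init__(self, name, contract_id):
        self.name = name
        self.contract_id = contract_id

    def signed_contract(self):
        # What the subagent exposes to its counterpart for inspection.
        return self.contract_id

    def act(self, counterpart):
        # Cooperate iff the counterpart verifiably carries the same contract;
        # otherwise fall back to the big agent's default (defect).
        if counterpart.signed_contract() == self.contract_id:
            return "cooperate"
        return "defect"

# A and B each build their own subagent; CA and CB differ as objects (and
# could differ in implementation) but share the contract by construction.
CA = Subagent("CA", CONTRACT_ID)
CB = Subagent("CB", CONTRACT_ID)

print(CA.act(CB), CB.act(CA))           # cooperate cooperate
print(CA.act(Subagent("D", "none")))    # defect against a non-signatory
```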