(Vague, speculative thinking): Is the time element of UDT actually a distraction? Consider the following: agents A and B are in a situation where they’d benefit from cooperation. Unfortunately, the situation is complicated—it’s not like a prisoner’s dilemma, where there’s a clear “cooperate” and a clear “defect” option. Instead they need to take long sequences of actions, and they each have many opportunities to subtly gain an advantage at the other’s expense.
Therefore, instead of agreements formulated as “if you do X I’ll do Y”, it’d be far more beneficial for them to make agreements of the form “if you follow the advice of person Z then I will too”. Here person Z needs to be someone that both A and B trust to be highly moral, neutral, competent, etc. Even if there’s some method of defecting that neither of them considered in advance, Z will advise against it at the point in time when it arises. (They don’t need to actually have access to Z; they can just model what Z will say.)
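To make the shape of that agreement concrete, here’s a minimal toy sketch (the function names, the situation encoding, and Z’s advice rule are my own illustrative assumptions, not anything established): each agent defers to its model of Z exactly when it predicts the other agent does the same.

```python
# Toy sketch of the symmetric "defer to Z" agreement. Everything here
# (names, the situation encoding, Z's advice rule) is an illustrative
# assumption, not a formalism from the post.

def model_of_z(situation):
    """Stand-in for modelling what the trusted, neutral Z would advise."""
    return "cooperate"  # Z advises against any subtle defection that comes up

def agent_policy(predicts_other_follows_z, situation, selfish_action):
    """Follow Z's advice if and only if the other agent is predicted to do so."""
    if predicts_other_follows_z(situation):
        return model_of_z(situation)   # symmetric agreement: both defer to Z
    return selfish_action              # otherwise fall back to unilateral play

# Both agents model each other as Z-followers, so both end up cooperating.
situation = {"subtle_defection_available": True}
print(agent_policy(lambda s: True, situation, "defect"),
      agent_policy(lambda s: True, situation, "defect"))  # cooperate cooperate
```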
If A and B don’t have much communication bandwidth between them (e.g. they’re trying to do acausal coordination) then they will need to choose a Z that’s a clear Schelling point, even if that Z is suboptimal in other ways.
UDT can be seen as the special case where A and B choose Z as follows: “keep forgetting information until you don’t know if you’re A or B”. If A and B are different branches of the same agent, then the easiest way to do this is just to let Z be their last common ancestor. (Coalitional agency can be seen as an implementation of this.) If they’re not, then they’ll also need to coordinate on a way to make sure they’ll forget roughly the same things.
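As a toy illustration of the “keep forgetting until you don’t know if you’re A or B” construction (my own simplified encoding, assuming each branch’s knowledge is just an ordered log of observations since a shared starting point):

```python
# Toy sketch: rolling back to the longest common prefix of the two branches'
# observation logs recovers the last-common-ancestor knowledge state that
# Z reasons from. The observation strings are illustrative placeholders.

def last_common_ancestor(history_a, history_b):
    shared = []
    for obs_a, obs_b in zip(history_a, history_b):
        if obs_a != obs_b:
            break                     # the branches diverge here
        shared.append(obs_a)
    return shared                     # what Z still "knows" after forgetting

history_a = ["founded", "observed_market", "a_private_info"]
history_b = ["founded", "observed_market", "b_private_info"]
print(last_common_ancestor(history_a, history_b))
# -> ['founded', 'observed_market']  (only pre-split information survives)
```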
But there are many other ways of picking Schelling Zs. For example, if A and B follow the same religion, then the central figure in that religion (Jesus, Buddha, Mohammad, etc.) is a clear Schelling point.
EDIT: Z need not be one person; it could be a group of people. E.g. in the UDT case, if there are several different orders in which A and B could potentially forget information, then they could just do all of them and then follow the aggregated advice of the resulting council. Similarly, even if A and B aren’t of the same religion, they could agree to follow whatever compromise their respective religions’ central figures would have come to.
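A minimal sketch of the council version (again with made-up placeholder functions, so only the aggregation structure is the point): build one advisor per forgetting order and aggregate their advice by majority vote.

```python
# Toy sketch: one advisor per forgetting order, aggregated by majority vote.
# advice_from_state is a placeholder for whatever a forgetful advisor says.
from collections import Counter
from itertools import permutations

def advice_from_state(remembered):
    """Placeholder advice rule for an advisor who only remembers `remembered`."""
    return "cooperate" if "shared_goal" in remembered else "hold_off"

def council_advice(observations, n_forget):
    votes = []
    for order in permutations(observations):
        remembered = set(order[n_forget:])  # forget the first n_forget items in this order
        votes.append(advice_from_state(remembered))
    winner, _ = Counter(votes).most_common(1)[0]
    return winner

print(council_advice(["shared_goal", "a_private", "b_private"], n_forget=1))
# -> 'cooperate' (most forgetting orders still retain the shared goal)
```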
EDIT 2: UDT is usually prone to commitment races because it thinks of each agent in a conflict as separately making commitments earlier in logical time. But focusing on symmetric commitments gets rid of this problem.
Most discussion of updatelessness suggests that Z is an agent similar to A and B, and also that it’s a policy whose implications are transparent to A and B. I think an important case has Z quite unlike A or B, possibly much smaller and more legible than them. It can still be an agent in its own right, capable of eventually growing stronger than A or B were at the time Z was initially formulated. By a growing/developing Z I mean something that remains in coordination across both locations, rather than splitting into a version of Z at A and a version of Z at B that lose touch with each other.
Such a Z might be thought of as an environmental agent that A and B create near them, equipped to keep its instances in contact with each other (each knowing enough about both the instance near A and the instance near B), rather than specifically a commitment of A or B, a replacement of A or B, or a result of merging A and B. The commitment of A and B is then to future interactions with Z, which the updateless/coordinated core of Z should be sufficiently aware of to plan for.
I think Nesov had some similar idea about “agents deferring to a (logically) far-away algorithm-contract Z to avoid miscoordination”, although I never understood it completely, nor do I think that idea can solve miscoordination in the abstract (only, possibly, be a nice pragmatic way to bootstrap coordination from agents who are already sufficiently nice).
EDIT 2: UDT is usually prone to commitment races because it thinks of each agent in a conflict as separately making commitments earlier in logical time. But focusing on symmetric commitments gets rid of this problem.
Hate to always be that guy, but if you are assuming all agents will only engage in symmetric commitments, then you are assuming commitment races away. In actuality, it is possible for a (meta-) commitment race to happen about “whether I only engage in symmetric commitments”.
nor do I think that idea can solve miscoordination in the abstract (only, possibly, be a nice pragmatic way to bootstrap coordination from agents who are already sufficiently nice).
The central question, to my mind, is the principles of establishing coordination between different situations/agents; contracts are a framing for what coordination might look like once it’s established. Agentic contracts have the additional benefit of maintaining coordination across their instances once it’s established initially. Coordination theory should clarify how agents should think about establishing coordination with each other and how they should construct these contracts.
This is not about niceness/cooperation. For example, I think it should be possible to understand a transformer as being in coordination with the world through patterns in the world and circuits in the transformer, so that coordination gets established through learning. Beliefs are contracts between a mind and its object of study, essential tools the mind has for controlling it. Consequentialist control is a special case of coordination in this sense, and I think one problem with decision theories is that they are usually overly concerned with remaining close to a consequentialist framing.