My current guess is that the problem with the usual framings of bargaining is that they look at individual episodes, when they should look at a whole space of situations in which acausal coordination is performed by shared adjudicators. These adjudicators act across large subsets of that space of episodes in a UDT-like manner, bargaining primarily between episodes rather than within them. They could exist at different levels of sophistication: vague concepts, precise word-wielding concepts, hypotheses/theories, norms, paradigms/ideologies, people/agents, bureaucracies.
Decisions of adjudicators express aspects of preference. Collections of adjudicators form larger agents: in this way concepts become people, people become civilizations, and civilizations become CEV in the limit of this process. Adjudicators are shared among the larger agents that channel them, which gives those agents comparable preferences and coordinates their smaller, less tractable within-episode bargains. Instead of relying on utility functions to express consistent preference, adjudicators make the actions of agents more coherent directly, by bargaining among the episodes they influence; being present in all of those episodes is what puts the episodes in comparison.
This offers a process for extending the Goodhart scope of an agent (the possibly off-distribution situations where it still acts robustly, aligned with the intent of the training distribution). Namely, the policy of an agent should be split into instances of local behavior, reframed as small within-episode bargains struck by smaller adjudicators that acausally bind large collections of different episodes together. The behavior and scope of those adjudicators should reach an equilibrium where feedback from learning on the episodes (reflection) no longer changes them. Finally, new adjudicators can be introduced into old episodes to extend the Goodhart scope of the larger agent.
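As a toy illustration of the process just described (not a formalization of it), here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Adjudicator` class, its `scope` and `stance` fields, the `outcomes` and `reflect` functions, the rule that a within-episode bargain is just the average stance of the adjudicators present, and the numeric per-episode feedback. The only parts it tries to mirror are that each adjudicator makes one shared decision across all episodes in its scope, that reflection runs until the adjudicators stop changing, and that a new adjudicator present in an old episode can extend coverage to a new one.

```python
from dataclasses import dataclass


@dataclass
class Adjudicator:
    name: str
    scope: set     # hypothetical: labels of the episodes this adjudicator is present in
    stance: float  # hypothetical: one shared decision, applied in every episode in scope


def outcomes(adjudicators, episodes):
    """Toy within-episode bargain: average the stances of the adjudicators present."""
    result = {}
    for ep in episodes:
        present = [a.stance for a in adjudicators if ep in a.scope]
        if present:  # an episode with no adjudicator is outside the agent's scope
            result[ep] = sum(present) / len(present)
    return result


def reflect(adjudicators, feedback, rate=0.5, tol=1e-9, max_steps=10_000):
    """Adjust stances from per-episode feedback until they stop changing.

    Returns the number of update steps applied before equilibrium was reached.
    """
    for step in range(max_steps):
        got = outcomes(adjudicators, feedback)
        deltas = []
        for a in adjudicators:
            seen = [ep for ep in a.scope if ep in got]
            # Between-episode bargain: a single adjustment has to serve
            # every episode in scope that provides feedback.
            err = sum(feedback[ep] - got[ep] for ep in seen) / len(seen) if seen else 0.0
            deltas.append(rate * err)
        if max((abs(d) for d in deltas), default=0.0) < tol:
            return step  # equilibrium: reflection no longer changes anything
        for a, d in zip(adjudicators, deltas):
            a.stance += d
    return max_steps


# Made-up feedback on three "training" episodes; "d" is off-distribution.
feedback = {"a": 1.0, "b": 0.0, "c": 1.0}
episodes = ["a", "b", "c", "d"]
adjudicators = [
    Adjudicator("p", {"a", "b"}, 0.0),
    Adjudicator("q", {"b", "c"}, 0.0),
]

steps = reflect(adjudicators, feedback)
again = reflect(adjudicators, feedback)  # already at equilibrium, so no steps
before = outcomes(adjudicators, episodes)  # no entry for "d"

# Introduce a new adjudicator into old episode "c" that also reaches "d".
adjudicators.append(Adjudicator("r", {"c", "d"}, 0.0))
reflect(adjudicators, feedback)
after = outcomes(adjudicators, episodes)  # now has an entry for "d"
```

In this sketch the new adjudicator's stance is shaped only by feedback on the old episode "c", and that same stance is what decides the new episode "d". Averaging is the simplest stand-in for a within-episode bargain that I could think of; any other aggregation rule could be substituted without changing the overall shape of the loop.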