Brain-dump on Updatelessness and real agents
Building a Son is just committing to a whole policy for the future. In the formalism where our agent uses probability distributions, and ex interim expected value maximization decides your action, the only way to ensure dynamic stability (for your Son to be identical to you) is to be completely Updateless. That is, to decide everything using your current prior, and keep that decision forever.
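As a rough formalization (my notation, not anything from the post): write $P_0$ for the current prior, $U$ for utility, and $o$ for a later observation. Complete Updatelessness fixes a whole policy by ex ante expected value under $P_0$, while an updateful agent re-optimizes each action ex interim, after conditioning on $o$:

$$\pi^{*} \;=\; \arg\max_{\pi}\, \mathbb{E}_{P_0}\!\left[U(\pi)\right] \qquad\text{versus}\qquad a^{*}(o) \;=\; \arg\max_{a}\, \mathbb{E}_{P_0(\cdot \mid o)}\!\left[U(a)\right]$$

Dynamic stability then just means the Son keeps executing $\pi^{*}$, rather than swapping it out for the re-optimized $a^{*}(o)$.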
Luckily, real agents don’t seem to work like that. We are more like an ensemble of selected-for heuristics, and it seems true, scope-sensitive, complete Updatelessness is very unlikely to come out of this process (although we do have local versions of non-true Updatelessness, like retributivism in humans).
In fact, it’s not even exactly clear how my current brain-state could decide something for the whole future. It’s not even well-defined, like when you’re playing a board game and discover some move you were planning isn’t allowed by the rules. There are ways to actually give an exhaustive definition, but I suspect the ones that most people would intuitively like are (when scrutinized) sneaking in parts of Updatefulness (which I think is the correct move).
More formally, it seems like what real-world agents do is much better represented by what I call “Slow-learning Policy Selection”. (Abram had a great post about this called “Policy Selection Solves Most Problems”, which I can’t find now.) This is a small agent (short computation time) recommending policies for a big agent to follow in the far future. The difference with complete Updatelessness is that the small agent also learns (much more slowly than the big one). Thus, if the small agent thinks a policy (like paying up in Counterfactual Mugging) is the right thing to do, the big agent will implement it for a pretty long time. But eventually the small agent might change its mind, and start recommending a different policy. I basically think that all problems not solved by this are unsolvable in principle, due to the unavoidable trade-off between updating and not updating.[1]
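For concreteness, a minimal sketch (my own toy numbers, not from the post) of why the small agent’s prior favors the paying policy in Counterfactual Mugging, even though an agent re-deciding after seeing tails would refuse:

```python
# Counterfactual Mugging with illustrative payoffs: a fair coin, a $10,000
# reward on heads iff Omega predicts you would pay on tails, and a $100
# payment demanded on tails.

P_HEADS = 0.5
REWARD_IF_PREDICTED_TO_PAY = 10_000  # paid out on heads, only to predicted payers
COST_OF_PAYING = 100                 # what you hand over on tails

def prior_expected_value(pays_on_tails: bool) -> float:
    """EV of a whole policy, computed from the prior (before the flip is seen)."""
    heads_branch = REWARD_IF_PREDICTED_TO_PAY if pays_on_tails else 0
    tails_branch = -COST_OF_PAYING if pays_on_tails else 0
    return P_HEADS * heads_branch + (1 - P_HEADS) * tails_branch

print(prior_expected_value(pays_on_tails=True))   # 4950.0 -> the small agent recommends paying
print(prior_expected_value(pays_on_tails=False))  # 0.0

# An agent that re-decides after observing tails only compares -100 vs 0
# in the tails world, so it refuses -- exactly the move the recommended
# (updateless-flavored) policy is there to prevent.
```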
This also has consequences for what we expect superintelligences to be like. If by them having “vague opinions about the future” we mean a wide, but perfectly rigorous and compartmentalized, probability distribution over literally everything that might happen, then yes, the way to maximize EV according to that distribution might be some very concrete, very risky move, like rewriting yourself into a different algorithm because you think simulators will reward this, even if you’re not sure how well that algorithm performs in this universe.
But that’s not how abstractions or uncertainty work mechanistically! Abstractions help us efficiently navigate the world thanks to their modular, nested, fuzzy structure. If they had to compartmentalize everything in a rigorous and well-defined way, they’d stop working. When you take into account how abstractions really work, the kind of partial updatefulness we see in the world is what we’d expect. I might write about this soon.
Surprisingly, in some conversations others still wanted to “get both updatelessness and updatefulness at the same time”: that is, to receive the gains from Value of Information and also those from Strategic Updatelessness. This is what Abram and I had in mind when starting this work, and it is, when you understand what these words really mean, impossible by definition.
Here’s Abram’s post. It discusses a more technical setting, but essentially it fits the story of choosing how to channel the behavior/results of some other algorithm/contract, without making use of those results when choosing how to use them eventually (that is, the choice of a policy for responding to facts is in the logical past of those facts, and so can be used by those facts). Drescher’s ASP example more clearly illustrates the problem of making the contract’s consequentialist reasoning easier: there, the contract is the predictor, and its behavior is stipulated to be available to the agent (and so easily diagonalized). The agent must specifically avoid making use of knowledge of the contract’s behavior when deciding how to respond to that behavior. This doesn’t necessarily mean that the agent doesn’t have the knowledge, as long as it doesn’t use it for this particular decision about the policy for what to do in response to the knowledge. In fact, the agent could use the knowledge immediately after choosing the policy, by applying the policy to the knowledge, which turns ASP into Transparent Newcomb. A big agent wants to do small-agent reasoning in order for that reasoning to be legible to those interested in its results.
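A toy illustration of that staging (my own sketch, not from the comment): the agent first fixes a response policy without reading the predictor’s output, and only afterwards applies the policy to that output.

```python
from typing import Callable

Prediction = str  # the predictor's guess about what we'll do
Action = str

def choose_policy() -> Callable[[Prediction], Action]:
    # Stage 1: fix a response policy using the prior only. This step never
    # reads the prediction, so it stays cheap enough for the predictor to
    # simulate -- the dependence of our behavior on its output stays legible.
    def policy(prediction: Prediction) -> Action:
        return "one_box"  # e.g. cooperate regardless of what was predicted
    return policy

def act(prediction: Prediction) -> Action:
    # Stage 2: only now is the knowledge used, and only *through* the
    # already-fixed policy -- the move that turns ASP into Transparent Newcomb.
    policy = choose_policy()
    return policy(prediction)

print(act("one_box"))  # -> "one_box"
```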
So it’s not so much a tradeoff between updating and not updating; it’s instead a staged computation of updating (on others’ behavior) that makes your own reasoning more legible to others you want to be able to coordinate with you. If some facts you make use of vary with another’s will, you want the dependence to remain simple to the other’s mind (so that the other may ask what happens with those facts depending on what they do), which in practice might take the form of delaying the updating. The problem with updateful reasoning that destroys strategicness seems to be different, though: an updateful agent just stops listening to the UDT policy, so there is no dependence of the updateful agent’s actions on the shared UDT policy that coordinates all instances of the agent. This dependence is broken (or never established), rather than merely being too difficult to see for the coordinating agent (by being too far in the logical future).