The monotonicity principle requires the loss function to be non-decreasing with respect to the manifesting of fewer facts. Roughly speaking, the more computations the universe runs, the better.
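As a rough formal restatement (treating manifest facts as sets of realized computations, which is a simplification, per the footnote in the example below): if everything manifest in α is also manifest in β, then β's loss can be no larger than α's.

```latex
% Rough restatement of monotonicity (simplified picture: \alpha, \beta are
% sets of manifest computational facts, L is the agent's loss function)
\alpha \subseteq \beta \;\Longrightarrow\; L(\alpha) \ge L(\beta)
```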
I think this is what I was missing. Thanks.
So, then, the monotonicity principle sets a baseline for the agent’s loss function that corresponds to how much less stuff can happen to whatever subset of the universe it cares about, getting worse the fewer opportunities become available, due to death or some other kind of stifling. Then the agent’s particular value function over universe-states gets added/subtracted on top of that, correct?
No, it’s not a baseline, it’s just an inequality. Let’s do a simple example. Suppose the agent is selfish and cares only about (i) the experience of being in a red room and (ii) the experience of being in a green room. And let’s suppose these are the only two possible experiences: it can’t experience going from a room of one color to a room of another color, or anything like that (for example, because the agent has no memory). Denote by G the program corresponding to “the agent deciding on an action after it sees a green room” and by R the program corresponding to “the agent deciding on an action after it sees a red room”. Then, roughly speaking[1], there are 4 possibilities:
α∅: The universe runs neither R nor G.
αR: The universe runs R but not G.
αG: The universe runs G but not R.
αRG: The universe runs both R and G.
In this case, the monotonicity principle imposes the following inequalities on the loss function L:
L(α∅) ≥ L(αR)
L(α∅) ≥ L(αG)
L(αR) ≥ L(αRG)
L(αG) ≥ L(αRG)
That is, α∅ must be the worst case and αRG must be the best case.
[1] In fact, manifesting of computational facts doesn’t amount to selecting a set of realized programs, because programs can be entangled with each other, but let’s ignore this for simplicity’s sake.
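To make this concrete, here is a minimal sketch in Python (hypothetical names, and using the simplification from the footnote of treating manifest facts as plain sets of realized programs) that encodes the four possibilities as subsets of {R, G} and checks whether a given loss table satisfies the inequalities above:

```python
from itertools import combinations

# The two programs from the example: R = "the agent deciding on an action after
# it sees a red room", G = the same for a green room.
PROGRAMS = ("R", "G")

# The four possibilities, encoded as the set of programs the universe runs:
# frozenset() is alpha_0, {"R"} is alpha_R, {"G"} is alpha_G, {"R","G"} is alpha_RG.
POSSIBILITIES = [frozenset(c) for n in range(len(PROGRAMS) + 1)
                 for c in combinations(PROGRAMS, n)]

def satisfies_monotonicity(loss):
    """True iff L(a) >= L(b) whenever a is a subset of b,
    i.e. running more programs never increases the loss."""
    return all(loss[a] >= loss[b]
               for a in POSSIBILITIES
               for b in POSSIBILITIES
               if a <= b)

# A loss table obeying the constraint: running nothing is worst, running both is best.
# (The particular numbers are arbitrary; only the ordering matters.)
example_loss = {
    frozenset(): 3.0,            # alpha_0
    frozenset({"R"}): 2.0,       # alpha_R
    frozenset({"G"}): 1.5,       # alpha_G
    frozenset({"R", "G"}): 0.5,  # alpha_RG
}

print(satisfies_monotonicity(example_loss))  # True
```

Note that this only checks the ordering constraints; within them, the agent is free to rank αR above αG or the other way around.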
Okay, so it’s just a constraint on the final shape of the loss function. Would you construct such a loss function by integrating a strictly non-positive computation-value function over all of space and time (or at least over the future light-cones of all its copies, if it focuses just on the effects of its own behavior)?
Space and time are not really the right parameters here, since these refer to Φ (physical states), not Γ (computational “states”) or 2^Γ (physically manifest facts about computations). In the example above, it doesn’t matter where the (copy of the) agent is when it sees the red room, only the fact that the agent does see it. We could construct such a loss function by a sum over programs, but the constructions suggested in section 3 use a minimum instead of a sum, since this seems like a less “extreme” choice in some sense. Of course, ultimately the loss function is subjective: as long as the monotonicity principle is obeyed, the agent is free to have any loss function.
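As a toy illustration of the sum-versus-minimum distinction (again in the simplified set-of-programs picture, with hypothetical names; this is not the actual construction from section 3): give each program the agent cares about a non-positive value and aggregate over the programs that actually run. Both aggregations satisfy the monotonicity inequalities, since running more programs can only add more non-positive terms to the sum, or more candidates for the minimum.

```python
# Toy illustration only -- not the section 3 constructions. Each program the
# agent cares about gets a non-positive value; the loss aggregates the values
# of the programs the universe actually runs.
VALUES = {"R": -1.0, "G": -2.0}  # hypothetical per-program values, both <= 0

def loss_sum(realized):
    """Sum construction: every realized program contributes its (non-positive) value."""
    return sum(VALUES[p] for p in realized)

def loss_min(realized, empty_value=0.0):
    """Min construction: the loss is the lowest value among realized programs,
    with `empty_value` (at least as large as every value) used when nothing runs."""
    return min((VALUES[p] for p in realized), default=empty_value)

# Both constructions obey the inequalities from the example above:
for L in (loss_sum, loss_min):
    a0, aR, aG, aRG = frozenset(), frozenset({"R"}), frozenset({"G"}), frozenset({"R", "G"})
    assert L(a0) >= L(aR) >= L(aRG)
    assert L(a0) >= L(aG) >= L(aRG)
```

This only shows that both aggregations are consistent with the monotonicity principle; it says nothing about why one choice is preferable to the other.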