This deserves a longer answer than I have time to allocate to it, but I quickly remark that I don’t recognize the philosophy or paradigm of updatelessness as refusing to learn things or being terrified of information; a rational agent should never end up in that circumstance, unless some perverse other agent is specifically punishing them for having learned the information (and will lose some of their own value thereby; it shouldn’t be possible for them to gain value by behaving “perversely” in that way, for then of course it’s not “perverse”). Updatelessness is, indeed, exactly the sort of thinking which prevents you from being harmed by information, because your updateless exposure to information doesn’t cause you to lose coordination with your counterfactual other selves or exhibit dynamic inconsistency with your past self.
From an updateless standpoint, “learning” is just the process of reacting to new information the way your past self would want you to do in that branch of possibility-space; you should never need to remain ignorant of anything. Maybe that involves not doing the thing that would then be optimal when considering only the branch of reality you turned out to be inside, but the updateless mind denies that this was ever the principle of rational choice, and so feels no need to stay ignorant in order to maintain dynamic consistency.
Thank you for engaging, Eliezer.
I completely agree with your point: an agent being updateless doesn’t mean it won’t learn new information. In fact, it might well decide to “make my future action A depend on future information X”, if the updateless prior finds that optimal. In other situations, when the updateless prior deems this dependence net-negative (perhaps because other agents could exploit it), it won’t.
This point is already observed in the post (see e.g. footnote 4), although without going deep into it, since the post is meant for the layman (it is more deeply addressed, for example, in section 4.4 of my report). Also, for illustrative purposes, in two places I have (maybe unfairly) caricatured an updateless agent as being “scared” of learning more information. What this really means (as hopefully clear from earlier parts of the post) is “the updateless prior assessed whether it seemed net-positive to let future actions depend on future information, and decided no (for almost all actions)”.
The problem I present is not “being scared of information”, but the trade-off between “letting your future action depend on future information X” and “not doing so” (and, in more detail, how exactly the action should depend on such information). More dependence allows you to correctly best-respond in some situations, but could also sometimes get you exploited. The problem is that there’s no universal (belief-independent) rule for assessing when to allow dependence: different updateless priors will decide differently. And agents need to decide this in advance of letting their deliberation depend on their interactions (while they still don’t know whether that’s net-positive).
Due to this prior-dependence, if different updateless agents have different beliefs, they might play very different policies, and miscoordinate. This is also analogous to different agents demanding different notions of fairness (more here). I have read no convincing arguments as to why most superintelligences will converge on beliefs (or notions of fairness) that successfully coordinate on Pareto optimality (especially in the face of the problem of trapped priors, i.e. commitment races), and would be grateful if you could point me in their direction.
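The miscoordination-from-different-priors point can be sketched with a one-shot Nash demand game (the game choice and all numbers here are my illustration, not from the post): each updateless agent commits in advance to the demand its prior deems fair, and incompatible notions of fairness destroy all surplus.

```python
# Hypothetical sketch: two updateless agents whose priors fix different
# "fair" demands in a one-shot Nash demand game. Payoffs are invented
# for illustration.

def demand_game(demand_a: float, demand_b: float) -> tuple[float, float]:
    """Each agent commits to a demand over a pie of size 1; compatible
    demands (sum <= 1) are honored, incompatible demands yield (0, 0)."""
    if demand_a + demand_b <= 1:
        return demand_a, demand_b
    return 0.0, 0.0

# A's prior settles on an even split; B's prior settles on a 60/40
# notion of fairness. Neither can update after committing.
print(demand_game(0.5, 0.5))  # coordinated: (0.5, 0.5)
print(demand_game(0.5, 0.6))  # miscoordinated: (0.0, 0.0)
```

Neither agent is irrational given its own prior; the loss comes purely from the priors disagreeing about the coordination point.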
I interpret you as expressing a strong normative intuition in favor of ex ante optimization. I share this primitive intuition, and indeed it remains true that, if you have some prior and simply want to maximize its EV, updatelessness is exactly what you need. But I think we have discovered other pro tanto reasons against updatelessness, like updateless agents probably performing worse on average (in complex environments) due to trapped priors and increased miscoordination.
Repeated Prisoner’s Dilemma policy: “I will ALWAYS cooperate if I believe the other party is a copy of myself.”
This policy cannot change even if it observes the counterparty always defecting. Its “sibling” must simply need the resources. (If the agent can reason at all about why it’s always being betrayed.)
That’s probably where this breaks in implementation. A mutant agent that is a cracked version of the original could exploit a network of updateless cooperators. Theoretically, “in-group” members could be unable to modify themselves without outside help, though. (Their policy wouldn’t contain a possible input case that permits any changes.)
Real life example: mosquito gene drives.
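A minimal payoff sketch of that exploit, assuming standard Prisoner's Dilemma payoffs (the policy names and numbers are mine, for illustration): a cracked copy that passes the peer check but always defects extracts the temptation payoff from an unconditional cooperator every round.

```python
# Assumed standard PD payoffs for (my_move, their_move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def updateless_cooperator(history, believes_peer=True):
    # The policy has no input case that revises the peer belief,
    # so observed defections never change its move.
    return "C" if believes_peer else "D"

def mutant(history):
    return "D"  # cracked copy: passes inspection, always defects

coop_score = mutant_score = 0
history = []
for _ in range(10):
    a, b = updateless_cooperator(history), mutant(history)
    coop_score += PAYOFF[(a, b)]
    mutant_score += PAYOFF[(b, a)]
    history.append((a, b))

print(coop_score, mutant_score)  # cooperator gets 0, mutant gets 50
```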
If A observes the other party B defecting even once, after verifying that B believed A to be a copy of B (assuming sufficient scanning tech to read each other’s minds reliably; e.g., for simplicity, both could run on the same computer in an open-source game theory test environment), then A can reliably conclude, and therefore must always conclude, that B is not actually a copy, but a copy with some modification (such as random noise) that induces defection.
What you describe can work, though now the policy is more complicated. It now has conditions where you renege when there is a certain level of confidence that the counterparty isn’t a cloned peer.
Obviously, with “will ALWAYS cooperate”, you know the other party isn’t a peer the instant they defect. The policy collapses to grim trigger, and you didn’t actually need the peer detection.
In more complex and interesting environments there’s now a “defection margin”. Since you know the exact threshold at which the other parties will decide you aren’t a peer, you can exploit them so long as you don’t provide sufficient evidence that you are an “outlaw” (in this case, “outlaw” means “not an identical clone”).
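A toy sketch of that defection margin (the threshold, evidence increment, and decay rate are all invented for illustration): if peers only conclude “not a clone” once accumulated evidence crosses a known threshold, and evidence decays over time (stale logs, noisy channels), a defector can stay just under the line indefinitely.

```python
OUTLAW_THRESHOLD = 1.0   # assumed evidence level at which peers conclude "not a clone"
EVIDENCE_DECAY = 0.5     # assumed per-round decay of accumulated evidence
DEFECT_EVIDENCE = 0.4    # assumed evidence one defection adds

def exploit(rounds: int) -> int:
    """Defect whenever doing so keeps evidence below the known threshold."""
    evidence, defections = 0.0, 0
    for _ in range(rounds):
        evidence *= EVIDENCE_DECAY
        if evidence + DEFECT_EVIDENCE < OUTLAW_THRESHOLD:
            evidence += DEFECT_EVIDENCE
            defections += 1
        # otherwise cooperate this round and let evidence keep decaying
    return defections

# With these numbers, evidence converges to 0.8 and never crosses the
# threshold, so the exploiter defects every single round unpunished.
print(exploit(100))
```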
A lot of these updateless cooperation scenarios are asynchronous, past/future, separated by distance and firewalls. Lots of opportunity to defect and not be punished.
Real-life example: shoplifting $1 less than the felony threshold. A felony conviction is the “grim trigger”: society will defect against you from then on.