Updatelessness sure seems nice from a theoretical perspective, but it has a ton of problems that go beyond what you just mentioned and that seem to me to basically doom the entire enterprise (at least with regard to what we are currently discussing, namely people):
I am not aware of any method of operationalizing even a weak version of updatelessness in the context of cognitively limited human beings who do not have access to their own source code.
I am pretty sure that a large portion of my values (and, by extension, the values of the vast majority of people) are indexical in nature, at least partly because my access to the outside world is mediated through sense data, which my S1 seems to value “terminally” and not as a mere proxy for preferences over current world-states. Indexicality seems to me to play very poorly with updatelessness (although I suspect you would know more about this than I do, given your work in this area?).
I don’t currently know of a way for humans to remain updateless even in a world (one that seems inordinately optimistic to me) in which we can actually access the “source code” by figuring out how to model the abstract classical computation performed by a particular (and reified) subset of the brain’s electrochemical circuitry, basically for the reasons I gave in the comment to Wei Dai that I referenced earlier: “The feedback loops implicit in the structure of the brain cause reward and punishment signals to ‘release chemicals that induce the brain to rearrange itself’ in a manner closely analogous to and clearly reminiscent of a continuous and (until death) never-ending micro-scale brain surgery. To be sure, barring serious brain trauma, these are typically small-scale changes, but they nevertheless fundamentally modify the connections in the brain and thus the computation it would produce in something like an emulated state (as a straightforward corollary, how would an em that does not ‘update’ its brain chemistry the same way that a biological being does be ‘human’ in any decision-relevant way?).” (See the toy sketch after this list.)
I have a much broader skepticism about whether the concepts of “beliefs” and “values” make sense as distinct, coherent concepts that carve reality at the joints, a skepticism that I think is reflected in some of the other points I made in my long list of questions and confusions about these matters. It doesn’t really seem to me like updatelessness solves this, or even necessarily offers a concrete path forward on it.
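As a toy sketch of the worry in the third point above (this treats the brain’s “source code” as a simple weight vector that every reward or punishment signal slightly rewrites; the update rule, names, and numbers are all placeholder assumptions of mine, not a model of actual neurobiology):

```python
import random

def run_brain(weights: list[float], stimulus: float) -> float:
    """The computation the current wiring performs: a trivial stand-in."""
    return sum(w * stimulus for w in weights)

weights = [0.5, -0.2, 0.1]  # the wiring at the moment the em snapshot is taken
snapshot = list(weights)    # a frozen emulated copy of that wiring

learning_rate = 0.01
for _ in range(1000):
    stimulus = random.uniform(-1.0, 1.0)
    reward = random.uniform(-1.0, 1.0)  # reward/punishment signal
    # Plasticity: each signal nudges the wiring, i.e. edits the "source code".
    weights = [w + learning_rate * reward * stimulus for w in weights]

# The live brain has drifted away from the frozen snapshot, so a commitment
# defined over the snapshot's computation no longer describes the live brain.
print(run_brain(snapshot, 1.0), run_brain(weights, 1.0))
```

The snapshot and the live weights drift apart step by step, which is the sense in which any updateless commitment defined over the “source code” stops binding the biological brain almost immediately.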
Of course, I don’t take you to be saying that going updateless literally gets rid of all these issues, but rather that thinking in those terms, after internalizing that perspective, puts us in the right frame of mind to make progress on these philosophical and metaphilosophical matters. But, as I said at the end of my comment to Wei Dai:
I do not have answers to the very large set of questions I have asked and referenced in this comment. Far more worryingly, I have no real idea of how to even go about answering them, or what framework to use, or what paradigm to think through. Unfortunately, getting all this right seems very important if we want to get to a great future. Based on my reading of the general pessimism you have been signaling throughout your recent posts and comments, it doesn’t seem like you have answers to (or even a great path forward on) these questions either, despite your great interest in and effort spent on them, which bodes quite terribly for the rest of us.
Perhaps if a group of really smart, philosophy-inclined people who have internalized the lessons of the Sequences, without being wedded to the very specific set of conclusions MIRI has reached about what AGI cognition must be like (conclusions which seem to be contradicted by the modularity, lack of agentic activity, moderate effectiveness of RLHF, and overall just the empirical information coming from recent SOTA models), were given a ton of funding and access and 10 years to work on this problem as part of a proto-Long Reflection, something interesting would come out. But that is quite a long stretch at this point.
Making maps is practical even when they are not as precise as the whole territory. The point is, path dependence happens in some space of possibilities, and it’s possible to make maps of that whole space and to make use of them to navigate the possibilities jointly, as opposed to getting caught in any one of them. This doesn’t need to involve global coherence across all possibilities (of moral reflection, in this case), just as optimization of the world doesn’t need to involve steamrolling it into repetition of some perfect pattern. But some parts will have similarities and shared issues with other parts, and can inform each other in their development.
A version of updatelessness closer to something practical is consulting an external map of possibilities that gives advice on how to act in the current situation and explains how following its advice influences the possibilities (in their further development resulting from following the advice). That is, you don’t need to yourself “be updateless”; the essential observation is that a single computation can exist in many possible situations, and, by being the same thing, its evaluation will give the same results in all of these situations, coordinating what happens in them (without the causal influence of any physical thing). This computation doesn’t need to be the whole agent: for example, a calculator on Mars computes the same results as a calculator (of a different make) on Earth, and by implementing the same computation the two coordinate what happens on Mars with what happens on Earth, without any need to physically communicate. This becomes a matter of decision theory when the coordinating computation is itself an agent. But it doesn’t need to be the same agent as the user of this decision theory as a whole; it doesn’t need to be something like a human; it can be much smaller and more legible, more like a calculator.
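As a minimal sketch of this observation (illustrative only; the function and strings are placeholders I chose, not anything canonical), two physically separated parties that implement the same small, deterministic computation end up acting identically without exchanging any message:

```python
def shared_policy(observation: str) -> str:
    """A small, legible computation both parties implement identically
    (two calculators of different makes computing the same function)."""
    # Deterministic: identical inputs always produce identical outputs.
    return "cooperate" if observation == "symmetric situation" else "wait"

# "Mars" and "Earth" each evaluate their own independent copy.
action_on_mars = shared_policy("symmetric situation")
action_on_earth = shared_policy("symmetric situation")

# No message was exchanged, yet the actions necessarily agree:
# the two evaluations are coordinated by logical identity, not causation.
assert action_on_mars == action_on_earth
```

The coordinating computation here is deliberately much smaller and more legible than either party as a whole, which is the sense in which it can be more like a calculator than like an agent.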