Hmm, this paragraph maybe points to some linguistic disagreement we have (and my guess is causing confusion in other cases).
I feel like you are treating “coherent” as a binary, when I am treating it more as a sliding scale.
Alright, so, on the one hand, this is definitely helpful for this conversation, because it has allowed me to understand much better what you’re saying. On the other hand, the sliding scale of coherence is much more confusing to me than the prior operationalizations of this concept were. I understand, at a mathematical level, what Eliezer (and you) mean by coherence when viewed as binary; I don’t think the same is true when we have a sliding scale instead. This isn’t your fault, mind you; these are probably bound to be confusing topics given our current, mostly inadequate state of understanding of them, but the ultimate effect is still what it is.
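To spell out the binary operationalization I have in mind, since it is what I am contrasting with the sliding scale, it is roughly the following (the symbols $p$, $u$, $a$, $o$ are just my shorthand for the agent’s implied beliefs, utilities, actions, and outcomes): an agent’s choice behavior is coherent iff there exist a probability distribution $p$ and a utility function $u$ such that its choices maximize expected utility,

$$a^* \in \operatorname*{arg\,max}_{a} \; \mathbb{E}_{o \sim p(\cdot \mid a)}\left[u(o)\right],$$

or, in the rough gloss of the standard Dutch-book/money-pump arguments, iff no one can pump resources out of it through a sequence of trades it accepts. Either such a pair $(p, u)$ exists or it does not, which is precisely what makes the binary version easy to state and the graded version harder for me to pin down.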
I expect we would still have a great deal of leftover disagreement about where on that scale CEV would take a human when we start extrapolating. I’m also somewhat confident that no effective way of resolving that disagreement is available to us currently.
This gives me some probability we don’t disagree that much, but my sense is you are throwing out the baby with the bathwater in your response to Roger, and that that points to a real disagreement.
Well, yes, I suppose. The standard response to critiques of CEV, from the moment they started appearing, is some version of Stuart Armstrong’s “we don’t need a full solution, just one good enough,” and there probably is some disagreement over what is “enough” and over how well CEV is likely to work.
But there is also another side of this, which I mentioned at the very end of my initial comment that sparked this entire discussion, namely my expectation that overly optimistic [1] thinking about the “theoretical soundness and practical viability” of CEV “causes confusions and incorrect thinking more than it dissolves questions and creates knowledge.” I think this is another point of genuine disagreement, which is downstream of not just the object-level questions discussed here but also stuff like the viability of Realism about rationality, overall broad perspectives about rationality and agentic cognition, and other related things. These are broader philosophical issues, and I highly doubt much headway can be made to resolve this dispute through a mere series of comments.
But it seems to me like something like “The Great Reflection” would be extremely valuable
Incidentally, I do agree with this to some extent:
I do not have answers to the very large set of questions I have asked and referenced in this comment. Far more worryingly, I have no real idea of how to even go about answering them or what framework to use or what paradigm to think through. Unfortunately, getting all this right seems very important if we want to get to a great future. Based on my reading of the general pessimism you [Wei Dai] have been signaling throughout your recent posts and comments, it doesn’t seem like you have answers to (or even a great path forward on) these questions either despite your great interest in and effort spent on them, which bodes quite terribly for the rest of us.
Perhaps if a group of really smart, philosophy-inclined people who have internalized the lessons of the Sequences, without being wedded to the very specific set of conclusions MIRI has reached about what AGI cognition must be like (conclusions which seem to be contradicted by the modularity, the lack of agentic activity, the moderate effectiveness of RLHF, and overall just the empirical information coming from recent SOTA models), were to be given a ton of funding and access and 10 years to work on this problem as part of a proto-Long Reflection, something interesting would come out. But that is quite a stretch at this point.
Can we not speak of apparent coherence relative to a particular standpoint? If a given system seems to be behaving in such a way that you personally can’t see a way to construct a Dutch book for it (a series of interactions with it such that energy/negentropy/resources can be extracted from it and accrue to you), that makes the system inexploitable with respect to you, and therefore at least as coherent as you are. The closer to maximal coherence a given system is, the less it will visibly depart from the appearance of coherent behavior, and hence from utility-function maximization; the fact that various quibbles can be made about various coherence theorems does not seem to me to negate this conclusion.
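To make “exploitable with respect to you” concrete, here is a minimal toy sketch of the classic money pump, the simplest kind of Dutch book: an agent whose preferences cycle can be charged a small fee on every step of the cycle and ends up holding exactly what it started with, minus the extracted resources. The names, fee, and numbers below are purely illustrative, not a model of any real agent.

```python
# Toy money pump: the simplest kind of Dutch book.
# An agent with cyclic preferences (A > B, B > C, C > A) will pay a small
# fee for each "upgrade", and after three trades it holds what it started
# with, having handed over resources. All names and numbers are hypothetical.

CYCLIC_PREFERENCES = {("C", "A"), ("B", "C"), ("A", "B")}  # (preferred, dispreferred)
SWAP_FEE = 1  # fee the agent is willing to pay to swap up to a preferred item

def run_money_pump(rounds: int) -> int:
    """Repeatedly offer the agent the item it prefers to its current one."""
    holding = "A"
    offer_against = {"A": "C", "C": "B", "B": "A"}  # what to offer given its holding
    extracted = 0
    for _ in range(rounds):
        offer = offer_against[holding]
        assert (offer, holding) in CYCLIC_PREFERENCES  # the agent prefers the offer
        holding = offer        # the agent accepts the trade...
        extracted += SWAP_FEE  # ...and pays the fee, which accrues to us
    return extracted

# After any multiple of 3 rounds the agent holds exactly what it started with,
# yet one unit of resources per round has been extracted from it.
print(run_money_pump(9))  # -> 9
```

Nothing here depends on the fee size or the number of rounds; the point is only that a preference cycle is the kind of structure that lets an outside party extract resources at all, and a system for which you cannot construct anything like this is inexploitable with respect to you in the sense above.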
Humans are more coherent than mice, and there are activities and processes which individual humans occasionally undergo in order to emerge more coherent than they were going in; in some sense this is the way it has to be, in any universe where (1) the initial conditions don’t start out giving you fully coherent embodied agents, and (2) physics requires continuity of physical processes, so that fully formed coherent embodied agents can’t spring into existence where there previously were none; there must be some pathway from incoherent, inanimate matter from which energy may be freely extracted, to highly organized configurations of matter from which energy may be extracted only with great difficulty, if it can be extracted at all.
If you expect the endpoint of that process to not fully accord with the von Neumann-Morgenstern axioms, because somebody once challenged the completeness axiom, the independence axiom, the continuity axiom, etc., the question still remains as to whether departures from those axioms will give rise to exploitable holes in the behavior of such systems, from the perspective of much weaker agents such as ourselves. And if the answer is “no”, then it seems to me the search for ways to make a weaker, less coherent agent into a stronger, more coherent agent is well-motivated, and necessary: an appeal to consequences in a certain sense, yes, but one that I endorse!
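For reference, since the axioms keep being invoked by name, here they are in standard notation (none of the symbols below come from the discussion itself): for a preference relation $\succeq$ over lotteries $L, M, N$, the von Neumann-Morgenstern axioms are completeness ($L \succeq M$ or $M \succeq L$), transitivity ($L \succeq M$ and $M \succeq N$ imply $L \succeq N$), continuity (if $L \succeq M \succeq N$, there is some $p \in [0,1]$ with $pL + (1-p)N \sim M$), and independence ($L \succeq M$ iff $pL + (1-p)N \succeq pM + (1-p)N$ for every $N$ and $p \in (0,1]$). The representation theorem then says that $\succeq$ satisfies all four exactly when there is a utility function $u$ with $L \succeq M \iff \mathbb{E}_L[u] \ge \mathbb{E}_M[u]$.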
[1] From my perspective, of course.