maybe only as a tendency to use certain considerations in making decisions.
Qualitatively speaking it might be worth making this distinction, but algorithmically speaking—from a superintelligence’s perspective, say, or a decision theory researcher’s—I can’t see any good reasons why there would be discrete changes or fundamental conceptual differences between the levels of abstraction.
This lack of rigid partitions might also be desirable, e.g. if 95% of your decision algorithm suddenly gets erased and you want to infer as much as possible from the remaining 5%: not only the lost “utility function” terms but also the meta-level implicit patterns and the highest-level implicit decision-theoretic policy, ideally using each of these as information with which to reconstruct the others even in the event of their complete annihilation. (You’d have to do this anyway if all you had left was some fragment of a time-stamped UDT policy lookup table (branch table?).)
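To make the “reconstruct the rest from the surviving 5%” picture a bit more concrete, here is a minimal, purely illustrative Python sketch (every name, the toy table, and the candidate rules are hypothetical inventions for this example, not anything from UDT itself): a tiny policy lookup table gets mostly erased, and candidate higher-level rules are scored against the surviving fragment and then used to fill the lost entries back in.

```python
# Purely illustrative sketch: a toy "policy lookup table" mapping observation
# histories to actions, most of which is erased, plus a crude attempt to
# reconstruct the lost entries by scoring candidate rules against the fragment.

from itertools import product

# Toy "true" policy over histories of three binary observations.
# The hidden regularity: cooperate iff the majority of observations are 1.
full_policy = {
    obs: ("C" if sum(obs) >= 2 else "D")
    for obs in product((0, 1), repeat=3)
}

# Simulate corruption: only a small fragment of the table survives.
surviving_fragment = {k: full_policy[k] for k in [(1, 1, 0), (0, 0, 1)]}

# Candidate hypotheses about the implicit higher-level policy that generated
# the table; a real agent would search a far richer hypothesis space.
candidate_rules = {
    "always_C": lambda obs: "C",
    "always_D": lambda obs: "D",
    "majority_C": lambda obs: "C" if sum(obs) >= 2 else "D",
    "first_bit": lambda obs: "C" if obs[0] == 1 else "D",
}

def fragment_fit(rule):
    """Count how many surviving entries the candidate rule reproduces."""
    return sum(rule(obs) == act for obs, act in surviving_fragment.items())

# Keep the rule(s) most consistent with the fragment and use one of them to
# fill in the erased entries: the "reconstruct the rest from the 5%" step.
best_score = max(fragment_fit(r) for r in candidate_rules.values())
best_rules = [name for name, r in candidate_rules.items()
              if fragment_fit(r) == best_score]

reconstructed = {
    obs: candidate_rules[best_rules[0]](obs)
    for obs in product((0, 1), repeat=3)
}

print("surviving fragment:", surviving_fragment)
print("rules consistent with it:", best_rules)
print("reconstructed table:", reconstructed)
```

In this toy case two candidate rules fit the fragment equally well, which is exactly the underdetermination that makes you want the other levels (the meta-level patterns, the implicit decision-theoretic policy) as additional evidence for the reconstruction.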
ETA: To motivate even thinking about the problem of corrupted hardware a little more, imagine that an agent is running XDT and is trying to make sense of humans’ (or humanity’s, humanity’s ancestors’, God’s, or consciousnesses-trapped-in-rocks’s)… we’ll call them ‘decision-policy-ish-like thingies’, but the “creator”-bound XDT agent has only partial information about any of its many potential creators, for any of many plausible reasons.
Also there is the more philosophical motivation of re-thinking the ‘reasoning’ that was done by the environment/universe in the process of creating our values—genetic/memetic evolution, atmospheric accidents, falling into “wrong”-in-hindsight attractors generally, and basically all causal chains or logical (teleological) properties of the universe that “resulted” at least partially in humans having their “current” values. Thinking things through from first principles means taking the idea of avoiding lost purposes to its logical conclusion (or non-conclusion): not just searching for causal validity, and not accepting, even as an “initial dynamic”, whatever point estimate we happen to have sitting around in the patterns of humanity circa the arbitrary year 2011. This is the difference between CEV and CFAI, though if either were done in the spirit of its philosophy they might both have enough meta-power to converge to “morality”, if such an attractor exists. Vladimir Nesov would perhaps call this yet another kind of values deathism?
ETA2: Or perhaps a better way to characterize what I see as the philosophical difference between CEV and CFAI is this. CEV starts with and focuses on human minds and their local properties at whatever moment the FAI button gets pressed, because, well, you know what you value; [rhetorically:] what else is there? (Though compare/contrast “so you know what you believe; what else is there?”.) The earlier perspective of CFAI, on the other hand, focuses on human meta-moral intuitions and their potential for invalidity, with humanity-right-now being the result of a suboptimal both-explanatory-and-normative updating process which “should” be immediately reflected upon and validated or “improved”—if it weren’t for that whole ‘first valid cause’ problem… (Mmmmmmeta. Meta fight!) It is unclear to what extent these differences are matters of medium, emphasis, style, a change of memetic propagation strategy, or an important philosophical shift in Eliezer’s thinking about FAI—perhaps as a result of his potentially-somewhat-misguided-according-to-me Optimization Enlightenment (an immediate result or cause of his Bayesian Enlightenment, if I’m correctly filling in the gaps).