Wei Dai comments on The Obliqueness Thesis

Wei Dai 19 Sep 2024 15:52 UTC
LW: 14 AF: 9
4
AF
As long as all mature superintelligences in our universe don’t necessarily have (end up with) the same values, and only some such values can be identified with our values or what our values should be, AI alignment seems as important as ever. You mention “complications” from obliqueness, but haven’t people like Eliezer recognized similar complications pretty early, with ideas such as CEV?

It seems to me that from a practical perspective, as far as what we should do, your view is much closer to Eliezer’s view than to Land’s view (which implies that alignment doesn’t matter and we should just push to increase capabilities/intelligence). Do you agree/disagree with this?

It occurs to me that maybe you mean something like “Our current (non-extrapolated) values are our real values, and maybe it’s impossible to build or become a superintelligence that shares our real values so we’ll have to choose between alignment and superintelligence.” Is this close to your position?
- jessicata 19 Sep 2024 16:56 UTC
  LW: 12 AF: 5
  0
  AF Parent
  “as important as ever”: no, because our potential influence is lower, and the influence isn’t on things shaped like our values, there has to be a translation, and the translation is different from the original.
  
  CEV: while it addresses “extrapolation” it seems broadly based on assuming the extrapolation is ontologically easy, and “our CEV” is an unproblematic object we can talk about (even though it’s not mathematically formalized, any formalization would be subject to doubt, and even if formalized, we need logical uncertainty over it, and logical induction has additional free parameters in the limit). I’m really trying to respond to orthogonality not CEV though.
  
  from a practical perspective: notice that I am not behaving like Eliezer Yudkowsky. I am not saying the Orthogonality Thesis is true and important to ASI, I am instead saying intelligence/values are Oblique and probably nearly Diagonal (though it’s unclear what I mean by “nearly”). I am not saying a project of aligning superintelligence with human values is a priority. I am not taking research approaches that assume a Diagonal/Orthogonal factorization. I left MIRI partially because I didn’t like their security policies (and because I had longer AI timelines), I thought discussion of abstract research ideas was more important. I am not calling for a global AI shutdown so this project (which is in my view confused) can be completed. I am actually against AI regulation on the margin (I don’t have a full argument for this, it’s a political matter at this point).
  
  I think practicality looks more like having near-term preferences related to modest intelligence increases (as with current humans vs humans with neural nets; how do neural nets benefit or harm you, practically? how can you use them to think better and improve your life?), and not expecting your preferences to extend into the distant future with many ontology changes, so don’t worry about grabbing hold of the whole future etc, think about how to reduce value drift while accepting intelligence increases on the margin. This is a bit like CEV except CEV is in a thought experiment instead of reality.
  
  The “Models of ASI should start with realism” bit IS about practicalities, namely, I think focusing on first forecasting absent a strategy of what to do about the future is practical with respect to any possible influence on the far future; practically, I think your attempted jump to practicality (which might be related to philosophical pragmatism) is impractical in this context.
  
  It occurs to me that maybe you mean something like “Our current (non-extrapolated) values are our real values, and maybe it’s impossible to build or become a superintelligence that shares our real values so we’ll have to choose between alignment and superintelligence.” Is this close to your position?
  
  Close. Alignment of already-existing human values with superintelligence is impossible (I think) because of the arguments given. That doesn’t mean humans have no preferences indirectly relating to superintelligence (especially, we have preferences about modest intelligence increases, and there’s some iterative process).
  - Wei Dai 19 Sep 2024 17:55 UTC
    LW: 7 AF: 5
    −3
    AF Parent
    What do you think about my positions on these topics as laid out in and Six Plausible Meta-Ethical Alternatives and Ontological Crisis in Humans?
    
    My overall position can be summarized as being uncertain about a lot of things, and wanting (some legitimate/trustworthy group, i.e., not myself as I don’t trust myself with that much power) to “grab hold of the whole future” in order to preserve option value, in case grabbing hold of the whole future turns out to be important. (Or some other way of preserving option value, such as preserving the status quo / doing AI pause.) I have trouble seeing how anyone can justifiably conclude “so don’t worry about grabbing hold of the whole future” as that requires confidently ruling out various philosophical positions as false, which I don’t know how to do. Have you reflected a bunch and really think you’re justified in concluding this?
    
    E.g. in Ontological Crisis in Humans I wrote “Maybe we can solve many ethical problems simultaneously by discovering some generic algorithm that can be used by an agent to transition from any ontology to another?” which would contradict your “not expecting your preferences to extend into the distant future with many ontology changes” and I don’t know how to rule this out. You wrote in the OP “Current solutions, such as those discussed in MIRI’s Ontological Crises paper, are unsatisfying. Having looked at this problem for a while, I’m not convinced there is a satisfactory solution within the constraints presented.” but to me this seems like very weak evidence for the problem being actually unsolvable.
    - jessicata 19 Sep 2024 23:11 UTC
      LW: 6 AF: 2
      0
      AF Parent
      re meta ethical alternatives:
      
      roughly my view
      slight change, opens the question of why the deviations? are the “right things to value” not efficient to value in a competitive setting? mostly I’m trying to talk about those things to value that go along with intelligence, so it wouldn’t correspond with a competitive disadvantage in general. so it’s still close enough to my view
      roughly Yudkowskian view, main view under which the FAI project even makes sense. I think one can ask basic questions like which changes move towards more rationality on the margin, though such changes would tend to prioritize rationality over preventing value drift. I’m not sure how much there are general facts about how to avoid value drift (it seems like the relevant kind, i.e. value drift as part of becoming more rational/intelligent, only exists from irrational perspectives, in a way dependent on the mind architecture)
      minimal CEV-realist view. it really seems up to agents how much they care about their reflected preferences. maybe changing preferences too often leads to money pumps, or something?
      basically says “there are irrational and rational agents, rationality doesn’t apply to irrational agents”, seems somewhat how people treat animals (we don’t generally consider uplifting normative with respect to animals)
      at this point you’re at something like ecology / evolutionary game theory, it’s a matter of which things tend to survive/reproduce and there aren’t general decision theories that succeed
      
      re human ontological crises: basically agree, I think it’s reasonably similar to what I wrote. roughly my reason for thinking that it’s hard to solve is that the ideal case would be something like a universal algebra homomorphism (where the new ontology actually agrees with the old one but is more detailed), yet historical cases like physics aren’t homomorphic to previous ontologies in this way, so there is some warping necessary. you could try putting a metric on the warping and minimizing it, but, well, why would someone think the metric is any good, it seems more of a preference than a thing rationality applies to. if you think about it and come up with a solution, let me know, of course.
      
      with respect to grabbing hold of the whole future: you can try looking at historical cases of people trying to grab hold of the future and seeing how that went, it’s a mixed bag with mostly negative reputation, indicating there are downsides as well as upsides, it’s not a “safe” conservative view. see also Against Responsibility. I feel like there’s a risk of getting Pascal’s mugged about “maybe grabbing hold of the future is good, you can’t rule it out, so do it”, there are downsides to spending effort that way. like, suppose some Communists thought capitalism would lead to the destruction of human value with high enough probability that instituting global communism is the conservative option, it doesn’t seem like that worked well (even though a lot of people around here would agree that capitalism tends to leads to human value destruction in the long run). particular opportunities for grabbing hold of the future can be net negative and not worth worrying about even if one of them is a good idea in the long run (I’m not ruling that out, just would have to be convinced of specific opportunities).
      
      overall I’d rather focus on first modeling the likely future and looking for plausible degrees of freedom; a general issue with Pascal’s mugging is it might make people overly attached to world models in which they have ~infinite impact (e.g. Christianity, Communism) which means paying too much attention to wrong world models, not updating to more plausible models in which existential-stakes decisions could be comprehended if they exist. and Obliqueness doesn’t rule out existential stakes (since it’s non-Diagonal).
      
      as another point, Popperian science tends to advance by people making falsifiable claims, “you don’t know if that’s true” isn’t really an objection in that context. the pragmatic claim I would make is: I have some Bayesian reason to believe agents do not in general factor into separate Orthogonal and Diagonal components, this claim is somewhat falsifiable (someone could figure out a theory of this invulnerable to optimization daemons etc), I’m going to spend my attention on the branch where I’m right, I’m not going to worry about Pascal’s mugging type considerations for if I’m wrong (as I said, modeling the world first seems like a good general heuristic), people can falsify it eventually if it’s false.
      
      this whole discussion is not really a defense of Orthogonality given that Yudkowsky presented orthogonality as a descriptive world model, not a normative claim, so sticking to the descriptive level in the original post seems valid; it would be a form of bad epistemology to reject a descriptive update (assuming the arguments are any good) because of pragmatic considerations.
      - habryka 19 Sep 2024 23:17 UTC
        LW: 4 AF: 4
        0
        AF Parent
        with respect to grabbing hold of the whole future: you can try looking at historical cases of people trying to grab hold of the future and seeing how that went, it’s a mixed bag with mostly negative reputation, indicating there are downsides as well as upsides, it’s not a “safe” conservative view. see also Against Responsibility. I feel like there’s a risk of getting Pascal’s mugged about “maybe grabbing hold of the future is good, you can’t rule it out, so do it”, there are downsides to spending effort that way.
        I agree with a track-record argument of this, but I think the track record of people trying to broadly ensure that humanity continues to be in control of the future (while explicitly not optimizing for putting themselves personally in charge) seems pretty good to me.
        Generally a lot of industrialist and human-empowerment stuff has seemed pretty good to me on track record, and I really feel like all the bad parts of this are screened off by the “try to put yourself and/or your friends in charge” component.
        gallabytes 20 Sep 2024 2:00 UTC
        12 points
        4
        Parent
        
        the track record of people trying to broadly ensure that humanity continues to be in control of the future
        
        What track record?
        jessicata 19 Sep 2024 23:26 UTC
        LW: 11 AF: 4
        3
        AF Parent
        hmm, I wouldn’t think of industrialism and human empowerment as trying to grab the whole future, just part of it, in line with the relatively short term (human not cosmic timescale) needs of the self and extended community; industrialism seems to lead to capitalist organization which leads to decentralization superseding nations and such (as Land argues).
        
        I think communism isn’t generally about having one and one’s friends in charge, it is about having human laborers in charge. One could argue that it tended towards nationalism (e.g. USSR), but I’m not convinced that global communism (Trotskyism) would have worked out well either. Also, one could take an update from communism about agendas for global human control leading to national control (see also tendency of AI safety to be taken over by AI national security as with the Situational Awareness paper). (Again, not ruling out that grabbing hold of the entire future could be a good idea at some point, just not sold on current agendas and wanted to note there are downsides that push against Pascal’s mugging type considerations)