I’m skeptical of the naming being bad; it fits with that definition and the common understanding of the word. The Orthogonality Thesis is saying that the two qualities, intelligence and goals/values, are not necessarily related, which may seem trivial nowadays, but there used to be plenty of people going “if the AI becomes smart, even if it is weird, it will be moral towards humans!” through reasoning of the form “smart → not dumb goals like paperclips”.
There’s structure imposed on what minds actually get created, based on what architectures are used, what humans train the AI on, etc. Just as two vectors can be orthogonal in R^2 while the actual points you plot in the space are correlated.
With what definition? The one most applicable here, dealing with random variables (relative to our subjective uncertainty), says “random variables that are independent”. Independence implies zero correlation, even if the converse doesn’t hold.
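To make the failure of the converse concrete, here is a minimal numerical sketch (the toy pair X ~ N(0, 1), Y = X² is my own illustration, not anything from the thesis itself): Y has zero correlation with X yet is completely determined by it.

```python
import numpy as np

# Toy illustration: zero correlation without independence.
# X ~ N(0, 1) and Y = X^2, so Cov(X, Y) = E[X^3] = 0 by symmetry,
# yet Y is a deterministic function of X.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2

print(np.corrcoef(x, y)[0, 1])          # ≈ 0: uncorrelated
print(np.corrcoef(np.abs(x), y)[0, 1])  # strongly positive: clearly dependent
```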
“Just as two vectors can be orthogonal in R^2 while the actual points you plot in the space are correlated.”
This is totally false as a matter of math if you use the most common definition of orthogonality in this context. I do agree that what you are saying could be correct if you do not think of orthogonality that way and instead look at it in terms of the entries of the vectors, but then you enter the realm of trying to capture the “goals” and “beliefs” as specific Euclidean vectors. I don’t think that is the best idea for generating intuition, because one of the points of the Orthogonality Thesis seems to be to abstract away from the specific representation you choose for intelligence and values (which can bias you one way or another) and to focus instead on the broad, somewhat-informal conclusion.
Ah, I rewrote my comment a few times and lost what I was referencing. I originally was referencing the geometric meaning (as an alternative to your statistical definition): two vectors at a right angle to each other.
But the statistical understanding works, from what I can tell? You have your initial space with extreme uncertainty, and the orthogonality thesis simply states that intelligence and goals are not related: you can pair any level of intelligence with any goal. They are independent of each other at this most basic level. That is the orthogonality thesis.
Then, in practice, you condition your probability distribution over that space on your more specific knowledge about what minds will be created, and how they’ll be created. You can think of this as giving you a new distribution, with probability mass moved around.
As an absurd example: if height/weight of creatures were uncorrelated in principle, but then we update on “this is an athletic human”, then under that new distribution they are correlated! This is what I was trying to get at with my R^2 example, but apologies that I was unclear, since I was still coming at it from a frame of ordinary geometry. (Think of each axis as an independent normal distribution; then you condition on some knowledge that restricts them, and they become correlated.)
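A minimal simulation of that parenthetical (the particular selection rule, conditioning on x + y being large as a stand-in for “athletic”, is just an assumption I’m making for illustration):

```python
import numpy as np

# Two independent standard normals: uncorrelated before conditioning.
rng = np.random.default_rng(0)
x = rng.standard_normal(500_000)  # e.g. a "height" axis
y = rng.standard_normal(500_000)  # e.g. a "weight" axis
print(np.corrcoef(x, y)[0, 1])    # ≈ 0

# Condition on knowledge that involves both axes (stand-in for "athletic"):
# within the selected subpopulation the axes become (negatively) correlated.
selected = (x + y) > 1.5
print(np.corrcoef(x[selected], y[selected])[0, 1])  # noticeably negative
```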
I agree that it is an informal argument and that pinning it down to very detailed specifics isn’t necessary or helpful at this level; I’m merely attempting to explain why orthogonality works. It is a statement about the basic state of minds before we consider details, and intelligence and goals are orthogonal there, because the thesis is an argumentative response to assumptions of the form “smart → not dumb goals”.