You’re playing very fast and loose with infinities, and making arguments that have the appearance of being mathematically formal.
You can’t just say “outcome with infinite utility” and then do math on it. P(‹undefined term›) is undefined, and that “undefined” does not inherit the definition of probability that says “greater than 0 and less than 1”. It may be false, it may be true, it may be unknowable, but it may also simply be nonsense!
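To make it concrete (a minimal sketch, assuming the standard expected-utility setup; $p$ and $u_{\text{rest}}$ are my own placeholders for the probability of reaching the outcome and the finite utility of everything else): if you allow $U(\text{outcome}) = +\infty$, then for any action $a$ with $p > 0$,
\[
\mathbb{E}[U \mid a] \;=\; p \cdot (+\infty) + (1 - p)\, u_{\text{rest}} \;=\; +\infty,
\]
so every action with any nonzero chance of reaching the outcome ties at $+\infty$, and expected utility stops ranking actions at all. The formalism collapses rather than delivering a unique "dominant" course of behavior, and that is before asking whether "infinite utility" is a coherent quantity in the first place.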
And even if it weren’t, that does not remotely imply that an agent must-by-logical-necessity take any action or be unable to be acted upon. Those are entirely different types.
And alignment doesn’t necessarily mean “controllable”. Indeed, the very premise of the superintelligence alignment problem is that we need to be sure about alignment precisely because a superintelligence won’t be controllable. Yes, an argument could be made, but that argument needs to actually be made.
And the implication you take to follow straightforwardly from Pascal’s mugging is not uncontroversial, to put it mildly.
And Gödel’s incompleteness theorem is not accurately summarized as saying “There might be truths that are unknowable”, unless you’re very clear in indicating that “truth” and “unknowable” have technical meanings that don’t correspond very well to either the plain English meanings or the typical philosophical definitions of those terms.
None of which means you’re actually wrong that alignment is impossible. A bad argument that the sun will rise tomorrow doesn’t mean the sun won’t rise tomorrow.
> You can’t just say “outcome with infinite utility” and then do math on it. P(‹undefined term›) is undefined, and that “undefined” does not inherit the definition of probability that says “greater than 0 and less than 1”. It may be false, it may be true, it may be unknowable, but it may also simply be nonsense!
OK. But can you prove that “outcome with infinite utility” is nonsense? If not, then the probability is greater than 0 and less than 1.
> And even if it weren’t, that does not remotely imply that an agent must-by-logical-necessity take any action or be unable to be acted upon. Those are entirely different types.
Do I understand correctly that you do not agree with “all actions which lead to that outcome will have to dominate the agent’s behavior” from Pascal’s Mugging? Could you provide arguments for that?
> And alignment doesn’t necessarily mean “controllable”. Indeed, the very premise of the superintelligence alignment problem is that we need to be sure about alignment precisely because a superintelligence won’t be controllable. Yes, an argument could be made, but that argument needs to actually be made.
I mean “uncontrollable” in the sense that alignment is impossible. Whatever goal you provide, the AGI will converge on Power Seeking, because “an outcome with infinite utility may exist”.
> And the implication you take to follow straightforwardly from Pascal’s mugging is not uncontroversial, to put it mildly.
I do not understand how this solves the problem.
> And Gödel’s incompleteness theorem is not accurately summarized as saying “There might be truths that are unknowable”, unless you’re very clear in indicating that “truth” and “unknowable” have technical meanings that don’t correspond very well to either the plain English meanings or the typical philosophical definitions of those terms.
Do you think you can prove that “an outcome with infinite utility does not exist”? Please elaborate.
> OK. But can you prove that “outcome with infinite utility” is nonsense? If not, then the probability is greater than 0 and less than 1.
That’s not how any of this works, and I’ve spent all the time responding that I’m willing to waste today.
You’re literally making handwaving arguments, and replying to criticisms that the arguments don’t support the conclusions by saying “But maybe an argument could be made! You haven’t proven me wrong!” I’m not trying to prove you wrong, I’m saying there’s nothing here that can be proven wrong.
I’m not interested in wrestling with someone who will, when pinned to the mat, argue that because their pinky can still move, I haven’t really pinned them.
“I—” said Hermione. “I don’t agree with one single thing you just said, anywhere.”
Could you provide arguments for your position?
Please feel free to come back when you have stronger proof than this. Currently I feel that you are the one moving the pinky.