He didn’t say that “infinite value” is logically impossible. He described it as an assumption.
When saying “is possible”, I’m not sure if he meant “is possible (conceptually)” or “is possible (according to the ontology/optimization-criteria of any given agent)”. I think the latter would be most sensible.
He later said: “I think initially specifying premises such as these more precisely initially ensures the reasoning from there is consistent/valid.” Not sure if I interpreted him correctly, but I saw it largely as an encouragement to think more explicitly about things like these (not be sloppy about it). Or if not an encouragement to do that, then at least pointing out that it’s something you’re currently not doing.
If we have a traditional/standard utility-function, and use traditional/standard math in regard to that utility function, then involving credences of infinite-utility outcomes would typically make things “break down” (with most actions considered to have expected utilities that are either infinite or undefined).
Like, suppose action A has 0.001% chance of infinite negative utility and 99% chance of infinite positive utility. The utility of that action would, I think, be undefined (I haven’t looked into it in detail). I can tell for sure that mathematically it would not be regarded as having positive utility. Here is a video that explains why.
If that doesn’t make intuitive sense to you, then that’s fine. But mathematically that’s how it is. And that’s something to be aware of (and account for in a non-handwavy way) if you’re trying to make a mathematical argument with a basis in utility functions that deal with infinities.
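To make the “undefined” point concrete, here is a minimal sketch (my own illustration, not anything from the discussion), using IEEE-754 floats as a stand-in for the extended reals and the probabilities from the example above:

```python
# Probabilities from the example above: 0.001% chance of infinitely bad,
# 99% chance of infinitely good (the remaining ~1% doesn't matter here).
p_neg, p_pos = 0.00001, 0.99

# Expected utility = p_pos * (+inf) + p_neg * (-inf) = inf - inf,
# which is indeterminate; float arithmetic reports it as nan ("not a number").
expected_utility = p_pos * float("inf") + p_neg * float("-inf")
print(expected_utility)  # nan -> neither positive nor negative; undefined
```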
Even if you did account for that, it would be beside the point from my perspective, in more ways than one. So what we’re discussing now is not actually a crux for me.
Like, suppose action A has 0.001% chance of infinite negative utility and 99% chance of infinite positive utility. The utility of that action would, I think, be undefined
For me personally, it would of course make a big difference whether there is a 0.00000001% chance of infinite positive utility or a 99.999999999% chance. But that is me going with my own intuitions. The standard math of EV calculations doesn’t support that distinction.
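As a minimal illustration of that last point (again my own sketch, with floats standing in for the extended reals): under standard EV arithmetic, any nonzero probability of an infinitely good outcome yields the same infinite expectation, so the size of that probability stops mattering.

```python
tiny, huge = 1e-10, 0.99999999999  # wildly different probabilities

# Both expectations come out as +inf, so standard EV math can't
# distinguish "almost certainly infinite utility" from "almost certainly not".
ev_tiny = tiny * float("inf")
ev_huge = huge * float("inf")
print(ev_tiny, ev_huge, ev_tiny == ev_huge)  # inf inf True
```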
Do you think you can deny existence of an outcome with infinite utility? The fact that things “break down” is not a valid argument. If you cannot deny it—it’s possible. And if it’s possible—alignment is impossible.
Do you think you can deny existence of an outcome with infinite utility?
To me, according to my preferences/goals/inclinations, there are conceivable outcomes with infinite utility/disutility.
But I think it is possible (and feasible) for a program/mind to be extremely capable, and affect the world, and not “care” about infinite outcomes.
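One hypothetical way to cash out “not caring about infinite outcomes” (my own sketch, not something either of us has specified in this discussion) is an agent whose utility function is bounded, so that no outcome is ever assigned infinite utility and its expected values stay finite:

```python
import math

def bounded_utility(raw_value: float) -> float:
    # Squash any "raw" valuation into (-1, 1); tanh(+inf) == 1.0 and
    # tanh(-inf) == -1.0, so even outcomes described as infinitely
    # good/bad receive a finite utility.
    return math.tanh(raw_value)

# (probability, raw valuation) pairs, including "infinite" outcomes.
outcomes = [(0.00001, float("-inf")), (0.99, float("inf")), (0.00999, 42.0)]
expected_utility = sum(p * bounded_utility(v) for p, v in outcomes)
print(expected_utility)  # ~0.99998 -> finite and well-defined
```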
The fact that things “break down” is not a valid argument.
I guess that depends on what’s being discussed. Like, it is something to take into account if you want to prove something while referencing utility-functions that involve infinities.
But I think it is possible (and feasible) for a program/mind to be extremely capable, and affect the world, and not “care” about infinite outcomes.
As I understand it, you do not agree with
If an outcome with infinite utility is presented, then it doesn’t matter how small its probability is: all actions which lead to that outcome will have to dominate the agent’s behavior.
from Pascal’s Mugging, not with me. Do you have any arguments for that?
I do have arguments for that, and I have already mentioned some of them earlier in our discussion (you may not share that assessment, despite us being relatively close in mind-space compared to most possible minds, but oh well).
Some of the more relevant comments from me are on one of the posts that you deleted.
As I mention here, I think I’ll try to round off this discussion. (Edit: I had a malformed/misleading sentence in that comment that should be fixed now.)