I’m no longer sure the problem makes sense. Imagine an AI whose on-goal is to make money for you, and whose off-goal is to do nothing in particular. Imagine you turn it on, and it influences the government to pay a monthly stipend to people running money-making AIs, including you. By that action, is the AI making money for you in a legitimate way? Or is it bribing you to keep it running and avoid pressing the shutdown button? How do you even answer a question like that?
If we had great mechinterp, I’d answer the question by looking into the mind of the AI and seeing whether or not it considered the “this will reduce the probability of the shutdown button being pressed” possibility in its reasoning (or some similar thing), and if so, whether it considered it a pro, a con, or a neutral side-effect.
Then it seems to me that judging the agent’s purity of intentions is also a deep problem. At least for humans it is. For example, a revolutionary may only want to overthrow the unjust hierarchy, but then succeed and end up in power. So they didn’t consciously try to gain power; maybe evolution just gave them behaviors that happen to gain power, without “the desire for power” being explicitly encoded anywhere in the agent.
I think this is not so big of a problem, if we have the assumed level of mechinterp.
This assumes concepts like “shutdown button” are in the AI’s ontology. I’m not sure how much we understand about what ontology AIs are likely to end up with.
How would those questions apply to the “trammelling” example from part 2 of the post, where the AI keeps the overall probability of outcome B the same, but intentionally changes which worlds get outcome B in order to indirectly trade A1 outcomes for A2 outcomes?
Good point. I revise it to “if so, whether it considered it a pro, a con, an important thing to trammel, or none of the above.”
Come to think of it, why is trammelling so bad? If it keeps the probability of the button being pressed the same, why exactly do we care? Is it because our ability to influence the button is diminished?
That’s my understanding of why it’s bad, yes. The point of the button is that we want to be able to choose whether it gets pressed or not. If the AI presses it in a bunch of worlds where we don’t want it pressed, and stops it from being pressed in a bunch of worlds where we do want it pressed, both of those are bad. The fact that the AI is trading equal probability mass in both directions doesn’t make it any less bad from our perspective.
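To make that last point concrete, here’s a toy numeric sketch (my own illustration, not from the post; the four worlds and their probabilities are made up for the example). Two policies give the button the same overall probability of being pressed, but the trammelled one presses it in exactly the worlds where we don’t want it pressed:

```python
# Toy illustration of why trading "equal probability mass in both
# directions" can still be bad. Worlds and probabilities are hypothetical.

worlds = {
    # world: (prior probability, do WE want the button pressed here?)
    "w1": (0.25, True),
    "w2": (0.25, True),
    "w3": (0.25, False),
    "w4": (0.25, False),
}

# Baseline: the button gets pressed exactly in the worlds where we want it.
baseline = {"w1": True, "w2": True, "w3": False, "w4": False}

# A trammelling policy: the AI swaps w2 and w3, so the button is pressed
# in w3 (where we don't want it) and not in w2 (where we do). The total
# probability of a press is unchanged.
trammelled = {"w1": True, "w2": False, "w3": True, "w4": False}

def p_pressed(policy):
    # Total probability mass on worlds where the button gets pressed.
    return sum(p for w, (p, _) in worlds.items() if policy[w])

def p_mismatch(policy):
    # Probability mass on worlds where the press/no-press outcome
    # disagrees with what we want.
    return sum(p for w, (p, want) in worlds.items() if policy[w] != want)

for name, policy in [("baseline", baseline), ("trammelled", trammelled)]:
    print(f"{name}: P(pressed)={p_pressed(policy):.2f}, "
          f"P(outcome we don't want)={p_mismatch(policy):.2f}")
# baseline:   P(pressed)=0.50, P(outcome we don't want)=0.00
# trammelled: P(pressed)=0.50, P(outcome we don't want)=0.50
```

Same P(pressed) in both cases, but the trammelled policy puts half the probability mass on outcomes we actively don’t want.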