or more centrally, long after I finish the course of action.
I don’t understand why the more central thing is “long after I finish the course of action” as opposed to “in ways that are clearly ‘external to’ the process called ‘me’, that I used to take the actions.”
Hmm, yeah that too. What I had in mind was the idea that “consequentialist” usually has a connotation of “long-term consequentialist”, e.g. taking multiple actions over time that consistently lead to something happening.
For example:
Instrumental convergence doesn’t bite very hard if your goals are 15 seconds in the future.
If an AI acts to maximize long-term paperclips at 4:30pm, to minimize long-term paperclips at 4:31pm, to maximize them at 4:32pm, to minimize them at 4:33pm, and so on, then we wouldn’t intuitively think of that AI as a consequentialist rational agent, even if it is technically a consequentialist rational agent at each moment.
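To make the oscillating-objective scenario concrete, here is a minimal toy sketch (the function names and numbers are my own illustration, not anything from the discussion above): an agent that greedily optimizes a paperclip objective whose sign flips each minute. Each step is "consequentialist" in isolation, but the steps don't add up to consistent long-term steering of the world.

```python
# Toy illustration (hypothetical): an agent whose objective over long-term
# paperclips flips sign every minute. At each moment it optimizes a consequence,
# but across time its actions cancel out rather than consistently pushing the
# world toward anything.

def paperclip_utility(minute: int, paperclips: float) -> float:
    """Sign-flipping objective: maximize paperclips on even minutes, minimize on odd ones."""
    sign = 1 if minute % 2 == 0 else -1
    return sign * paperclips

def choose_action(minute: int) -> float:
    """Greedy one-step choice: +1 builds a paperclip, -1 destroys one."""
    return max((-1.0, 1.0), key=lambda a: paperclip_utility(minute, a))

paperclips = 0.0
for minute in range(30, 34):  # 4:30pm .. 4:33pm
    action = choose_action(minute)
    paperclips += action
    print(f"4:{minute}pm  action={action:+.0f}  total paperclips={paperclips:.0f}")
# Each step optimizes its momentary objective, but the long-run count just oscillates.
```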