I’d argue that a major part of the problem really is long-term consequentialism, but that this is at least partially inevitable by default as soon as two conditions are met:

1. Trade-offs exist, and the value of anything is neither infinite nor arbitrarily large.
2. The agent doesn’t have full knowledge of those values.
It really doesn’t matter whether consequentialism is actually the true morality, just whether it’s more useful than other approaches (given that capabilities researchers are focused only on how capable a model is).
And for a lot of real-world problems, both conditions are pretty likely to hold; the toy sketch below illustrates the basic dynamic.
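To make that a bit more concrete, here’s a minimal sketch (my own toy illustration, not taken from either linked post): an agent that faces bounded trade-offs between goals, and only has noisy estimates of how much each action serves each goal, ends up ranking actions by their estimated expected consequences, which is the core of consequentialist reasoning. The action names, numbers, and helper functions (`TRUE_VALUES`, `noisy_estimate`, `choose`) are all made up for illustration.

```python
import random

# Condition 1: trade-offs with bounded values -- each action helps one goal
# at the cost of another, and no value is infinite or arbitrarily large.
TRUE_VALUES = {
    "ship_fast":  {"speed": 0.9, "safety": 0.2},
    "ship_safe":  {"speed": 0.5, "safety": 0.8},
    "do_nothing": {"speed": 0.1, "safety": 0.5},
}

def noisy_estimate(true_vals, noise=0.2):
    """Condition 2: the agent only sees noisy estimates of the true values."""
    return {k: v + random.uniform(-noise, noise) for k, v in true_vals.items()}

def choose(actions, weights, samples=100):
    """Pick the action with the highest estimated expected weighted value.

    Because no single action dominates on every goal and the values are only
    estimated, the general-purpose strategy left is to average over estimated
    consequences -- i.e. consequentialist-style reasoning.
    """
    def expected_value(action):
        total = 0.0
        for _ in range(samples):
            est = noisy_estimate(TRUE_VALUES[action])
            total += sum(weights[goal] * est[goal] for goal in weights)
        return total / samples

    return max(actions, key=expected_value)

if __name__ == "__main__":
    random.seed(0)
    weights = {"speed": 0.5, "safety": 0.5}
    # Expected to print "ship_safe": it has the highest expected weighted value.
    print(choose(list(TRUE_VALUES), weights))
```

Nothing here depends on the particular numbers: swap in any bounded goal values and any noise model, and the "maximize estimated expected value" structure is still the natural move, which is the sense in which the two conditions push an agent toward consequentialism.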
Here’s a link to a deontological AI idea:
https://www.lesswrong.com/posts/FSQ4RCJobu9pussjY/ideological-inference-engines-making-deontology
And here’s one for a myopic decision theory, LCDT:
https://www.lesswrong.com/posts/Y76durQHrfqwgwM5o/lcdt-a-myopic-decision-theory