Okay, so we just have to determine human terminal values in detail, and plug them into a powerful maximizer.
Why do you even go around thinking that the concept of “terminal values”, which is basically just a consequentialist steelmanning Aristotle, cuts reality at the joints?
For starters, you want to be able to prove formally that its goals will remain stable as it self-modifies.
That part honestly isn’t that hard once you read the available literature about paradox theorems.