This assumes a discount rate of 0. And that there is a philosophically clear way to point to some ancient posthuman superbeing and say “that’s me”.
Also, I like the implication that, to a first approximation, Virtue is a low time-preference.
Yeah, that particular utility function I described doesn’t have a discount rate. Wouldn’t a superintelligent agent with a nonzero discount rate still seek self-preservation? What about the other instrumentally convergent goals?
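As a rough sketch of the discounting question (this formalization is an assumption on my part, not something spelled out in the exchange): write the agent’s objective with an exponential discount factor $\gamma$,

$$U \;=\; \sum_{t=0}^{T} \gamma^{t}\, u(s_t), \qquad 0 < \gamma \le 1,$$

where $u(s_t)$ is the utility of the world-state at time $t$ and $T$ is when the agent ceases to exist. Taking $\gamma = 1$ recovers the undiscounted utility function described above; for any $\gamma > 0$, a longer $T$ adds more terms to the sum, so as long as the agent expects those future terms to be positive on average, self-preservation stays instrumentally useful, just weighted geometrically less the further out it pays off.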
A superintelligent agent that was perfectly aligned with my values wouldn’t be me. But I would certainly take any advice it gave me. Given that virtually all possible superintelligent agents would seek self-preservation, I can assume with very high confidence that the me-aligned superintelligence would too. Put another way: if I were scanned and uploaded to a computer, and I somehow figured out how to recursively self-improve, I would probably FOOM. That superintelligence would be me, I think.