The point is that as it gets smarter, it gets further along the causal reward line and eliminates and alters a lot of hardware, obtaining eternal-equivalent reward in finite time (and being utility-indifferent between eternal reward hardware running for 1 second and for 10 billion years). Keep in mind that the the total reward is defined purely as result of operations on the clock counter and reward signal (provided sufficient understanding of the reward’s causal chain). Having to sit and wait for the clocks to tick to max out reward is a dumb solution. Rewards in software in general aren’t “pleasure”.
The point is that as it gets smarter, it gets further along the causal reward line and eliminates and alters a lot of hardware, obtaining eternal-equivalent reward in finite time (and being utility-indifferent between eternal reward hardware running for 1 second and for 10 billion years). Keep in mind that the the total reward is defined purely as result of operations on the clock counter and reward signal (provided sufficient understanding of the reward’s causal chain). Having to sit and wait for the clocks to tick to max out reward is a dumb solution. Rewards in software in general aren’t “pleasure”.