I had a similar idea some months ago, about making the agent spend exponentially more time on proofs that imply higher utility values. Unfortunately such an agent would spend the most time trying to achieve (D,C), because that’s the highest utility outcome. Or maybe I misunderstand your idea...
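To make the worry concrete, here is a minimal sketch of that kind of allocation, assuming standard PD payoffs (T=5 > R=3 > P=1 > S=0) and a budget that grows as 2^utility; the payoff numbers, the base-2 scaling, and the names `payoffs` / `total_steps` are illustrative assumptions rather than anything from the original proposal.

```python
# Illustrative sketch: proof-search budget exponential in an outcome's utility.
# Assumes standard PD payoffs and a 2**u weighting; both are placeholder choices.

payoffs = {            # utility to the agent for each (my move, their move) outcome
    ("D", "C"): 5,     # temptation
    ("C", "C"): 3,     # mutual cooperation
    ("D", "D"): 1,     # mutual defection
    ("C", "D"): 0,     # sucker's payoff
}

total_steps = 10_000
weights = {outcome: 2 ** u for outcome, u in payoffs.items()}
norm = sum(weights.values())

for outcome, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    steps = round(total_steps * w / norm)
    print(f"{outcome}: ~{steps} proof-search steps")
# (D, C) gets the lion's share of the budget, which is exactly the worry above.
```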