Is it possible that making an expected utility maximizer is less dangerous than making something that isn't one?
Consider, as an alternative, an expected log utility maximizer (an agent using the Kelly criterion, or some approximation of it).
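The gap between the two agents can be made concrete with a simple repeated bet. The following is a short derivation under an assumed setup that is not in the post: the agent stakes a fraction f of its wealth W on a bet that wins with probability p and pays net odds b, so wealth becomes W(1 + bf) on a win and W(1 - f) on a loss.

```latex
\begin{aligned}
\text{Linear utility:}\quad & \mathbb{E}[W'] = W\bigl(1 + f\,(pb - (1-p))\bigr)
  \;\Rightarrow\; f = 1 \text{ whenever } pb > 1 - p, \\
\text{Log utility:}\quad & g(f) = \mathbb{E}[\log W'] - \log W = p\log(1 + bf) + (1-p)\log(1 - f), \\
& g'(f) = \frac{pb}{1 + bf} - \frac{1 - p}{1 - f} = 0
  \;\Rightarrow\; f^{*} = p - \frac{1 - p}{b}.
\end{aligned}
```

For example, with p = 1/2 and b = 1.2 (a win turns the stake into 2.2 times itself), the linear agent stakes everything, while the Kelly stake is f* = 1/2 - (1/2)/1.2 = 1/12, or about 8% of wealth.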
The sooner an AI wins, the more galaxies it can consume. An expected utility maximizer weighs those extra galaxies against the risk of failure, and so it is willing to accept plans with much higher probabilities of failure. Like SBF, it would take a bet with a 50% chance of more-than-doubling its utility and a 50% chance of losing it all. In many environments this strategy almost certainly ends in failure: the agent keeps going double-or-nothing until it loses everything, and after n such bets the chance that it still has anything left is only 0.5^n. That means the AI's effects on the world are mitigated.
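To put numbers on the double-or-nothing story, here is a minimal Python sketch. It assumes a bet won with probability 0.5 that pays net odds of 1.2 (a win turns wealth W into 2.2W); these numbers and the function name `all_in_stats` are illustrative, not from the post. It computes exactly, without simulation, the expected wealth of an agent that stakes everything every round alongside its chance of not being ruined.

```python
# Sketch: a linear (risk-neutral) utility maximizer facing a repeated bet.
# Assumed, illustrative numbers: win probability 0.5, net odds 1.2,
# so a win multiplies the stake by 2.2 ("more than doubling").
P_WIN = 0.5
NET_ODDS = 1.2


def all_in_stats(rounds: int) -> tuple[float, float]:
    """Return (expected wealth, probability of not being ruined) after
    `rounds` all-in bets, starting from wealth 1.0."""
    # Each round multiplies expected wealth by P_WIN * (1 + NET_ODDS) = 1.1,
    # so expected wealth grows exponentially...
    expected_wealth = (P_WIN * (1 + NET_ODDS)) ** rounds
    # ...but the agent keeps anything only if it wins every single round.
    survival_probability = P_WIN ** rounds
    return expected_wealth, survival_probability


if __name__ == "__main__":
    for n in (1, 10, 20):
        ev, survive = all_in_stats(n)
        print(f"rounds={n:2d}  E[wealth]={ev:8.3f}  P(not ruined)={survive:.2e}")
```

Expected wealth grows by a factor of 1.1 per round, which is why the risk-neutral agent keeps betting, while the probability that it has anything left halves every round.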
The log utility maximizer, by contrast, plans carefully and succeeds in most or all futures. For us, that looks like humanity dying with near-certainty.
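A log utility agent facing the same assumed bet (win probability 0.5, net odds 1.2) instead stakes only the Kelly fraction. The sketch below, whose helper names are hypothetical, computes the expected log-growth rate per round, checks the standard closed-form Kelly fraction f* = p - (1 - p)/b against a brute-force grid search, and shows that going all-in has a growth rate of negative infinity, because a single loss is total ruin.

```python
import math

# Same assumed, illustrative bet as before.
P_WIN = 0.5
NET_ODDS = 1.2


def log_growth_rate(fraction: float) -> float:
    """Expected log-wealth growth per round when staking `fraction` of wealth."""
    if fraction >= 1.0:
        return -math.inf  # one loss leaves zero wealth, and log(0) = -inf
    return (P_WIN * math.log1p(NET_ODDS * fraction)
            + (1 - P_WIN) * math.log1p(-fraction))


def kelly_fraction(p: float, b: float) -> float:
    """Closed-form Kelly stake f* = p - (1 - p) / b, floored at 0 (no bet)."""
    return max(0.0, p - (1 - p) / b)


if __name__ == "__main__":
    f_star = kelly_fraction(P_WIN, NET_ODDS)
    # Brute-force check: search stake sizes 0.000, 0.001, ..., 1.000.
    grid = [i / 1000 for i in range(1001)]
    best = max(grid, key=log_growth_rate)
    print(f"closed form f* = {f_star:.4f}, grid search = {best:.3f}")
    print(f"growth per round at f*: {log_growth_rate(f_star):.5f}")
    print(f"growth per round all-in: {log_growth_rate(1.0)}")
```

Because the Kelly stake is always below 1, no single loss can wipe the agent out, and its wealth compounds at a positive rate in typical futures, which is roughly the sense in which it "succeeds in most or all futures".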
A hyper-expected utility maximizer (an AI that maximizes expected exp(utility), or something similar) would be even safer. Instead of trying to deceive you into letting it out of the box, it asks nicely or tries something crazy, because if that works, it works faster than deception would, which means more galaxies.
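The ordering of the three agents can be illustrated with a toy choice between plans. The plan names, success probabilities, and galaxy counts below are invented for illustration, and the log agent uses log(1 + galaxies), an assumption made so that a failed plan worth 0 galaxies still has finite utility. Each agent simply picks the plan with the highest expected utility.

```python
import math

# Hypothetical plans: (name, probability of success, galaxies gained on success).
# A failed plan yields 0 galaxies. All numbers are made up for illustration.
PLANS = [
    ("careful", 0.99, 10.0),  # slow, reliable
    ("deceive", 0.50, 30.0),  # faster, riskier
    ("crazy", 0.05, 100.0),   # a long shot ("ask nicely"): fastest if it works
]

# Utility over galaxies x. log1p(x) = log(1 + x) keeps u(0) finite (assumption).
UTILITIES = {
    "linear": lambda x: x,
    "log": math.log1p,
    "exp": math.exp,
}


def expected_utility(u, p_success: float, payoff: float) -> float:
    """Expected utility of a plan that pays `payoff` on success and 0 on failure."""
    return p_success * u(payoff) + (1 - p_success) * u(0.0)


def preferred_plan(u) -> str:
    """Name of the plan maximizing expected utility under utility function u."""
    return max(PLANS, key=lambda plan: expected_utility(u, plan[1], plan[2]))[0]


if __name__ == "__main__":
    for name, u in UTILITIES.items():
        print(f"{name:>6} utility picks: {preferred_plan(u)}")
```

With these made-up numbers, the log agent picks the slow, reliable plan, the linear agent picks the riskier deception plan, and the exp agent picks the long shot.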
So if we had to choose between living in a world with a superintelligent expected log(resources) maximizer and one with a superintelligent expected utility maximizer, we should perhaps choose the one in which we survive in more futures.
Of course, the expected-log-utility agent would also appear the most capable and useful, while the hyper-expected utility maximizer would be nearly useless.
Men[1] will die[2] for her[3] massive[4] coconuts[5].
[1] All of humanity
[2] Go extinct
[3] Hindsight Experience Replay (HER), a technique for improving the training signal in reinforcement learning
[4] Large-scale training and large model size
[5] Chain of Continuous Thought (Coconut), a technique that makes a model's chain of thought much less interpretable but allows the model to reason more efficiently