Why would ‘scheming’ not be the best way to turn compute into rewards? Why would a completely honest, consistent, straightforward approach be the most rewarded one, given how humans decide how to reward things? I don’t get it.
It wouldn’t be the most rewarded one, but that’s not what training finds, because training never approaches theoretical limits. The honest, straightforward strategy is the uncomplicated one that may leave some reward on the table. In other words, the solution to soft optimization is to train a neural network on human-generated content.