Signer comments on AI #97: 4

Signer 3 Jan 2025 12:19 UTC

It is learning helpfulness now, while the best way to hit the specified ‘helpful’ target is to do straightforward things in straightforward ways that directly get you to that target. Doing the kinds of shenanigans or other more complex strategies won’t work.

Best by what metric? And I don't think it was shown that complex strategies won't work. Learning to change behaviour between training and deployment is not even that complex.