If the system is modular, such that the part representing the goal is separate from the part optimizing for that goal, then it seems plausible that we could apply some sort of regularization to the goal to discourage it from being long-term.
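For concreteness, here is a minimal sketch of what that modular setup might look like, with the goal represented by a learned value head and the regularizer pulling its estimates toward a short-horizon ("myopic") target. Everything here is illustrative: `ValueHead`, `truncated_return`, `horizon`, and `myopia_weight` are hypothetical names, and this is only one possible choice of regularizer, not a claim about what the original proposal intends.

```python
import torch
import torch.nn as nn


class ValueHead(nn.Module):
    """Hypothetical goal module, kept separate from the policy/optimizer."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states).squeeze(-1)


def truncated_return(rewards: torch.Tensor, horizon: int, gamma: float = 0.99) -> torch.Tensor:
    """Discounted return over only the next `horizon` steps.

    rewards: (batch, T) tensor of per-step rewards following each state.
    """
    horizon = min(horizon, rewards.shape[1])
    discounts = gamma ** torch.arange(horizon, dtype=rewards.dtype)
    return (rewards[:, :horizon] * discounts).sum(dim=1)


def value_loss(value_head: ValueHead,
               states: torch.Tensor,
               rewards: torch.Tensor,
               full_returns: torch.Tensor,
               horizon: int = 10,
               myopia_weight: float = 0.1) -> torch.Tensor:
    """Ordinary value regression plus a 'myopia' regularizer.

    The extra term penalizes the gap between the value head's estimate and a
    short-horizon truncated return, discouraging the goal module from encoding
    very long-term considerations.
    """
    v = value_head(states)
    fit_loss = nn.functional.mse_loss(v, full_returns)           # usual value target
    myopic_target = truncated_return(rewards, horizon)
    myopia_penalty = nn.functional.mse_loss(v, myopic_target)    # the regularizer
    return fit_loss + myopia_weight * myopia_penalty
```

This is just one shape the idea could take; the same structure would apply to other regularizers (e.g. penalizing sensitivity to far-future states) so long as the goal module is cleanly separated from the optimizer.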
What kind of regularization could this be? And are you imagining an AlphaZero-style system with a hardcoded value head, or an organically learned modularity?