I forgot to mention one more argument, namely that something like a goal-directed optimizer is my best guess of what a philosophically and technologically mature, reflectively stable general intelligence will look like, since it's the only motivational structure we know of that comes anywhere close to being reflectively stable.
I want to be careful not to overstate how close it comes, or to rule out the possibility of discovering some completely different reflectively stable motivational structure in the future, but in our current epistemic state, reflective stability by itself already seems enough to motivate the theoretical study of goal-directed optimizers.