Most of the proposals I’ve heard do actually involve getting AI to think in terms of words as its primary internal data structure. But that’s not actually a crux for me. The more important part is this:
> Your concerns are correct but go way too far in implying an AI could not be DESIGNED to produce such a stream-of-thought which would have >0 value in managing some smarter-than-human AIs.
">0 value", taken in isolation, is simply not a worthwhile goal to pursue in alignment research. Tons of things provide >0 value in isolation but do not address any of the core subproblems or generalize beyond a specific architecture; they therefore will not cumulatively stack with other work, and probably will not even apply to whatever architecture actually ends up mattering. Epsilons don't matter unless they stack.
Yeah, fair.