Obvious AI connection: goal encapsulation between humans relies on commonalities, such as mental frameworks and terminal goals. These commonalities probably won’t hold for AI: unless it’s an emulation, it will think very differently from humans, and relying on terminal agreements doesn’t work to ground terminal agreement in the first place. Therefore, we should expect it to be very hard to encapsulate goals to an AI.
(Tool AI and Agent AI approaches suffer differently from this difficulty. Agents will be hard to terminally align, but once we’ve done that, we can rely on terminal agreement to flesh out plans. Tools can’t use recursive trust like that, so they’ll need to explicitly understand more instrumental goals.)
Obvious AI connection: goal encapsulation between humans relies on commonalities, such as mental frameworks and terminal goals. These commonalities probably won’t hold for AI: unless it’s an emulation, it will think very differently from humans, and relying on terminal agreements doesn’t work to ground terminal agreement in the first place. Therefore, we should expect it to be very hard to encapsulate goals to an AI.
(Tool AI and Agent AI approaches suffer differently from this difficulty. Agents will be hard to terminally align, but once we’ve done that, we can rely on terminal agreement to flesh out plans. Tools can’t use recursive trust like that, so they’ll need to explicitly understand more instrumental goals.)