Sounds interesting, though the proposal appears to depend on solving sub-goals that are akin to the halting problem. And we know the halting problem is undecidable: no general algorithm can solve it.
For example, as you correctly note, humans are not even self-aligned over any appreciable span of time or space, given the ever-threatening presence of cancerous cells within any human body.
How could an AI know when to stop aligning itself with a human whose self-alignment slowly drifts in a deleterious direction? (And it would need to know this for all possible permutations of environment, behaviour, etc.)
Surely not the moment the slightest contradiction arose, otherwise the AI agent would never get beyond a trivial alignment; but surely not never, either, since humans could very well drift capriciously in self-destructive directions.
It does not appear a general solution could ever exist for this one sub-goal, because such a solution would amount to solving the halting problem itself: deciding, for an arbitrary trajectory of someone's values, whether it ever turns deleterious is no easier than deciding whether an arbitrary program ever halts.
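A minimal sketch of the reduction I have in mind, in Python, with every name hypothetical and purely illustrative (this is not anything from the proposal itself):

```python
# Toy sketch of the reduction (all names are hypothetical, for illustration
# only). Model a "program" as a predicate telling us whether it has halted
# within t steps, and a human value trajectory that turns deleterious exactly
# when that program halts.

def make_trajectory(halted_within):
    """halted_within(t) -> True iff the program has halted within t steps.
    Returns a trajectory: time step -> 'benign' or 'deleterious'."""
    def trajectory(t):
        return "deleterious" if halted_within(t) else "benign"
    return trajectory

def will_ever_turn_deleterious(trajectory):
    """The general decider this sub-goal seems to require. If it existed for
    arbitrary trajectories, feeding it make_trajectory(halted_within) would
    answer whether the underlying program ever halts -- the halting problem."""
    raise NotImplementedError("no such general decider can exist")

# Example: a 'human' whose values flip once some hidden computation finishes.
halts_after_1000 = lambda t: t >= 1000
drift = make_trajectory(halts_after_1000)
print(drift(10), drift(2000))  # benign deleterious
```

Bounded, heuristic drift-detection over finite horizons would of course still be possible; the impossibility only bites if the sub-goal is stated in full generality.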