no capable agent would willingly create a more powerful agent that might not have the same goals as itself.
Or the AI might be as much of an overconfident dumbass as us and make a mistake. Even superintelligence doesn’t mean perfection, and the alignment problem would only grow harder as the AI scales up. In fact, I’d say even an aligned AI is potentially a ticking time bomb if its alignment solution isn’t perfectly scalable.