Nathan Helm-Burger comments on If Alignment is Hard, then so is Self-Improvement

Nathan Helm-Burger 7 Apr 2023 4:13 UTC
10 points
4
Indeed, if this were guaranteed to be the case for all agents… then we wouldn’t have to worry about humans building unaligned agents more powerful than themselves. We’d realize that was a bad idea and simply not do it. Is that… what you’d like to gamble everything on? Or maybe… agents can do foolish things sometimes.