the simpler the utility function the easier time it has guaranteeing the alignment of the improved version
If we are talking about a theoretical argmax_a E(U|a) AI, where E(U|a) (the expectation of utility given action a) somehow points to the external world, then sure. If we are talking about a real AI that aspires to become the physical embodiment of that theoretical concept (with said aspiration somehow encoded outside of U, since U is simple), then things get hairier.
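To make the distinction concrete, here is a minimal toy sketch of the theoretical argmax_a E(U|a) agent over a discrete action set; all names and values here are illustrative assumptions, not anything from the discussion. The point it illustrates is that everything besides U itself, the world model, the expectation, and the search over actions, is machinery sitting outside the utility function.

```python
# Toy world model: for each action, a distribution over outcomes.
# All names and numbers are illustrative placeholders.
WORLD_MODEL = {
    "action_a": [("outcome_1", 0.7), ("outcome_2", 0.3)],
    "action_b": [("outcome_1", 0.2), ("outcome_3", 0.8)],
}

# A deliberately simple utility function U over outcomes.
U = {"outcome_1": 1.0, "outcome_2": 0.0, "outcome_3": 0.5}

def expected_utility(action):
    """E(U | a): expectation of U under the model's outcome distribution."""
    return sum(p * U[outcome] for outcome, p in WORLD_MODEL[action])

def choose_action():
    """argmax_a E(U | a): pick the action with the highest expected utility."""
    return max(WORLD_MODEL, key=expected_utility)

if __name__ == "__main__":
    # Everything except U (the model, the expectation, the argmax loop itself)
    # is machinery outside the utility function.
    print(choose_action())
```

In a real system, that surrounding machinery is where the aspiration to behave like the theoretical maximizer would have to live, and the simplicity of U by itself says nothing about how faithfully that machinery is preserved through self-improvement.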