While this particular alignment case for humans does seem reasonably reliable, it all depends on humans not being proficient at self-improvement/modification yet. For an AGI with self-improvement capability this goes out of the window fast.
Yes, to some extent. Humans are definitely not completely robust to recursive self-improvement (RSI), nor are they at a reflectively stable equilibrium. I do suspect, though, that sexual desire is at least partially reflectively stable. If people could arbitrarily rewrite their psychology, I doubt that most would completely remove their sex drive or transmute it into some completely alien type of desire. Some definitely would, and I also think there'd be a fair bit of experimentation around the margins, as well as people removing or tweaking some things due to social desirability biases.
The main point, though, is that this provides an existence proof that this degree of robust-ish alignment can be achieved by evolution, which has far fewer advantages than we do. We can probably do at least as well for the first proto-AGIs we build, before RSI sets in. The key will then be either to carefully manage or prevent RSI, or to build more robust drives that are much more reflectively stable than the human sex drive.
While this particular alignment case for humans does seem reasonably reliable, it all depends on humans not being proficient at self-improvement/modification yet. For an AGI with self-improvement capability this goes out of the window fast.
Why do we expect quadrillion-parameter models to be proficient at self-improvement or self-modification?
I don't think the kind of self-improvement Yudkowsky imagined would be a significant factor for AGIs trained in the deep learning paradigm.