Chinese Room comments on Human sexuality as an interesting case study of alignment

Chinese Room 31 Dec 2022 2:31 UTC
1 point
0
While this particular alignment case for humans does seem reasonably reliable, it all depends on humans not yet being proficient at self-improvement/modification. For an AGI with self-improvement capability, this goes out of the window fast.
- DragonGod 6 Jan 2023 17:15 UTC
  2 points
  0
  Why do we expect quadrillion-parameter models to be proficient at self-improvement/self-modification?
  
  I don’t think the kind of self-improvement Yudkowsky imagined would be a significant factor for AGIs trained in the deep learning paradigm.
- beren 1 Jan 2023 13:51 UTC
  1 point
  0
  Yes, to some extent. Humans are definitely not completely robust to recursive self-improvement (RSI), nor are they at a reflectively stable equilibrium. I do suspect, though, that sexual desire is at least partially reflectively stable. If people could arbitrarily rewrite their psychology, I doubt that most would completely remove their sex drive or transmute it into some completely alien type of desire (some definitely would, and I also think there’d be a fair bit of experimentation around the margin, as well as removing/tweaking some things due to social desirability biases).
  The main point, though, is that this provides an existence proof that this degree of robust-ish alignment is achievable by evolution, which has far fewer advantages than we do. We can probably do at least as well for the first proto-AGIs we build before RSI sets in. The key will then be either to carefully manage or prevent RSI, or to build more robust drives that are much more reflectively stable than the human sex drive.