How does an AGI solve its own alignment problem?
For alignment to work, its theory should not only tell humans how to create an aligned super-human AGI, but also tell the AGI how to self-improve without destroying its own values. A good alignment theory should work across all intelligence levels. Otherwise, how does a paperclip optimizer that is only marginally smarter than a human make sure that its next iteration will still care about paperclips?
Excellent question! MIRI’s entire Vingean reflection paradigm is about the stability of goals under self-improvement and the design of successors.
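To make that concrete, here is a minimal sketch of the Löbian obstacle that Vingean reflection grapples with. The notation (a proof system T, a provability predicate \Box_T, and a predicate Safe(a)) is my own illustration of the standard setup, not quoted from MIRI's papers:

```latex
% Minimal sketch of the Löbian obstacle behind Vingean reflection.
% Notation (assumed for illustration): T is the agent's proof system,
% \Box_T(\varphi) means "T proves \varphi", and Safe(a) means
% "action a preserves the goal (e.g. keeps caring about paperclips)".

% Agent A1 only takes an action a when it can prove it safe:
%     T \vdash Safe(a)
% To delegate to a successor A2 that proves things in the same system T,
% A1 would want the unrestricted reflection schema
%     T \vdash \Box_T(Safe(a)) \rightarrow Safe(a)
% i.e. "if my successor proves the action is safe, it is safe".
% But Löb's theorem blocks this:
\[
  \text{if } T \vdash \Box_T(\varphi) \rightarrow \varphi
  \quad\text{then}\quad
  T \vdash \varphi ,
\]
% so a consistent T cannot endorse that schema for every sentence
% (taking \varphi = \bot would force T \vdash \bot). Vingean reflection
% asks how a self-improving agent can trust an equally strong or stronger
% successor without this kind of unrestricted self-trust.
```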
Just realized that the stability of goals under self-improvement is kinda similar to the stability of goals in mesa-optimizers, so the Vingean reflection paradigm and the mesa-optimization paradigm should fit together.