those that rely on arbitrary AGIs detecting and [settling on as natural] the same features of the world that humans do, including values and qualities important to humanity
can you give examples of such strategies, and argue that they rely on this?
I’m in a weird situation here: I’m not entirely sure whether the community considers the Learning Theory Agenda to be the same alignment plan as The Plan (which is arguably not a plan at all, but he sure thinks about value learning!), or whether I can count things like the class of scalable oversight plans that take as read that “human values” are a specific natural object. Would you at least agree that those first two (or one???) rely on that?