… we don’t merely want a precise theory that lets us build an agent; we want our theory to act like a box that takes in an arbitrary agent (such as one built using ML and other black boxes) and allows us to analyze its behavior.
FWIW, this is what I consider myself to be mainly working towards, and I do expect that the problem is directly solvable. I don’t think that’s a necessary case to make in order for HRAD-style research to be far and away the highest priority for AI safety (so it’s not necessarily a crux), but I do think it’s both sufficient and true.