Yes, that’s a hard part. But specifying the goal accurately is often regarded as a potential failure point. So, if I’m right that this is a simpler, easier-to-specify alignment goal, that’s progress. It also has the advantage of incorporating corrigibility as a by-product, so it’s resistant to partial failure—if you can tell that something went wrong in time, the AGI can be asked to shut down.
WRT the difficulty of using the AGI’s understanding as its terminal goal: I think it’s not trivial, but quite doable, at least in some of the AGI architectures we can anticipate. See my two short posts Goals selected from learned knowledge: an alternative to RL alignment and The (partial) fallacy of dumb superintelligence.
Thanks, I’ll check those out.