I would say that the current model of value-learning is already safe for this.
I found a “cake-or-death” problem with the initial formulation (http://lesswrong.com/lw/f3v/cake_or_death/). If such problems can be found with a formulation that looked pretty solid initially, then I’m certainly not confident we can say the current model is safe...
Safe enough to do mathematics on, surely. I wouldn’t declare anything safe to build unless someone hands me a hard hat and a one-time portal to a parallel universe.
You are wise, my child ;-)