In the first part of the grandparent, I attempted to strengthen the method. I don’t think automatic checking of a generated AI is anywhere near feasible, so there is little point in discussing that. But a textbook is clearly feasible as a means of communicating a solution to a philosophical problem. That leaves the obvious, old-hat failure modes of the textbook itself, if only it could be constructed without the possibility of causing psychological damage. Maybe we can even improve on that by having several “generations” of researchers each write a textbook for the next, so that any brittle manipulation patterns contained in the original break down.
I’m sorry, I misunderstood your intention completely, probably because of the italics :)
I personally am paranoid enough about ambivalent transhumans that I would be very afraid of giving them such a powerful channel to the outside world, even if it had to pass through many levels of iterative improvement (if I could make a textbook that hijacked your mind, I could just have you write a similarly destructive textbook for the next researcher, and so on).
I think it is possible to exploit an unfriendly boxed AI to bootstrap to friendliness in a theoretically safe way. At the very least, the problem of doing it safely is very interesting and difficult, and I think I have a solid first step toward a solution.