The point of this exercise was to deny the unfriendly AI even the most limited communication it might accomplish through byproducts of computation. Having it produce an unconstrained textbook which a human actually reads is clearly absurd, but I’m not sure what you are trying to mock.
Do you think it is impossible to formulate any question which the AI can answer without exercising undue influence? It seems hard, but I don’t see why you would ridicule the possibility.
In the first part of the grandparent, I attempted to strengthen the method. I don’t think automatic checking of a generated AI is anywhere near feasible, so there is little point in discussing that. But a textbook is clearly feasible as a means of communicating a solution to a philosophical problem. And then there are all those obvious old-hat failure modes arising from the textbook, if only it could be constructed without the possibility of causing psychological damage. Maybe we can even improve on that by having several “generations” of researchers write a textbook for the next one, so that any brittle manipulation patterns contained in the original break down.
I’m sorry, I misunderstood your intention completely, probably because of the italics :)
I personally am paranoid enough about ambivalent transhumans that I would be very afraid of giving them such a powerful channel to the outside world, even if it had to pass through many levels of iterative improvement (if I could write a textbook that hijacked your mind, I could just have you write a similarly destructive textbook for the next researcher, and so on).
I think it is possible to exploit an unfriendly boxed AI to bootstrap to friendliness in a theoretically safe way. Minimally, the problem of doing it safely is very interesting and difficult, and I think I have a solid first step to a solution.