That seems a circular argument. How do you use a self-modifying evolutionary search to find a program whose properties remain stable under self-modifying evolutionary search? Unless you started with the right answer, the search AI would quickly rewrite or reinterpret its own driving goals in a non-friendly way, and who knows what you’d end up with.
I don’t see why the search algorithm would need to be self-modifying.
I don’t see why you would be searching for stability as opposed to friendliness. Human testers can judge friendliness directly.
It’s how you draw your system box. Evolutionary search is equivalent to a self-modifying program, if you think of the whole search process as the program. The same issues apply.
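To make the "system box" point concrete, here is a minimal sketch, not from the original thread and with all names illustrative: a generic evolutionary loop whose population encodes the system's future behavior. Drawn as a box around only the selection step, nothing modifies itself; drawn around the whole loop, the composite system rewrites part of its own state every generation.

```python
# Illustrative sketch only: an evolutionary search viewed as one system.
# The population encodes future behavior, and the loop rewrites that
# population each generation, so the box-as-a-whole is self-modifying
# even though the selection rule itself never changes.
import random

def evolutionary_search(fitness, random_genome, mutate, generations=100, pop_size=20):
    # The population is the "program state" the outer system keeps rewriting.
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half...
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # ...and replace the rest with mutated copies. Seen from outside the
        # box, the composite (fixed loop + evolving population) just rewrote
        # part of itself.
        population = survivors + [
            mutate(random.choice(survivors)) for _ in range(pop_size - len(survivors))
        ]
    return max(population, key=fitness)

# Toy usage with a hypothetical bit-string genome: maximize the number of 1s.
if __name__ == "__main__":
    best = evolutionary_search(
        fitness=sum,
        random_genome=lambda: [random.randint(0, 1) for _ in range(32)],
        mutate=lambda g: [b ^ (random.random() < 0.05) for b in g],
    )
    print(best, sum(best))
```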
I think the Sequences do a good job of demolishing the idea that human testers can possibly judge friendliness directly, so long as the AI operates as a black box. If you have a debug view into the operation of the AI, that is a different story, but then you don’t need friendliness anyway.
If I draw a box around the selection algorithm and find there is nothing self-modifying inside… where’s the circularity?