In the sorting problem, suppose you applied your advanced interpretability techniques and got a design with documentation.
You also apply a different technique and get code with a formal proof that it sorts.
In the latter case, you can be sure that the code works, even if you can’t understand it.
The algorithm+formal proof approach works whenever you have a formal success criterion.
It is less clear how well the design approach works on a problem where you can't write a formal success criterion so easily.
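For sorting, such a criterion is easy to state: the output must be in non-decreasing order and must be a permutation of the input. The sketch below writes that criterion down as an executable check. The function names are hypothetical, chosen for illustration. Note that this only tests the criterion on particular inputs. A formal proof, for example in a proof assistant, would establish it for every possible input.

```python
from collections import Counter


def is_sorted(xs: list[int]) -> bool:
    # Every adjacent pair must be in non-decreasing order.
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_permutation(xs: list[int], ys: list[int]) -> bool:
    # Both lists must contain the same elements with the same multiplicities.
    return Counter(xs) == Counter(ys)


def meets_sorting_spec(inp: list[int], out: list[int]) -> bool:
    # Hypothetical name: the full success criterion for a sorting routine.
    return is_sorted(out) and is_permutation(inp, out)


print(meets_sorting_spec([3, 1, 2], [1, 2, 3]))  # True
print(meets_sorting_spec([3, 1, 2], [1, 2]))     # False: an element was dropped
print(meets_sorting_spec([3, 1, 2], [1, 2, 4]))  # False: an element was changed
```

Both halves are needed. `is_sorted` alone would accept a routine that always returns an empty list.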
Here is a task that neural nets have been trained to do: convert pictures of horses into similar pictures of zebras (https://youtu.be/D4C1dB9UheQ?t=72). I am unsure whether a designed solution to this problem exists.
Imagine that you give a bunch of smart programmers a lecture on how to solve this problem, and then they have to implement a solution without access to any source of horse or zebra pictures. I suspect they would fail. I suspect that solving this problem well fundamentally requires a significant amount of information about horses and zebras, more than a human can understand and conceptualize at once. The human will be able to understand each small part of the system, but that holds for any system, since even individual logic gates are understandable. The human can also understand why it works in the abstract, the way we understand gradient descent over neural nets.
I am not sure that this problem has a core insight that is possessable, but not possessed by us.