What I have to offer is yet another informal perspective, but one that may further the search for formal approaches. The structure of the inner alignment problem is isomorphic to the problem of cancer. Cancer can be considered a state in which a cell employs a strategy that is not aligned with that of the organism or organ to which it belongs. One might expect, then, that advances in cancer research will offer solutions that can be translated into the terms of AI alignment. For this to work, one would have to construct a dictionary to facilitate the translation.
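Purely as an illustration of what the first few entries of such a dictionary might look like, here is a minimal sketch in Python. Every correspondence below is my own illustrative guess (drawing on the scanning/surgical-removal idea raised later in this thread), not an established mapping:

```python
# Hypothetical first entries in a cancer -> inner alignment "dictionary".
# Each mapping is an illustrative guess, not an established correspondence.
CANCER_TO_ALIGNMENT = {
    "cell": "learned subsystem (potential mesa-optimizer)",
    "organism": "base optimizer / training process",
    "cancerous mutation": "emergence of a misaligned mesa-objective",
    "metastasis": "misaligned behavior generalizing to new contexts",
    "screening": "scanning a model for mesa-optimization",
    "surgical removal": "editing or ablating the offending circuit",
}

for cancer_term, alignment_term in CANCER_TO_ALIGNMENT.items():
    print(f"{cancer_term:20s} -> {alignment_term}")
```

The hard part, of course, is not listing candidate correspondences but arguing that the underlying mechanisms actually match.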
A major benefit of this approach would be the ability to leverage the efforts of some of the greatest scientists of our time, who are already working on a problem widely considered a high priority. Cancer research gets massive funding; alignment research does not. If the problem structure is at least partly isomorphic, translation should be both possible and beneficial.
Personally, I think the cancer analogy is ok, but I strongly predict that cancer treatment/prevention won’t provide good inspiration for inner alignment.
For example, we can already conceive of scanning for mesa-optimization and surgically removing it (we don’t need any analogy for that), but we don’t know how to do it, and the details of medical scans, radiation therapy, etc. don’t seem usefully analogous.
I think you should be careful not to conflate an analogy with an isomorphism. I agree that there is a pretty natural analogy with the cancer case, but it falls far short of an isomorphism at the moment. You don’t have an argument that the mechanisms used by cancer cells are similar to those creating mesa-optimizers, that the processes creating them are similar, etc.
I’m not saying that such a lower-level correspondence doesn’t exist, just that saying “Look, the very general idea is similar” is not a strong enough argument for one.
All analogies rely on isomorphisms; they simply refer to shared patterns. A good analogy captures many structural regularities shared between two different things, while a bad one captures only a few.
The field of complex adaptive systems (CAS) is dedicated to the study of structural regularities between systems operating under similar constraints. Ant colony optimization and simulated annealing can be used to solve an extremely wide range of problems precisely because so many structural regularities are shared across complex adaptive systems.
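To make that claim about cross-domain transfer tangible, here is a minimal simulated annealing sketch; the toy objective, cooling schedule, and parameter values are illustrative assumptions, not anything specified in this thread. The same accept-worse-moves-with-decaying-probability structure carries over to scheduling, routing, and layout problems with essentially no modification:

```python
import math
import random

def simulated_annealing(objective, x0, temp=10.0, cooling=0.995,
                        steps=5000, step_size=0.5):
    """Minimize `objective` from starting point `x0` using a
    geometric cooling schedule (all parameters are illustrative)."""
    x, fx = x0, objective(x0)
    best_x, best_fx = x, fx
    for _ in range(steps):
        # Propose a random candidate near the current point.
        candidate = x + random.uniform(-step_size, step_size)
        f_cand = objective(candidate)
        # Accept improvements outright; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if f_cand < fx or random.random() < math.exp((fx - f_cand) / temp):
            x, fx = candidate, f_cand
            if fx < best_fx:
                best_x, best_fx = x, fx
        temp *= cooling  # geometric cooling
    return best_x, best_fx

# Toy multimodal objective with many local minima.
f = lambda x: x**2 + 10 * math.sin(3 * x)
print(simulated_annealing(f, x0=8.0))
```

Nothing in the algorithm knows anything about the domain; only the `objective` function does, which is the sense in which the structural regularity, not the subject matter, is doing the work.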
I worry that a myopic focus will result in a lot of time wasted on lines of inquiry that already have parallels in a number of different fields. If we accept that the problem of inner alignment can be formalized, it would be very surprising to find that the problem is unique in the sense that it has no parallels in nature, especially considering the obvious general analogy to the problem of cancer, which may or may not provide insight into the alignment problem.