It seems you are actually describing a 3-algorithm stack view for both the human and the AGI case. For the human, there is 1) evolution working at the genome level, 2) long-term brain development / learning, and 3) the brain solving a particular task. Relatively speaking, evolution (#1) works on a much smaller number of much more legible parameters than brain development (#2). So if we use some sort of genetic algorithm for optimizing AGI meta-parameters, then we’d get a stack that is very similar in style (see the sketch below). And in any case we need to worry about the “base” optimizer used in the AGI version of #1+#2 producing an unaligned mesa-optimizer for the AGI version of the #3 algorithm.
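To make the three levels concrete, here is a minimal toy sketch of that stack in Python. Everything here is hypothetical and illustrative (the function names, the toy regression task, and the particular meta-parameters are my own choices, not anyone's actual AGI design): an outer genetic algorithm searches over a handful of legible meta-parameters (#1), each candidate runs a within-“lifetime” learning loop (#2), and the learned model is then applied to particular task instances (#3).

```python
import random

def solve_task(weight, x):
    """Level 3: apply the learned model (here, a single weight) to one task input."""
    return weight * x

def lifetime_learning(learning_rate, n_steps, data):
    """Level 2: within-lifetime learning of the weight for a toy y = 2*x task.
    In a real system this level would produce a huge number of illegible learned parameters."""
    weight = 0.0
    for _ in range(n_steps):
        x, y = random.choice(data)
        error = solve_task(weight, x) - y
        weight -= learning_rate * error * x  # gradient step on squared error
    return weight

def fitness(meta_params, data):
    """Score a meta-parameter setting by how well the learned model performs."""
    weight = lifetime_learning(meta_params["learning_rate"], meta_params["n_steps"], data)
    return -sum((solve_task(weight, x) - y) ** 2 for x, y in data)

def genetic_search(data, population_size=10, generations=20):
    """Level 1: evolve a small, human-legible set of meta-parameters."""
    population = [{"learning_rate": random.uniform(0.001, 0.1),
                   "n_steps": random.randint(10, 200)}
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, data), reverse=True)
        survivors = population[: population_size // 2]
        # Refill the population by mutating the survivors.
        population = survivors + [
            {"learning_rate": p["learning_rate"] * random.uniform(0.8, 1.2),
             "n_steps": max(1, p["n_steps"] + random.randint(-10, 10))}
            for p in random.choices(survivors, k=population_size - len(survivors))
        ]
    return max(population, key=lambda p: fitness(p, data))

if __name__ == "__main__":
    data = [(x, 2.0 * x) for x in range(-5, 6)]
    print("best meta-parameters:", genetic_search(data))
```

The point of the sketch is only the shape of the stack: the #1 optimizer touches two legible knobs, while almost all of the actual behavior comes out of the #2 learning loop it configures.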
I agree that “optimizing AGI meta-parameters” is parallel to evolution in my preferred analogy (see “genome = code”), and I agree that that’s a thing we will probably do. I don’t count it as the “evolution analogy” because most of the design work was done by a human, and tweaking a handful of human-legible parameters like learning rates and neural architectures does not significantly reduce the human-legibility of the whole algorithm. I agree about unaligned mesa-optimizers. This is not a “We don’t need to worry about AGI safety” article. In fact I have a “Why we’re doomed” article draft in preparation! I’ll probably post it in the next week or two :-) The mesa-optimizers I’m worried about are of the form discussed here. In terms of the brain, you can have an inner alignment problem where “outer” is evolution and “inner” is the whole brain, OR you can have an inner alignment problem where “outer” is the brainstem and “inner” is the neocortex learning algorithm, more or less. I’m concerned about the latter. I don’t think it’s an easier problem, just different. Again, hang on for that forthcoming post.