I agree that “optimizing AGI meta-parameters” is parallel to evolution in my preferred analogy (see “genome = code”), and I agree that it’s something we will probably do. I don’t count it as an “evolution analogy” because most of the design work was done by a human, and tweaking a handful of human-legible parameters like learning rates and neural architectures does not significantly reduce the human-legibility of the whole algorithm.

I agree about unaligned mesa-optimizers. This is not a “We don’t need to worry about AGI safety” article. In fact I have a “Why we’re doomed” article draft in preparation! I’ll probably post it in the next week or two :-)

The mesa-optimizers I’m worried about are of the form discussed here. In terms of the brain, you can have an inner alignment problem where “outer” is evolution and “inner” is the whole brain, or you can have one where “outer” is the brainstem and “inner” is the neocortex learning algorithm, more or less. I’m concerned about the latter. I don’t think it’s an easier problem, just a different one. Again, hang on for that forthcoming post.