OK, well I spend most of my time thinking about a particular AGI architecture (1, 2, etc.) in which the learning algorithm is legible and hand-coded … and let me tell you, in that case, all the problems of AGI safety and alignment are still really really hard, including the “inaccessible information” stuff that Paul was talking about here.
If you’re saying that it would be even worse if, on top of that, the learning algorithm itself is opaque, because it was discovered from a search through algorithm-space … well OK, yeah sure, that does seem even worse.