I agree! There’s a distinction between “we know exactly what knowledge is represented in this complicated black box” and “we have formal guarantees about properties of the black box”. Saying “the AI will have a black box representing a model of human preferences” is indeed very different from saying “we will train the AI to build a model of human preferences using a bootstrapping scheme such as HCH, which we believe works because of these strong arguments”.
Perhaps more crisply, we should distinguish between black boxes where we have a good grasp of why the box will behave as expected, and black boxes whose behavior we have little ability to reason about at all. I believe that both cousin_it and Eliezer (in the Artificial Mysterious Intelligence post) are referring to the folly of using the second type of black box in AI designs.
Perhaps related: Jessica Taylor’s discussion of top-level vs. subsystem reasoning.