The way I wrote it, I didn’t mean to imply “the designers need to understand the low-K thing for the system to be highly capable”, merely “the low-K thing must appear in the system somewhere for it to be highly capable”. Does the second statement seem right to you?
(perhaps a weaker statement, like “for the system to be highly capable, the low-K thing must be the correct high-level understanding of the system, and so the designers must understand the low-K thing to understand the behavior of the system at a high level”, would be better?)
The second statement seems pretty plausible (when we consider human-accessible AGI designs, at least), but I’m not super confident of it, and I’m not resting my argument on it.
The weaker statement you provide doesn’t seem to address my concern. I expect there are ways to get highly capable reasoning (sufficient for, e.g., gaining decisive strategic advantage) without understanding low-K “good reasoning”; the concern is that such systems are much more difficult to align.