To the extent that for all Y so far we’ve found an X, I’m pretty confident that my dream-team H would find X-or-better given a couple of weeks and access to their HCH.
It sounds like roughly this is cruxy.
We’re trying to decide how reliable <some scheme> is at figuring out the right questions to ask in general, and not letting things slip between the cracks in general, and not overlooking unknown unknowns in general, and so forth. Simply observing <the scheme> in action does not give us a useful feedback signal on these questions, unless we already know the answers to the questions. If <the scheme> is not asking the right questions, and we don’t know what the right questions are, then we can’t tell it’s not asking the right questions. If <the scheme> is letting things slip between the cracks, and we don’t know which things to check for crack-slippage, then we can’t tell it’s letting things slip between the cracks. If <the scheme> is overlooking unknown unknowns, and we don’t already know what the unknown unknowns are, then we can’t tell it’s overlooking unknown unknowns.
So: if the dream team cannot figure out beforehand all the things it needs to do to get HCH to avoid these sorts of problems, we should not expect them to figure it out with access to HCH either. Access to HCH does not provide an informative feedback signal unless we already know the answers. The cognitive labor cannot be delegated.
(Interesting side-point: we can make exactly the same argument as above about our own reasoning processes. In that case, unfortunately, we simply can’t do any better; our own reasoning processes are the final line of defense. That’s why a Simulated Long Reflection is special, among these sorts of buck-passing schemes: it is the one scheme which does as well as we would do anyway. As soon as we start to diverge from Simulated Long Reflection, we need to ask whether the divergence will make the scheme more likely to ask the wrong questions, let things slip between cracks, overlook unknown unknowns, etc. In general, we cannot answer this kind of question by observing the scheme itself in operation.)
For complex questions I don’t think you’d have the top-level H immediately divide the question itself: you’d want to avoid this single-point-of-failure.
(This is less cruxy, but it’s a pretty typical/central example of the problems with this whole way of thinking.) By the time the question/problem has been expressed in English, the English expression is already a proxy for the real question/problem.
One of the central skills involved in conceptual research (of the sort I do) is to not accidentally optimize for something we wrote down in English, rather than for the concept which that English is trying to express. It’s all too easy to think that e.g. we need a nice formalization of “knowledge” or “goal directedness” or “abstraction” or what have you, and then come up with some formalization of the English phrase which does not quite match the thing in our head, and which does not quite fit the use-cases that originally generated the line of inquiry.
This is also a major problem in real bureaucracies: the boss can explain the whole problem to the underlings, in a reasonable amount of detail, without attempting to factor it at all, and the underlings are still prone to misunderstand the goal or the use-cases and end up solving the wrong thing. In software engineering, for instance, this happens all the time and is one of the central challenges of the job.
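To make the software-engineering version concrete, here’s a minimal, purely hypothetical sketch (the domain, names, and numbers are all invented for illustration): the boss asks for “our most valuable customers,” the underling faithfully formalizes the English phrase as “highest total spend,” and the resulting code optimizes the proxy while quietly missing the concept the boss actually had in mind.

```python
# Hypothetical illustration: the written spec is a proxy for the real concept.
# Spec from the boss (in English): "return our most valuable customers."

from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    total_spend: float   # lifetime revenue
    refund_rate: float   # fraction of purchases refunded
    churned: bool        # no longer an active customer

def most_valuable_per_spec(customers, n=2):
    """What gets built: a faithful formalization of the English phrase,
    'most valuable' == 'highest total spend'."""
    return sorted(customers, key=lambda c: c.total_spend, reverse=True)[:n]

def most_valuable_intended(customers, n=2):
    """Closer to the concept in the boss's head: customers whose business is
    actually worth keeping (still active, low refunds), not just big spenders."""
    active = [c for c in customers if not c.churned and c.refund_rate < 0.2]
    return sorted(active, key=lambda c: c.total_spend * (1 - c.refund_rate),
                  reverse=True)[:n]

customers = [
    Customer("A", total_spend=90_000, refund_rate=0.6, churned=True),
    Customer("B", total_spend=40_000, refund_rate=0.05, churned=False),
    Customer("C", total_spend=35_000, refund_rate=0.02, churned=False),
]

# The two answers disagree: the proxy spec was optimized exactly as written,
# and the question actually being asked went unanswered.
print([c.name for c in most_valuable_per_spec(customers)])   # ['A', 'B']
print([c.name for c in most_valuable_intended(customers)])   # ['B', 'C']
```

The point of the sketch is that nothing in the codebase flags the mismatch; the implementation is “correct” relative to the words, which is exactly why the error is hard to catch by inspecting the work downstream of the words.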