So the question is, why do you believe formalization is tractable at all for the AI safety problem?
It’s more that I don’t positively believe it’s not tractable. Some of my reasoning is outlined here, some of it is based on inferences and models that I’m going to distill and post publicly aaaany day now, and mostly it’s an inside-view feel for what problems remain and how hopeless they feel to solve.
Which is to say, I can absolutely see how a better AGI paradigm may be locked behind theoretical challenges on the difficulty level of “prove that P≠NP”, and I certainly wouldn’t bet civilization on solving them in the next five or seven years. But I think it’s worth keeping an eye out for whether, e.g., some advanced interpretability tool we invent turns out to have a dual use as the foundation of such a paradigm, or puts it within reach.
They assume either infinite computation or, in the regime of bounded Bayesian reasoning/rationality, the ability to solve very difficult problems.
Yeah, this is why I added “approximation of” to every “formal” in the summary in my original comment. I have some thoughts on looping in computational complexity into agency theory, but that may not even be necessary.