Consider frameworks like Bayesian probability theory or various decision theories, which (strive to) establish the formally correct algorithms for how systems embedded in a universe larger than themselves must act, even under various kinds of uncertainty: how to update on observations, what decisions to make given what information, and so on. They still take on a “first-person” perspective, assuming that you’re operating on models of reality rather than on reality directly, but they strive to be formally correct given this setup.
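For concreteness, the kind of rules these frameworks formalize are Bayesian conditioning plus expected-utility maximization; the display below is just an illustrative sketch of that pair, not notation taken from any particular framework:

$$P(h \mid e) = \frac{P(e \mid h)\,P(h)}{\sum_{h'} P(e \mid h')\,P(h')}, \qquad a^{*} = \arg\max_{a} \sum_{h} P(h \mid e)\,U(a, h)$$

That is: update your model of which hypothesis $h$ is true when you observe evidence $e$, then pick the action $a$ that maximizes expected utility under the updated model, which is exactly the “operating on models of reality” setup above.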
Admittedly, this does have one big problem, which I’ll lay out below:
Formalization is a pain in the butt, and we have good reasons to believe that formalizing things will be so hard as to be essentially impossible in practice, except in very restricted circumstances. In particular, this is one way in which rationalism and Bayesian reasoning fail to scale down: They assume either infinite computation, or in the regime of bounded Bayesian reasoning/rationality, they assume the ability to solve very difficult problems like NP-complete, coNP-complete, #P-complete, or PSPACE-complete problems, or worse. This is generally why formal frameworks don’t work out very well in the real world: absent oddball assumptions about physics, we probably won’t be able to solve things formally in a lot of cases, ever.
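To make the complexity point concrete, here is a minimal illustrative sketch (mine, not the commenter’s): even the most naive exact Bayesian update over n binary variables has to enumerate all 2^n joint states, and the structured version of the problem (exact inference in general Bayesian networks) is #P-hard.

```python
from itertools import product

def exact_posterior(joint_prob, evidence, n_vars):
    """Brute-force Bayesian conditioning.
    joint_prob: function mapping a tuple of n_vars booleans to its probability.
    evidence:   dict {variable_index: observed_value}.
    Returns P(state | evidence) over all states consistent with the evidence."""
    consistent = {
        state: joint_prob(state)
        for state in product([False, True], repeat=n_vars)  # all 2**n_vars states
        if all(state[i] == v for i, v in evidence.items())
    }
    z = sum(consistent.values())  # P(evidence), the normalizing constant
    return {state: p / z for state, p in consistent.items()}

# Tiny toy prior (three independent biased coins) just to show the interface;
# nothing in the loop above exploits that structure, so the cost stays 2**n.
bias = [0.9, 0.2, 0.5]
def joint(state):
    p = 1.0
    for b, x in zip(bias, state):
        p *= b if x else 1.0 - b
    return p

posterior = exact_posterior(joint, evidence={2: True}, n_vars=3)
```

So “just do the Bayesian update” quietly assumes compute that, absent oddball physics, we won’t have.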
So the question is, why do you believe formalization is tractable at all for the AI safety problem?
> So the question is, why do you believe formalization is tractable at all for the AI safety problem?
It’s more that I don’t positively believe it’s not tractable. Some of my reasoning is outlined here, some of it is based on inferences and models that I’m going to distill and post publicly aaaany day now, and mostly it’s an inside-view feel for what problems remain and how hopeless-to-solve they feel.
Which is to say, I can absolutely see how a better AGI paradigm may be locked behind theoretical challenges on the difficulty level of “prove that P≠NP”, and I certainly wouldn’t bet civilization on solving them in the next five or seven years. But I think it’s worth keeping an eye out for whether, e.g., some advanced interpretability tool we invent turns out to have a dual use as the foundation of such a paradigm, or puts it within reach.
> They assume either infinite computation, or in the regime of bounded Bayesian reasoning/rationality, they assume the ability to solve very difficult problems
Yeah, this is why I added “approximation of” to every “formal” in the summary in my original comment. I have some thoughts on looping computational complexity into agency theory, but that may not even be necessary.