I focus mostly on formal properties algorithms can or cannot have, rather than the algorithms themselves. So, from my point of view, it doesn’t matter whether the prior is “explicit” and I doubt it’s even a well-defined question. What I mean by “prior” is, more or less, whatever probability measure has the best Bayesian regret bound for the given RL algorithm.
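To make "best Bayesian regret bound" concrete, here is one standard formulation (the notation is illustrative, not a claim about any specific framework): given a prior $\zeta$ over environments $\mu$ and a policy $\pi$ produced by the algorithm,

$$\mathrm{BR}_\zeta(\pi, T) \;=\; \mathbb{E}_{\mu \sim \zeta}\!\left[\,\max_{\pi'} \mathbb{E}^{\mu}_{\pi'}\!\left[\sum_{t=1}^{T} r_t\right] \;-\; \mathbb{E}^{\mu}_{\pi}\!\left[\sum_{t=1}^{T} r_t\right]\right],$$

i.e. the prior-expected gap between the best achievable return in each environment and the return the algorithm actually gets. The "prior" of the algorithm, in the sense above, is whichever $\zeta$ makes this quantity provably smallest for that algorithm.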
I think the prior will have to look somewhat like the universal prior. Occam’s razor is a foundational principle of rationality, and any reasonable algorithm should have an inductive bias towards simpler hypotheses. I think there’s even some work trying to prove that deep learning already has such an inductive bias. At the same time, the space of hypotheses has to be very rich (although still constrained by computational resources and some additional structural assumptions needed to make learning feasible).
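A minimal sketch of what "somewhat like the universal prior" means in practice: weight each hypothesis by $2^{-\ell}$ where $\ell$ is its description length, then update on evidence as usual. The hypothesis names and description lengths below are made up for illustration; the universal prior proper ranges over all programs and is uncomputable, so this is only a toy analogue.

```python
from math import isclose

def simplicity_prior(desc_lengths):
    """Toy stand-in for a universal-style prior: each hypothesis gets
    weight 2^(-description length in bits), then weights are normalized."""
    weights = {h: 2.0 ** -l for h, l in desc_lengths.items()}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

def update(prior, likelihood):
    """Standard Bayesian update: multiply by likelihoods, renormalize."""
    post = {h: p * likelihood[h] for h, p in prior.items()}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

# Hypothetical hypotheses with illustrative description lengths (bits).
prior = simplicity_prior({"simple": 3, "medium": 5, "complex": 10})

# Even if the complex hypothesis fits the data somewhat better,
# the simplicity bias keeps the simple one ahead after one update.
posterior = update(prior, {"simple": 0.2, "medium": 0.5, "complex": 0.9})
```

The point of the sketch: the inductive bias lives entirely in the choice of weights, while the hypothesis space itself can be as rich as resources allow.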
I think that DRL doesn’t require a prior (or, more generally, algorithmic building blocks) substantially different from what is needed for capabilities. If your algorithm is superintelligent (in the sense that it is relevant to either causing or mitigating X-risk), then it has to create sophisticated models of the world that include people, among other things. Therefore, forcing it to model the advisor as well doesn’t make the task substantially harder (it is harder in the sense that the regret bound is weaker, but that is not because of the prior).