I can never tell whether they’ve never thought about the things I’m thinking about, or whether they sped past them years ago. They do seem very smart, that’s for sure.
Whenever I have a great idea, it turns out that someone at MIRI considered it five years earlier. This simultaneously makes me feel very smart and rather disappointed. With that said, here are some relevant things:
Thing #1:
Oh, you’re so very clever! By now you’ve realized you need, above and beyond your regular decision procedure to guide your actions in the outside world, a “meta-decision-procedure” to guide your own decision-procedure-improvement efforts.
This is a nitpick, but it's an important one for understanding how meta-stuff works here: If you've decided that you need a decision procedure to decide when to update your decision procedure, then whatever algorithm you used to make that decision is already meta. This is because your decision procedure is thinking self-referentially. Given this, why would it need to build a whole new procedure for thinking about decision procedures when it could just improve itself?
This has a real advantage: anything you learn about how to make decisions can also be used directly to make decisions about how you make decisions, ad infinitum.
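To make this concrete, here's a toy sketch of what I mean (plain Python; the scoring rule and the patch proposals are made up purely for illustration): a single decision procedure whose option set includes patches to that same procedure, ranked by the exact same evaluation it applies to ordinary actions.

```python
import random

class Agent:
    """Toy agent: one decision procedure whose candidate options include
    patches to that same procedure, evaluated like any ordinary action."""

    def __init__(self):
        # The "decision procedure" here is just a scoring function over options.
        self.score = lambda option: random.random()

    def candidate_patches(self):
        # Hypothetical proposals for a replacement scoring function,
        # however they might be generated in a real system.
        return [lambda option: random.random()]

    def decide(self, actions):
        options = [("act", a) for a in actions] + \
                  [("self_mod", p) for p in self.candidate_patches()]
        kind, best = max(options, key=self.score)
        if kind == "self_mod":
            # The same procedure that ranks external actions just ranked an
            # edit to itself highest; apply it, then pick an external action.
            self.score = best
            kind, best = max((("act", a) for a in actions), key=self.score)
        return best

agent = Agent()
print(agent.decide(["make tea", "make coffee"]))
```

Nothing in the sketch is a separate meta-procedure: the self-modification is just one more option the existing procedure can pick.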
Thing #2:
You are a business. You do retrospectives on your projects. You’re so very clever, in fact, that you do retrospectives on your retrospective process, to improve it over time. But how do you improve these retro-retros? You don’t. They’re in your boundary.
This case reminded me a lot of Eliezer on Where Recursive Justification Hits Rock Bottom, except placed in a context where you can modify your level of recursion.
You need to justify that your projects are good, so you do retrospectives. But you need to justify that your retrospectives are good, so you do retrospectives on those. But then you need to justify that your retro-retros are good too, right? To quote Eliezer:
Should I trust my brain? Obviously not; it doesn’t always work. But nonetheless, the human brain seems much more powerful than the most sophisticated computer programs I could consider trusting otherwise. How well does my brain work in practice, on which sorts of problems?
So there are a couple questions here. The easy question:
Q: How do I justify the way I’m investing my resources?
A: You don't. You just invest them to the best of your ability and hope for the best.
And the more interesting question:
Q: What is the optimal level of meta-justification I use in investing my resources?
A1: This still isn't technically knowable information. However, there are plenty of unjustified priors that might be built into you which settle the decision. For instance, you might keep going up meta-levels until you see diminishing returns and then stop (a sketch of that heuristic is below). Or you might just never go above three levels of meta because you figure that's excessive. Depends on the AI.
A2: Given that Thing #1 is true, you don't need any meta-decision algorithms; you just need a self-referential decision algorithm. In that case, the answer is the same as for the easy question: you use the full capabilities of your decision algorithm and hope for the best (and sometimes your decision algorithm makes decisions about itself instead of decisions about physical actions).
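For what it's worth, here is a minimal sketch of A1's stopping heuristic (the threshold, the cap, and the value estimate are all made-up numbers, which is exactly the kind of unjustified prior A1 is gesturing at):

```python
def value_of_thinking_at_level(n):
    # Made-up stand-in: pretend each extra meta-level is worth half as much
    # as the one below it.
    return 1.0 / (2 ** n)

def choose_meta_depth(threshold=0.1, max_levels=10):
    """Climb meta-levels while the estimated marginal gain of the next level
    exceeds the threshold; stop at the first level that doesn't."""
    depth = 0
    while depth < max_levels and value_of_thinking_at_level(depth + 1) > threshold:
        depth += 1
    return depth

print(choose_meta_depth())  # 3, with these made-up numbers
```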
I don’t understand Thing #1. Perhaps, in the passage you quote from my post, the phrase “decision procedure” sounds misleadingly generic, as if I have some single function I use to make all my decisions (big and small) and we are talking about modifications to that function.
(I don’t think that is really possible: if the function is sophisticated enough to actually work in general, it must have a lot of internal sub-structure, and the smaller things it does inside itself could be treated as “decisions” that aren’t being made using the whole function, which contradicts the original premise.)
Instead, I’m just talking about the ordinary sort of case where you shift some resources away from doing X to thinking about better ways to do X, where X isn’t the whole of everything you do.
Re: Q/A/A1, I guess I agree that these things are (as best I can tell) inevitably pragmatic. And that, as EY says in the post you link, “I’m managing the recursion to the best of my ability” can mean something better than just “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary.” But then this seems to threaten the Embedded Agency programme, because it would mean we can’t make theoretically grounded assessments or comparisons involving agents as strong as ourselves or stronger.
(The discussion of self-justification in this post was originally motivated by the topic of external assessment, on the premise that if we are powerful enough to assess a proposed AGI in a given way, it must also be powerful enough to assess itself in that way. And contrapositively, if the AGI can’t assess itself in a given way then we can’t assess it in that way either.)
(I don’t think that is really possible: if the function is sophisticated enough to actually work in general, it must have a lot of internal sub-structure, and the smaller things it does inside itself could be treated as “decisions” that aren’t being made using the whole function, which contradicts the original premise.)
Even if the decision function has a lot of sub-structure, I think that, in the context of AGI:
(less important point) It is unlikely that we will be able to directly separate sub-structures of the function from the whole function. This is because I'm assuming the function uses some heuristic approximating logical induction to think about itself, and that heuristic has extremely broad uses across basically every aspect of the function.
(more important point) It doesn't matter whether it's a sub-structure or not. The point is that some part of the decision function is already capable of reasoning about improving either itself or other aspects of the decision function. So whatever method it uses to decide whether to attempt self-improvement is already baked in, in some way.
Re: Q/A/A1, I guess I agree that these things are (as best I can tell) inevitably pragmatic. And that, as EY says in the post you link, “I’m managing the recursion to the best of my ability” can mean something better than just “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary.” But then this seems to threaten the Embedded Agency programme, because it would mean we can’t make theoretically grounded assessments or comparisons involving agents as strong as ourselves or stronger.
So “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary” is not exactly true, because in all relevant scenarios we're the ones who build the AI. It's more like “I work on exactly N levels, and my decisions at level N+1 were deemed irrelevant by the selection pressures that created me, which granted me this decision function that deems further levels irrelevant.”
If we’re okay with leveraging normative or empirical assumptions about the world, we should be able to assess AGI (or have the AGI assess itself) with methods that we’re comfortable with.
In some sense, we have practical examples of what this looks like. N, the level of meta, can be viewed as a hyperparameter of our learning system. However, in data science, hyperparameters perform differently on different problems, so people often use Bayesian optimization to iteratively pick good hyperparameters. But, you might say, our Bayesian hyperparameter-optimization process requires its own priors: it too has hyperparameters!
But no one really bothers to optimize these, for a couple of reasons:
#1. As we increase the level of meta in a particular optimization process, we tend to see diminishing returns in model performance.
#2. Meta-optimization is prohibitively expensive: each N-level meta-optimizer generally needs to consider multiple candidate (N-1)-level optimizers in order to pick the best one. Inductively, this means your N-level meta-optimizer's computational cost is around x^N, where x is the number of (N-1)-level optimizers each level has to consider (see the sketch below).
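As a rough illustration of both points, here is a toy sketch in plain Python (no real Bayesian-optimization library; plain random search stands in for the optimizer at every level) that counts how many base-level evaluations an N-level meta-optimizer ends up paying for:

```python
import random

def evaluate(p):
    """Toy objective standing in for model performance; it peaks at p == 0.5."""
    return -(p - 0.5) ** 2

CALLS = 0  # number of base-level evaluations actually paid for

def optimize(level, candidates=3):
    """Toy N-level meta-optimizer: a level-0 optimizer tries one random
    parameter; a level-N optimizer runs `candidates` level-(N-1) optimizers
    and keeps the best result."""
    global CALLS
    if level == 0:
        CALLS += 1
        p = random.random()
        return p, evaluate(p)
    runs = [optimize(level - 1, candidates) for _ in range(candidates)]
    return max(runs, key=lambda r: r[1])

for n in range(5):
    CALLS = 0
    random.seed(0)
    _, best = optimize(n)
    # Cost grows like candidates ** n, i.e. the x^N from #2 above.
    print(f"meta-level {n}: {CALLS} evaluations, best score {best:.5f}")
```

With three candidates per level, the cost grows as 3^N, while the best score typically improves less and less with each added level, which is the informal version of #1.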
But #1 can't actually be proved. It's just an assumption that we think is true because we have a strong observational prior in its favor. Maybe we should question how human brains generate their priors, but, at the end of the day, the way we do that questioning is still determined by our hard-coded algorithms for dealing with probability.
The upshot is that, when we look at problems similar to the one we face with embedded agency, we still use the Eliezerian approach. We just happen to be very confident in our boundary, for reasons that cannot be rigorously justified.
I don't understand your argument for why proving #1 is impossible. Consider a universe that will undergo heat death in a billion steps. Consider the agent that implements “Take an action if PA + <steps remaining> can prove that it is good,” using some provability-checker algorithm that takes some number of steps to run. If there is some faster provability-checker algorithm, it's provable that the agent will do better using that one, so it switches when it finds that proof.
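If it helps, the control flow I have in mind looks roughly like this toy sketch, with the proof search replaced by a trivial stub (the `proves` stub, the step counts, and the candidate checker are all invented; a real version would need an actual theorem prover for PA + <steps remaining>):

```python
def make_checker(cost_per_query):
    """Stub provability checker: pretend it settles any statement at a fixed
    step cost, and 'proves' exactly the statements ending in 'is good'."""
    def proves(statement, steps_remaining):
        return cost_per_query <= steps_remaining and statement.endswith("is good")
    proves.cost = cost_per_query
    return proves

steps_remaining = 1_000
checker = make_checker(cost_per_query=100)
faster_checker = make_checker(cost_per_query=10)

while steps_remaining > 0:
    # If the current checker can prove that switching checkers is good, switch.
    if checker is not faster_checker and \
            checker("switching to the faster checker is good", steps_remaining):
        steps_remaining -= checker.cost
        checker = faster_checker
        continue
    # Otherwise, act only when the action is provably good within the budget left.
    if checker("the default action is good", steps_remaining):
        steps_remaining -= checker.cost
        # ...take the action...
    else:
        break  # nothing affordable is provably good; stop
```

The switch happens for the same reason the agent does anything else: its current checker found a proof that switching is good.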