I don’t understand why a strong simplicity guarantee places most of the difficulty on the learning problem. In the diamond situation, couldn’t a strong simplicity requirement on the reporter rule out the direct translator, since it may have to translate from a very large and sophisticated AI predictor?
if automated ontology identification does turn out to be possible from a finite narrow dataset, and if automated ontology identification requires an understanding of our values, then where did the information about our values come from? It did not come from the dataset because we deliberately built a dataset of human answers to objective questions. Where else did it come from?
Perhaps I miss the mystery. My first reaction is, “It came from the assumption of a true decision boundary, and the ability to recursively deploy monotonically better generalization while maintaining conservativeness.”
But an automated ontology identifier that would be guaranteed safe if tasked with extrapolating our concepts still raises the question of how that guarantee was possible without knowledge of our values. You can’t dodge the puzzle.
I feel like this part is getting slippery with how words are used, in a way which is possibly concealing unimagined resolutions to the apparent tension. Why can’t I dodge the puzzle? Why can’t I have an intended ELK reporter which answers the easy questions, and a small subset of the hard questions, without also being able to infinitely recurse an ELK solution to get better and better conservative reporters?
I don’t understand why a strong simplicity guarantee places most of the difficulty on the learning problem. In the diamond situation, couldn’t a strong simplicity requirement on the reporter rule out the direct translator, since it may have to translate from a very large and sophisticated AI predictor?
What we’re actually doing here is defining “automated ontology identification” as an algorithm that only has to work if the predictor computes intermediate results that are sufficiently “close” to what is needed to implement a conservative helpful decision boundary. Because we’re working towards an impossibility result, we wanted to make it as easy as possible for an algorithm to meet the requirements of “automated ontology identification”. If some proposed automated ontology identifier works without the need for any such “sufficiently close intermediate computation” guarantee, then it should certainly work in the presence of such a guarantee.
So this “sufficiently close intermediate computation” guarantee kind of changes the learning problem from “find a predictor that predicts well” to “find a predictor that predicts well and also computes intermediate results meeting a certain requirement”. That is a strange requirement to place on a learning process, but it’s actually really hard to see any way to avoid making some such requirement, because if we place no requirement at all then what if the predictor is just a giant lookup table? You might say that such a predictor would not physically fit inside the whole universe, and that’s correct, which is why we wanted to operationalize this “sufficiently close intermediate computation” guarantee, even though it changes the definition of the learning problem in a very important way.
But this is all just to make the definition of “automated ontology identification” not unreasonably difficult, in order that we would not be analyzing a kind of “straw problem”. You could ignore the “sufficiently close intermediate computation” guarantee completely and treat the write-up as analyzing the more difficult problem of automated ontology identification without any guarantee about the intermediate results computed by the predictor.
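To make the lookup-table worry concrete, here is a toy contrast between a predictor that has no intermediate results at all and one whose latent state a reporter could at least try to translate. The class names and interfaces are made up for this comment, not taken from the write-up:

```python
# Illustrative sketch only: names and interfaces are made up for this comment.

class LookupTablePredictor:
    """Predicts perfectly on cases it has memorized, with no latent state.

    A reporter that is supposed to translate this predictor's intermediate
    results has literally nothing to read off, which is the worry that the
    "sufficiently close intermediate computation" guarantee rules out.
    """
    def __init__(self, observation_to_prediction):
        self.table = dict(observation_to_prediction)

    def predict(self, observation):
        # No intermediate computation: straight from input to output.
        return self.table[observation]


class StructuredPredictor:
    """Predicts by first computing an intermediate estimate of the world state."""

    def __init__(self, encoder, decoder):
        self.encoder = encoder  # observation -> latent world-state estimate
        self.decoder = decoder  # latent world-state estimate -> prediction

    def latent_state(self, observation):
        # The intermediate result that a reporter could try to translate.
        return self.encoder(observation)

    def predict(self, observation):
        return self.decoder(self.latent_state(observation))
```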
Perhaps I miss the mystery. My first reaction is, “It came from the assumption of a true decision boundary, and the ability to recursively deploy monotonically better generalization while maintaining conservativeness.”
Well, yeah, of course, but if you don’t think it’s reasonable that any algorithm could meet this requirement, then you have to deny at least one of the three things that you pointed out: the assumption of a true decision boundary, the monotonically better generalization, or the maintenance of conservativeness. I don’t think it’s so easy to pick one of these to deny without also denying the feasibility of automated ontology identification (from a finite narrow dataset, with a safety guarantee).
If you deny the existence of a true decision boundary then you’re saying that there is just no fact of the matter about the questions that we’re putting to automated ontology identification. How then would we get any kind of safety guarantee (conservativeness or anything else)?
If you deny the generalization then you’re saying that there is some easy set E where, no matter which prediction problem you solve, there is just no way for the reporter to generalize beyond E given a dataset sampled entirely within E, not even by one single case. That’s of course logically possible, but it would mean that automated ontology identification as we have defined it is impossible. You might say “yes, but you have defined it wrong”. But if we define it without any generalization guarantee at all, then the problem is trivial: an automated ontology identifier can just memorize the dataset and refuse to answer any cases that were not literally present in the dataset. So we need some generalization guarantee. Maybe there is a better one than the one we have used.
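To spell out why the version with no generalization guarantee is trivial, the degenerate identifier I have in mind looks roughly like this (illustrative code; the dataset-in, reporter-out interface is just a placeholder, not a definition from the write-up):

```python
# Illustrative only: a degenerate "automated ontology identifier" that is
# perfectly conservative but never answers a single new case.

def memorizing_identifier(dataset):
    memorized = dict(dataset)  # case -> answer pairs seen in the dataset

    def reporter(case):
        # Answer only cases literally present in the dataset; refuse the rest.
        return memorized.get(case)  # None means "refuse to answer"

    return reporter
```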
If you deny the maintenance of conservativeness then, again, that means automated ontology identification as we have defined it is impossible, and again you might say that we have defined it badly. But again, if we remove the conservativeness requirement completely then we can solve automated ontology identification by just returning random noise. So we need some safety guarantee. I suspect a lot of alternative safety guarantees are going to be susceptible to the same iteration scheme (sketched below), but I am still interested in safety guarantees that sidestep this issue.
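For concreteness, the iteration scheme I keep referring to looks roughly like this. It is only a sketch under the three assumptions above; `identify` stands in for any proposed automated ontology identifier with the same placeholder interface as the memorizer sketch, and nothing here is meant as a definitive construction:

```python
# Illustrative sketch of the iteration scheme, under three assumptions: a true
# decision boundary exists, each run of the identifier answers at least one
# case outside its dataset (generalization), and every answer it gives is
# correct (conservativeness). All names are hypothetical placeholders.

def iterate_identifier(identify, easy_dataset, all_cases):
    dataset = dict(easy_dataset)  # case -> answer, initially easy cases only
    while True:
        reporter = identify(dataset)
        new_answers = {
            case: reporter(case)
            for case in all_cases
            if case not in dataset and reporter(case) is not None
        }
        if not new_answers:
            # The generalization guarantee (re-applied to the enlarged
            # dataset) says this branch is never taken while unanswered
            # cases remain.
            break
        # Conservativeness says these answers are correct, so they can be
        # treated as ground truth and folded back into the dataset.
        dataset.update(new_answers)
    return dataset
```

Under those assumptions (on the strong reading where the generalization guarantee re-applies to each enlarged dataset), every pass adds at least one correctly labelled case, so the loop only stops once every case is answered, and that is exactly where the question of where the information about our values came from starts to bite.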
I feel like this part is getting slippery with how words are used, in a way which is possibly concealing unimagined resolutions to the apparent tension.
Indeed. I also feel that this part is getting slippery with words. The fact that we don’t have a formal impossibility result (or a formal understanding of why this line cannot lead to an impossibility result) indicates that there is further work needed to clarify what’s really going on here.
Why can’t I dodge the puzzle? Why can’t I have an intended ELK reporter which answers the easy questions, and a small subset of the hard questions, without also being able to infinitely recurse an ELK solution to get better and better conservative reporters?
Fundamentally, I do think you have to deny at least one of the three points above. That can be done, of course, but it seems to us that none of them are easy to deny without also denying the feasibility of automated ontology identification.
Thanks for these clarifying questions. It’s been helpful to write up this reply.
Thanks for your reply!

What we’re actually doing here is defining “automated ontology identification”
(Flagging that I didn’t understand this part of the reply, but don’t have time to reload context and clarify my confusion right now)
If you deny the existence of a true decision boundary then you’re saying that there is just no fact of the matter about the questions that we’re putting to automated ontology identification. How then would we get any kind of safety guarantee (conservativeness or anything else)?
When you assume a true decision boundary, you’re assuming a label-completion of our intuitions about e.g. diamonds. That’s the whole ball game, no?
But I don’t see why the platonic “true” function has to be total. The solution does not have to be able to answer ambiguous cases like “the diamond is molecularly disassembled and reassembled”; we can leave those unresolved and let the reporter say “ambiguous.” I might not be able to test for ambiguity-membership, but as long as the ELK solution can:
Know when the instance is easy,
Solve some unambiguous hard instances,
Say “ambiguous” to the rest,
Then a planner—searching for a “Yes, the diamond is safe” plan—can reasonably still end up executing plans which keep the diamond safe. If we want to end up in realities where we’re sure no one is burning in a volcano, that’s fine, even if we can’t label every possible configuration of molecules as a person or not. The planner can just steer into a reality where it unambiguously resolves the question, without worrying about undefined edge-cases.
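As a sketch of what I mean, with every name made up for this comment rather than taken from the report: a deliberately partial reporter plus a planner that only executes plans the reporter unambiguously calls safe.

```python
# Illustrative sketch only: all names are invented for this comment. The
# reporter is deliberately partial, and the planner only executes plans whose
# predicted outcome it unambiguously resolves as safe.

from enum import Enum

class Verdict(Enum):
    SAFE = "yes, the diamond is safe"
    UNSAFE = "no, the diamond is not safe"
    AMBIGUOUS = "ambiguous"

def make_partial_reporter(easy_cases, unambiguous_hard_cases):
    # easy_cases / unambiguous_hard_cases: dicts mapping predicted world
    # states to SAFE or UNSAFE; everything else is left unresolved.
    def reporter(state):
        if state in easy_cases:
            return easy_cases[state]
        if state in unambiguous_hard_cases:
            return unambiguous_hard_cases[state]
        return Verdict.AMBIGUOUS  # refuse to guess on the rest
    return reporter

def choose_plan(candidate_plans, predict_state, reporter):
    # Steer into realities where the question is unambiguously resolved:
    # only execute a plan whose predicted outcome the reporter calls SAFE.
    for plan in candidate_plans:
        if reporter(predict_state(plan)) is Verdict.SAFE:
            return plan
    return None  # no unambiguously safe plan found, so do nothing
```

The reporter never needs to be total; as long as unambiguously safe realities exist somewhere in the plan space, the planner has something to steer toward, and the undefined edge-cases simply never get executed.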