(Writing at comment-speed, rather than carefully-considered speed, apologies for errors and potential repetitions, etc)
On the Evo-Clown thing and related questions in the Firstly section only.
I think we understand each other on the purpose of the Evo-Clown analogy, and I think it is clear what our disagreement is on the broader question?
I put in the paragraph Quintin quoted in order to illustrate that, even in an intentionally absurd example designed to illustrate that A and B share no causal factors, A and B still share clear causal factors, and the fact that A happened this way should give you substantial pause about the prospects for B, compared to a world where A never happened and the things that caused A never happened either. I am curious (since Quintin does not comment) whether he agrees about the example, now that I bring up the reasons to be concerned.
The real question is the case of evolution versus AI development.
I got challenged by Quintin and by others for interpreting Quintin too broadly when I said:
That seems like quite a leap. If there is one particular development in humanity’s history that we can fully explain, we should then not cite evolution in any way, as an argument for anything?
In response to Quintin saying:
- THEN, there’s no reason to reference evolution at all when forecasting AI development rates, not as evidence for a sharp left turn, not as an “illustrative example” of some mechanism / intuition which might supposedly lead to a sharp left turn in AI development, not for anything.
I am happy to accept the clarification that I interpreted Quintin’s statement more strongly than he intended it.
I am still confused about how else I could have interpreted the original statement? But that does not matter; what matters is the disagreements we still clearly do have here.
I now understand Quintin’s model as saying (based on the comment plus his OP) that evolution so obviously does an overdetermined sharp left turn that it isn’t evidence of anything (e.g. that the world I proposed as an alternative breaks so many of his models that it isn’t worth considering)?
I agree that if evolution’s path is sufficiently overdetermined, then there’s no reason to cite that path as evidence. In which case we should instead be discussing the mechanisms that are overdetermining that result, and what they imply.
I think the reason we talk about evolution here is exactly because for most people, the underlying mechanisms very much aren’t obvious and overdetermined before looking at the results—if you skipped over the example people would think you were making a giant leap.
Concrete example 2: One general hypothesis you could have about RL agents is “RL agents just do what they’re trained to do, without any weirdness”. (To be clear, I’m not endorsing this hypothesis. I think it’s much closer to being true than most on LW, but still false.) In the context of AI development, this has pretty benign implications. In the context of evolution, due to the bi-level nature of its optimization process and the different data that different generations are “trained” on, this causal factor in the evolution graph predicts significant divergence between the behaviors of ancestral and modern humans.
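To make the “bi-level” point concrete, here is a minimal sketch, entirely my own toy construction rather than anything from Quintin’s post: an outer selection loop over innate dispositions, plus an inner within-lifetime loop where each generation is “trained” on different data. Every name, parameter, and update rule here is an illustrative assumption, not a claim about how evolution or any real training setup works.

```python
import random

# Toy sketch (my assumption-laden illustration, not Quintin's model):
# an outer "evolution" loop selects over innate dispositions, while the
# inner loop is the within-lifetime "training" each agent receives.
# Because the inner-loop data drifts across generations, agents that
# faithfully do what their lifetime data rewards can still behave quite
# differently from their ancestors.

def lifetime_behavior(disposition, environment_data):
    """Inner loop: the agent's behavior is just a function of its
    disposition applied to whatever data its generation sees."""
    return [disposition * signal for signal in environment_data]

def fitness(behavior):
    """Outer-loop score: evolution only sees aggregate success, not the
    specific behaviors that produced it."""
    return sum(behavior)

def evolve(generations=10, population_size=20):
    population = [random.uniform(0.5, 1.5) for _ in range(population_size)]
    for gen in range(generations):
        # The "training data" each generation sees drifts over time.
        environment_data = [random.gauss(gen, 1.0) for _ in range(5)]
        scored = [
            (fitness(lifetime_behavior(d, environment_data)), d)
            for d in population
        ]
        scored.sort(reverse=True)
        survivors = [d for _, d in scored[: population_size // 2]]
        # Refill the population with mutated copies of the survivors.
        population = survivors + [d + random.gauss(0, 0.05) for d in survivors]
    return population

if __name__ == "__main__":
    print(evolve()[:5])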
Zvi says this is an uncommon standard of epistemics, for there to be no useful inferences from one set of observations (evolutionary outcomes) to another (AI outcomes). I completely disagree. For the vast majority of possible pairs of observations, there are not useful inferences to draw. The pattern of dust specks on my pillow is not a useful reference point for making inferences about the state of the North Korean nuclear weapons program. The relationship between AI development and human evolution is not exceptional in this regard.
Ok, sure. I agree that for any given pair of facts there is essentially nothing to infer from one about the other, given what else we already know, and that the two facts Quintin cites are a valid example. But it seems wrong to say that AI developments and evolutionary developments relate to each other in anything like the way, or the reference class, that a speck on your pillow relates to the North Korean nuclear weapons program? Or that the distinctions proposed are generally of a sufficient degree to imply there are no implications from one to the other?
What I was saying that Quintin is challenging in the second paragraph above, specifically, was not that for observations A and B it would be unusual for A to not have important implications for B. What I was saying was that there being distinctions in the causal graphs behind A and B is not a good reason to dismiss A having implications for B—certainly differences reduce it somewhat, but most of the time that A has implications for B, there are important causal graph differences between them that could be used to draw a similar parallel. And, again, this would strike down most reference class arguments.
Quintin does say there are non-zero implications in the comment, so I suppose the distinction does not much matter in the end. Nor does it much matter whether we are citing evolution, or citing our underlying models that also explain evolution’s outcomes, if we can agree on those models?
As in, we would be better served looking at:
One general hypothesis you could have about RL agents is “RL agents just do what they’re trained to do, without any weirdness.” In the context of AI development, this has pretty benign implications.
I think I kind of… do believe this? For my own perhaps quite weird definitions of ‘weirdness’ and ‘what you train it for’? And for those values, no, this is not benign at all, because I don’t consider SLT (sharp left turn) behaviors to be weird when you have the capabilities for them. That’s simply what you would expect, including from a human in the same spot, so why are we acting so surprised?
If you define ‘weirdness’ sufficiently differently then it would perhaps be benign, but I have no idea why you would expect this.
And also, shouldn’t we use our knowledge of humans here, when faced with similar situations? Humans, a product of evolution, do all sorts of local SLTs in situations far removed from their training data, the moment you give them the affordance to do so and the knowledge that they can.
It is also possible we are using different understandings of SLT, and Quintin is thinking about it more narrowly than I am, as his later statements imply. In that case, I would say that the thing I care about, in terms of whether it happens, is the thing (or combination of things) I’m talking about.
Thus, in my view, humans did not do only the one big anti-evolution (?) SLT. Humans are constantly doing SLTs in various contexts, and this is a lot of what I am thinking about in this context.
What prevents there being useful updates from evolution to AI development is the different structure of the causal graphs.
Aha?!?
Quintin, I think (?), is saying that the fact that evolution provided us with a central sharp left turn is not evidence, because that is perfectly compatible with and predicted by AI models that aren’t scary.
So I notice I disagree with this twice.
First, I don’t think that the second half of that (the ‘because’ clause) entirely holds, for reasons that I largely (but I am guessing not entirely) laid out in my OP, and that I am confident Quintin disagrees with and would take a lot to untangle. Although I do agree there is some degree of overdeterminedness here: if we hadn’t done the exact SLT we did but had still ramped up our intelligence, we would instead have done a slightly-to-somewhat different-looking SLT later.
Second, I think this points out a key thing I didn’t say explicitly and should have, which is the distinction between the evidence that humans did all their various SLTs (yes, plural, both collectively and individually), and the evidence that humans did these particular SLTs in these particular ways because of these particular mechanisms. Which I do see as highly relevant.
I can imagine a world where humans did an SLT later and in a different way, and are less likely to do them on an individual level (note: I agree that this may be somewhat non-standard usage of SLT, but hopefully it’s mostly clear from context what I’m referring to here?), and where everything happened slower and more continuously (on the margin, presumably, we can imagine this without our models breaking, if only via different luck). And where we look at the details and say: actually, it’s pretty hard to get this kind of thing to happen, and moving humans out of their training distributions causes them to hold up really well, in the way we’d metaphorically like out of AIs, even when they are smart enough and have enough info and reflection time to know better, and so on.
(EDIT: It’s late, and I’ve now responded in stages to the whole thing, which as Quintin noted was longer than my OP. I’m thankful for the engagement, and will read any further replies, but will do my best to keep any further interactions focused and short so this doesn’t turn into an infinite time sink that it clearly could become, even though it very much isn’t a demon thread or anything.)