Not sure how close you want it to be, but how about this example: “animals will typically care about their offspring’s survival and reproduction in worlds where their action space is rich enough for them to be helpful and too rich for them to memorize extremely simple heuristics, because if they didn’t their genes wouldn’t propagate as much”. Not air-tight, and also I knew the stylized fact before I heard the argument so it’s a bit unfair, but I think it’s pretty good as far as it goes.
I admit I’m a bit surprised by your example. Your example seems to be the type of heuristic argument that, if given about AI, I’d expect would fail to compel many people (including you) on anything approaching a deep level. It’s possible I was just modeling your beliefs incorrectly.
Generally speaking, I suspect there’s a tighter connection between our selection criteria in ML and the stuff models will end up “caring” about than in the analogous case of natural selection. I think this for reasons similar to those Quintin Pope alluded to in his essay about the evolutionary analogy.
If you think you’d be persuaded that animals will end up caring about their offspring because of a heuristic argument about that type of behavior being selected for in-distribution, I’m not sure why you’d need a lot of evidence to be convinced the same will be true for AIs with regard to what we train them to care about. But again, perhaps you don’t actually need that much evidence, and I was simply mistaken about what you believe here.
Your example seems to be the type of heuristic argument that, if given about AI, I’d expect would fail to compel many people (including you) on anything approaching a deep level.
I think people are often persuaded of things about AI by heuristic arguments, like “powerful AI will probably be able to reason well and have a decent model of the world because if you don’t do that you can’t achieve good outcomes” (ok that argument needs some tightening, but I think there’s something that works that’s only ~2x as long). I think it’s going to be harder to persuade me of alignment-relevant stuff about AI with this sort of argument, because there are more ways for such arguments to fail IMO—e.g. the evolution argument relies on evolutionary pressure being ongoing.
Two meta points:
There are arguments that convince me that we’ve made progress, and there are arguments that convince me we’ve solved it. It’s easier to get your hands on the first kind than the second.
It’s easier for me to answer gallabytes’ question than yours because I don’t think the argument tactics I see are very good, so it’s going to be hard to come up with one that I think is good! The closest that I can come is that “what if we tried to learn values” and “AI safety via debate” felt like steps forward in thought, even though I don’t think they get very far.
Generally speaking, I suspect there’s a tighter connection between our selection criteria in ML and the stuff models will end up “caring” about than in the analogous case of natural selection. I think this for reasons similar to those Quintin Pope alluded to in his essay about the evolutionary analogy.
For the record, I’m not compelled by this enough to be optimistic about alignment, but I’m roughly at my budget for internet discussion/debate right now, so I’ll decline to elaborate.
If you think you’d be persuaded that animals will end up caring about their offspring because of a heuristic argument about that type of behavior being selected for in-distribution, I’m not sure why you’d need a lot of evidence to be convinced the same will be true for AIs with regard to what we train them to care about.
Roughly because AI can change the distribution and change the selection pressure that gets applied to it. But also I don’t think I need a lot of evidence in terms of likelihood ratio—my p(doom) is less than 99%, and people convince me of sub-1-in-100 claims all the time—I’m just not seeing the sort of evidence that would move me a lot.
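To make the likelihood-ratio point concrete: in odds form, Bayes’ rule is posterior odds = prior odds × likelihood ratio, so evidence of roughly the strength that flips a sub-1-in-100 claim (a ~100:1 likelihood ratio) would also move a fairly high p(doom) down a long way. Here’s a minimal sketch of that arithmetic; the prior of 0.9 and the ratio of 100 are purely illustrative assumptions, not numbers anyone stated above.

```python
# Illustrative only: odds-form Bayes update showing how far a given
# likelihood ratio can move a probability. The specific numbers
# (prior of 0.9, likelihood ratio of 100) are assumptions for the example.

def bayes_update_against(prior_prob: float, likelihood_ratio: float) -> float:
    """Posterior probability after evidence that favors the hypothesis
    being false by the given likelihood ratio."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds / likelihood_ratio  # evidence against the hypothesis
    return posterior_odds / (1.0 + posterior_odds)

if __name__ == "__main__":
    prior = 0.90        # an assumed p(doom) of 90%, below the 99% mentioned above
    lr_against = 100.0  # the strength of evidence that flips a sub-1-in-100 claim
    posterior = bayes_update_against(prior, lr_against)
    print(f"prior {prior:.2f} -> posterior {posterior:.3f}")  # ~0.083
```

The exact numbers don’t matter much; the point is just that the likelihood ratios needed to move even a fairly confident p(doom) are of the size that ordinary arguments supply all the time.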