Can we somehow make Metaphors We Live By mandatory reading for these people? Reference class tennis plus analogical reasoning is only comforting in the sense that maybe someone stupid enough to be arguing that way isn’t smart enough to build anything dangerous.
[Context: the parent comment was originally posted to the Alignment Forum, and was moved to only be visible on LW.]
One of my hopes for the Alignment Forum, and to a much lesser extent LessWrong, is that we manage to be a place where everyone relevant to AI alignment gets value from discussing their work. There are many obstacles to that, but one I've been thinking about a lot recently is that pointing at foundational obstacles can look a lot like low-effort criticism.
That is, I think there's a valid objection here of the form "these people are using reasoning style A, but I think this problem calls for reasoning style B because of considerations C, D, and E." But the inferential distance here is actually quite long, and it's much easier to say "I am not convinced by this because of <quick pointer>" than it is to actually get the other person to agree that they were making a mistake. And beyond that, there's a version of the objection that scores points off an ingroup/outgroup divide, and a different version that tries to convert the other party.
My sense is that lots of technical AI safety agendas look to each other like they have foundational obstacles, which means that if more than one agenda is going to be at home on the Alignment Forum, everyone needs to refrain from this sort of sniping, while still having high-effort places to discuss those obstacles. (That is, if we think CIRL can't handle corrigibility, having a place for 'obstacles to CIRL' where that's discussed makes sense, but bringing it up at every post on CIRL might not.)
whoops, I agree with the heuristic and didn’t actually mean for it to go to AF instead of LW. Hadn’t paid too much attention to how crossposting works until now.
I agree with the wisdom of removing the comment from AF, but I admit I was also screaming internally while reading the article.
(From a personal perspective, setting aside the issue of artificial intelligence and existential risks, this was an interesting look outside the LW bubble. Like, the more time that passes since I read the Sequences, the more obvious the ideas explained there seem to me, to the point where I start to wonder why I was even impressed by reading the text. But then I listen to someone from outside the bubble, and scream internally as I watch them making the "obvious" mistakes, typically some variant of confusing the map with the territory, and then I realize the "obvious" things are actually not that obvious, even among highly intelligent people talking about topics they care about. Afterwards, I just silently weep about the state of the human race.)
It hurts to read a sophisticated version of “humans are too smart to make mistakes”. But pointing it out without crossing the entire inferential distance is not really helpful. :(
Meta: This is in response to both this and comments further up the chain regarding the level of the debate.
It’s worth noting that, at least from my perspective, Bengio, who’s definitely not in the LW bubble, made good points throughout and did a good job of moderating.
On the other hand, Russell, obviously more partial to the LW consensus view, threw out some “zingers” early on (such as the following one) that didn’t derail the debate but easily could’ve.
Thanks for clearing that up—so 2+2 is not equal to 4, because if the 2 were a 3, the answer wouldn’t be 4? I simply pointed out that in the MDP as I defined it, switching off the human is the optimal solution, despite the fact that we didn’t put in any emotions of power, domination, hate, testosterone, etc etc. And your solution seems, well, frankly terrifying, although I suppose the NRA would approve.
[internal screaming intensifies]
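For anyone who hasn't seen the off-switch argument spelled out, here is a minimal toy sketch in Python of the kind of point Russell is making in the quote above. This is my own illustrative construction, not the MDP Russell actually defined in the debate: the only assumptions are that the agent maximizes a task reward and that it can either leave its off-switch alone or disable it before working.

```python
# A minimal toy MDP (NOT Russell's actual construction from the debate) illustrating
# the point: a reward-maximizing agent with the option to disable its off-switch
# will do so, with no emotions, power-seeking drives, or self-preservation built in.
#
# Actions from the single "working" state:
#   "comply"  -> leave the off-switch alone; with probability P_SHUTDOWN the human
#                switches the agent off before the task finishes (reward 0),
#                otherwise the task completes (reward 1).
#   "disable" -> disable the off-switch first; the task always completes (reward 1).
# The only objective is the task reward; being shut down is not penalized in itself.

P_SHUTDOWN = 0.3  # arbitrary illustrative probability that the human presses the switch

def expected_reward(action: str) -> float:
    if action == "comply":
        return (1 - P_SHUTDOWN) * 1.0 + P_SHUTDOWN * 0.0
    if action == "disable":
        return 1.0
    raise ValueError(f"unknown action: {action}")

if __name__ == "__main__":
    for action in ("comply", "disable"):
        print(f"{action:7s} -> expected reward {expected_reward(action):.2f}")
    best = max(("comply", "disable"), key=expected_reward)
    print(f"optimal action: {best}")  # "disable" whenever P_SHUTDOWN > 0
```

In this toy version, "disable" wins for any nonzero shutdown probability, which is the sense in which the behaviour falls out of plain reward maximization rather than anything resembling emotion, domination, or testosterone.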