I wonder if some people’s aversion to “just answering the question” (which Eliezer notes many times in the comments) has to do with the perceived cost of signalling agreement with the premises.
It’s straightforward to me that answering should take the question at face value; it’s a thought experiment, not a commitment to a course of action. And going by the question as asked, the answer for any utilitarian is “torture”, since even a very small increment of suffering, multiplied by a large enough number of people (or an infinite number), will outweigh a great amount of suffering borne by one person.
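The aggregation step can be made concrete with toy numbers. The harm values below are my own placeholders, not anything from the post; the original puts the speck count at 3^^^3, far beyond anything representable here:

```python
# Toy version of the utilitarian aggregation argument.
# Both disutility values are made up for illustration only.
speck_harm = 1e-10    # hypothetical disutility of one dust speck
torture_harm = 1e7    # hypothetical disutility of 50 years of torture

# However small speck_harm is, some population size tips the balance:
n_crossover = torture_harm / speck_harm   # on the order of 1e17 people
print(n_crossover)

# Past the crossover, aggregated specks outweigh the torture:
print(1e18 * speck_harm > torture_harm)   # True
```

Nothing in the argument depends on the particular numbers; only the structure matters, which is exactly why the conclusion follows mechanically once the premises are granted.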
Signalling that would be highly problematic for some people because of what might be read into our answer—does Eliezer expect that signalling assent here means signalling assent to other, as-yet-unknown conclusions he’s made about (whatever issue where that bears some resemblance)? Does Eliezer intend to codify the terms of this premise into the basis for a decision theory underlying the cognitive architecture of a putative Friendly AI? Does Eliezer think that the real world, in short, maps to his gedankenexperiment sufficiently well that the terms of this scenario can meaningfully stand in for decisions made in that domain by real actors (human or otherwise)?
For my own part I’d be very, very hesitant to signal any of that. Hence I find it difficult to answer the question as asked. It’s analogous to my discomfort with the Ticking Time Bomb scenario—by a straight reading of the premise you should accept the act of torturing the person who planted the bomb in exchange for even a finite chance of finding and disabling it, thereby saving a million lives. The logic is internally consistent, but it doesn’t map to any real-world situation I can plausibly imagine (where torture is not terribly effective in eliciting confessions, and the scenario of a “ticking time bomb with a single suspect unwilling to talk mere minutes beforehand” has AFAIK never happened as presented, and would be extremely difficult to set up).
I recognize the internal consistency, yet I’m troubled by my uncertainty about what the author thinks I’m signing up for when I reply.