I don’t have to know in advance that we’re in a hard-takeoff singleton world, or even that my AI will succeed in achieving those objectives. The only thing I absolutely have to know in advance is that my AI is aligned. What sort of evidence will I have for this? A lot of detailed mathematical theory, with the modeling assumptions validated by computational experiments and knowledge from other fields of science (e.g. physics, cognitive science, evolutionary biology).
I think you’re misinterpreting Yudkowsky’s quote. “Using the null string as input” doesn’t mean “without evidence”, it means “without other people telling me parts of the answer (to this particular question)”.
I’m not sure what in my description is “extremely destructive and costly”, unless you mean the risk of misalignment, in which case, see above.
This was specifically in response to:
The critics tend to assume slow-takeoff multipole scenarios, which makes the comparison with their preferred solutions to be somewhat “apples and oranges”. Suppose that we do live in a hard-takeoff singleton world, what then?
It sounds like you do in fact believe we are in a hard-takeoff singleton world, or at least one in which a single actor can permanently prevent all other actors from engaging in catastrophic actions using a less destructive approach than “do unto others before they can do unto you”. Why do you think that describes the world we live in? What observations led you to that conclusion, and do you think others would come to the same conclusion if they saw the same evidence?
I think your set of guidelines from above is mostly[1] a good one, in worlds where a single actor can seize control while following those rules. I don’t think that we live in such a world, and honestly I can’t really imagine what sort of evidence would convince me that we do, which is why I’m asking.
I think you’re misinterpreting Yudkowsky’s quote. “Using the null string as input” doesn’t mean “without evidence”, it means “without other people telling me parts of the answer (to this particular question)”.
Yeah, on examination of the comment section I think you’re right that by “from the null string” he meant “without direct social inputs on this particular topic”.
[1] “Commit to not make anyone predictably regret supporting the project or not opposing it” is worrying only by omission: it’s a good guideline, but it leaves the door open for “punish anyone who failed to support the project once the project gets the power to do so”. To see why that’s a bad idea to allow, consider the situation where there are two such projects and you, the bystander, don’t know which one will succeed first.
I don’t know whether we live in a hard-takeoff singleton world or not. I think there is some evidence in that direction, e.g. from thinking about the kind of qualitative changes in AI algorithms that might come about in the future and their implications for the capability growth curve, and also about the possibility of recursive self-improvement. But the evidence is definitely far from conclusive (in any direction).
I think that the singleton world is definitely likely enough to merit some consideration. I also think that some of the same principles apply to some multipole worlds.
“Commit to not make anyone predictably regret supporting the project or not opposing it” is worrying only by omission: it’s a good guideline, but it leaves the door open for “punish anyone who failed to support the project once the project gets the power to do so”.
Yes, I never imagined doing such a thing, but I definitely agree it should be made clear. Basically, don’t make threats, i.e. don’t try to shape others’ incentives in ways that they would be better off precommitting not to go along with.