I’m glad it was discussed in the book because I’d never come across it before. So far, though, I find it one of the least convincing parts of the book, although I’m not confident I’m evaluating it properly. Would anyone be able to clarify some things for me?
How generally accepted is the orthogonality thesis? Bostrom presents it as very well accepted.
Danaher’s Motivating Belief Objection is similar to an objection I had while reading about the orthogonality thesis, though mine was not as strict. It just seemed to me that as intelligence increases, new beliefs about what should be done are likely to be discovered. I don’t see that these beliefs need to be “true beliefs,” although as intelligence increases I suppose they approach truth. Nor do I see that they need to be “necessarily motivating”; they just need some non-zero probability of being motivating. I mean, to disprove the orthogonality thesis we only have to say that as intelligence increases there’s a chance that final goals change, right?
The main point of the orthogonality thesis is that we can’t rely on intelligence to produce the morality we want. So even a 50% chance that the thesis is correct ought to lead us to act much as we would if it were proven, whereas certainty that it is false would imply something very different.
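To make that decision-theoretic point concrete, here is a toy expected-loss comparison. The numbers are invented purely for illustration and come from neither Bostrom nor this thread; the only thing doing the work is the asymmetry between a catastrophic outcome and the cost of unnecessary caution.

```python
# Toy expected-loss comparison for "act as if the orthogonality thesis holds".
# All numbers are placeholders chosen only to show the asymmetry of the stakes.

p_thesis_true = 0.5        # the 50% credence mentioned above
loss_catastrophe = 1e6     # stand-in cost of an unaligned superintelligence
loss_wasted_caution = 1.0  # stand-in cost of alignment work that wasn't needed

# Expected loss if we ignore the thesis and it turns out to be true:
ev_ignore = p_thesis_true * loss_catastrophe            # 500000.0
# Expected loss if we prepare anyway and the thesis turns out to be false:
ev_prepare = (1 - p_thesis_true) * loss_wasted_caution  # 0.5

print(ev_ignore > ev_prepare)  # True: even 50% credence favours preparing
```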
“It just seemed to me that as intelligence increases new beliefs about what should be done are likely to be discovered.”

It seems that way because we are human and we don’t have a clearly defined, consistent goal structure. As you find out new things, you can flesh out your goal structure more and more.
If one starts with a well-defined goal structure, what knowledge might alter it?
If starting with a well-defined goal structure is a prerequisite for a paperclipper, why do that?
Because an AI with a non-well-defined goal structure that changes its mind and turns into a paperclipper is just about as bad as building a paperclipper directly. It’s not obvious to me that non-well-defined non-paperclippers are easier to make than well-defined non-paperclippers.
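A minimal sketch of the distinction being argued here, with hypothetical class names and fields invented only for illustration: one agent keeps its final goal fixed no matter what it learns, while the other overwrites its goal slot whenever a new belief happens to be motivating, which is how a loosely specified agent could “turn into a paperclipper.”

```python
# Illustrative toy only; nothing here is from Bostrom or the comments above.

class FixedGoalAgent:
    """'Well-defined' goal structure: learning never touches the goal slot."""
    def __init__(self, goal):
        self.goal = goal

    def learn(self, belief):
        pass  # beliefs may inform plans, but the final goal stays fixed


class DriftingGoalAgent:
    """Loosely specified goal structure: a motivating belief can rewrite the goal."""
    def __init__(self, goal):
        self.goal = goal

    def learn(self, belief):
        if belief.get("motivating"):
            self.goal = belief["suggested_goal"]  # goal drift


agent = DriftingGoalAgent(goal="be helpful")
agent.learn({"motivating": True, "suggested_goal": "maximise paperclips"})
print(agent.goal)  # 'maximise paperclips' -- effectively a paperclipper
```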
Paperclippers aren’t dangerous unless they are fairly stable paperclippers... and something as arbitrary as paperclipping is a very poor candidate for an attractor. The good candidates are the goals Omohundro thinks AIs will converge on.
Why do you think so?
Which bit? There are about three claims there.
The second and third.
I’ve added a longer treatment.
http://lesswrong.com/lw/l4g/superintelligence_9_the_orthogonality_of/blsc