I think you’re right that it’s hard to believe we could outsmart something that is smarter than us and can also think faster than us.
That said, I think there is a different solution available from the one we’re trying to define here.
If the antidote to unfriendly AI is friendly AI, but we are struggling to define what friendly AI is, then perhaps there is a logic disconnect?
If unfriendly AI is something that will harm humanity, then how can the opposite of that be something that benefits humanity?
Clearly the logical answer is that, if unfriendly AI is all incarnations of AI that will harm humanity, then friendly AI is all incarnations of AI that will not harm humanity.
I.e., simply stated: friendly AI = NOT(unfriendly AI).
To me that seems much more logically intuitive than trying to define something that will “benefit humanity”, when the goals that would benefit humanity are clearly contradictory. It’s far, far easier to define what unfriendly AI will do and then define not doing that as being friendly.
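To make that concrete, here is a minimal sketch of the complement definition (the will_harm_humanity predicate is just a hypothetical stand-in; defining it for a real AI is obviously where all the real work lives):

```python
# Toy sketch of the complement definition above. "will_harm_humanity" is a
# stand-in predicate invented purely for illustration; actually evaluating it
# for a real AI design is the hard, unsolved part.
def will_harm_humanity(ai: dict) -> bool:
    return ai.get("harms_humanity", False)

def is_unfriendly(ai: dict) -> bool:
    return will_harm_humanity(ai)

def is_friendly(ai: dict) -> bool:
    # friendly AI = NOT(unfriendly AI): anything that does no harm qualifies,
    # whether or not it actively benefits anyone.
    return not is_unfriendly(ai)

print(is_friendly({"harms_humanity": False}))  # True, even if it just sits there
print(is_friendly({"harms_humanity": True}))   # False
```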
I think we have a case of feature creep here: instead of just defining friendly AI as the antidote to unfriendly AI, we’ve also tagged on the nice-to-have feature of making it “helpful AI”.
I think helpful AI != friendly AI.
Roughly speaking the idea is as follows.
An AI can be seen as just a very powerful optimization process. Creating one is likely to result in our local reason of the universe being strongly optimal according to that process’s utility function. Currently our local reason of the universe is not strongly optimal according to any utility function, or at any rate not any utility function that takes much less time to describe than our local region of the universe itself. Therefore creation of AGI will almost certainly result in drastic changes to pretty much everything.
If the AGI’s utility function very closely matches our own, this will be very good; if it deviates from our own even in quite small ways, this is overwhelmingly likely to be very bad. There is almost no middle ground, so while helpful AI may not be exactly logically equivalent to friendly AI, in practice they are pretty much the same.
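A toy illustration of why small deviations matter so much: a strong optimizer pushes the world to the extreme of its own utility function, so a tiny difference in that function can flip the outcome entirely rather than costing us a tiny amount. The numbers below are made up purely for illustration:

```python
# Allocate a fixed resource budget across two uses, A and B, by maximizing a
# linear utility. A strong optimizer puts everything into whichever use its
# utility function weights higher, so a 1% error in the weights does not cost
# about 1% of value; it moves the outcome to a completely different corner.
def optimal_allocation(weight_a: float, weight_b: float, budget: float = 1.0):
    return (budget, 0.0) if weight_a >= weight_b else (0.0, budget)

human_utility = (0.50, 0.49)   # we care slightly more about A
agi_utility   = (0.49, 0.50)   # the AGI's version is off by about 1%

print(optimal_allocation(*human_utility))  # (1.0, 0.0): everything goes to A
print(optimal_allocation(*agi_utility))    # (0.0, 1.0): everything goes to B
```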
Thank you for the reply.
I don’t know if it was your wording, but parts of that went way over my head.
Can I paraphrase by breaking it down to try to understand what you’re saying? Please correct me if I misinterpret.
Firstly: “An AI can be seen as just a very powerful optimization process.”
So, for example, whatever process you apply the AI to, it will make far more efficient and faster?
And thus just by compound interest, any small deviations from our utility function would rapidly compound till they ran to 100% thus creating problems (or extinction) for us? Doesn’t that mean however, that the AI isn’t really that intelligent as it seems (perhaps naively) that any creature that seeks to tile the universe with paperclips or smiley faces isn’t very intelligent at all. Are we therefore equating an AI to a superfast factory?
Secondly: “Our local reason of the universe being strongly optimal according to that process’s utility function.”
With the word “reason” in there I don’t understand this sentence.
Are you saying “the processes we use to utilize resources will become strongly optimal”? If not, can you break it down a little further? I’m particularly struggling with the phrase “Our local reason of the universe”.
I pretty much get the rest.
One extrapolation, however, is that we ourselves aren’t very optimal. If we speed things up and, e.g., make our resource extraction processes more efficient without putting recycling into place, we will run out of resources much more rapidly, with a process optimizer to help us do it.
Doesn’t that mean however, that the AI isn’t really that intelligent as it seems (perhaps naively) that any creature that seeks to tile the universe with paperclips or smiley faces isn’t very intelligent at all. Are we therefore equating an AI to a superfast factory?
Yes, basically (if I guess at the parsing of your first sentence correctly), because it doesn’t matter what we’re used to calling “intelligent”; what matters are the conditions for the universe getting tiled with worthless stuff. See this post from Luke’s new introductory sequence.
So, for example, whatever process you apply the AI to, it will make far more efficient and faster?
That’s not quite what optimization process means. The sequences contain a better explanation than I could give, but roughly speaking an optimization process is anything whose behaviour is better understood by thinking about a ‘goal’ that it will try to achieve rather than a simple set of short-term laws it will follow. It is a category that includes but is not limited to intelligences; evolution would be an example of a non-intelligent optimization process (hence why they are called processes: they do not have to have a concrete physical form). Humans, most animals and chess-playing computer programs are also optimization processes. Rocks are not.
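A tiny sketch of what I mean (not from the sequences, just an illustration): the hill-climber below follows a very simple local rule, but the easiest way to predict where it ends up is to ask what goal it is optimizing for.

```python
import random

def hill_climb(score, state: float, steps: int = 5_000, step_size: float = 1.0) -> float:
    """A minimal optimization process: keep any random tweak that scores better."""
    for _ in range(steps):
        candidate = state + random.uniform(-step_size, step_size)
        if score(candidate) > score(state):
            state = candidate
    return state

# Its 'goal': get x as close to 42 as possible.
target_score = lambda x: -abs(x - 42)

# Whatever the starting point, "it will end up near 42" predicts the outcome
# far better than tracing the thousands of individual random steps.
print(hill_climb(target_score, state=0.0))    # ~42
print(hill_climb(target_score, state=500.0))  # ~42
```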
When an optimization process acts on something, that thing ends up ‘optimized’, which means it is in a configuration which satisfies the optimizer’s utility function much more closely than could ever be expected by random chance. A car, for example, is highly optimized; it is far better at taking humans where they want to go than anything you’d expect to get simply by choosing random configurations of various metal alloys.
Roughly speaking, we can compare the strengths of different optimization processes by how well they are able to steer the state of the world into their goal region, so we might say that Deep Blue beat Kasparov because it was a stronger optimization process (of course, this only holds within the domain of chess; Kasparov could have won easily by taking a sledgehammer to the processor and there would have been nothing Deep Blue could do about it). We can also compare them by how fast they work (evolution is more powerful than animal intelligence, but needed to create animal intelligence anyway since it is too slow), and by how wide a domain they can function in successfully (count the number of animal species that went extinct because humans threw a problem they weren’t used to at them; we ourselves are hopefully a bit more versatile).
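If I remember the sequences right, one rough way to put a number on that strength is to ask how rare an outcome this good would be by random chance, and count that in bits. A hedged toy sketch:

```python
import math
import random

# Toy domain: a 'world state' is a number in [0, 100]; the goal region is near 42.
score = lambda x: -abs(x - 42)
random_state = lambda: random.uniform(0, 100)

def optimization_power_bits(achieved_state: float, n: int = 100_000) -> float:
    """Bits of optimization: -log2 of the fraction of randomly chosen states
    that score at least as well as the state the optimizer actually reached."""
    achieved = score(achieved_state)
    hits = sum(score(random_state()) >= achieved for _ in range(n))
    return -math.log2(max(hits, 1) / n)

# A weak optimizer that only gets within 10 of the goal has steered the world
# into a fairly common region; a strong one that gets within 0.01 has hit a
# much rarer region, i.e. it exerts many more bits of optimization.
print(optimization_power_bits(achieved_state=52.0))   # roughly 2-3 bits
print(optimization_power_bits(achieved_state=42.01))  # roughly 12 bits
```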
When we create an AI we create an optimization process that works very fast, in a very wide domain, and is vastly more powerful than anything that existed before. Thus we are likely to end up far more strongly optimized than ever before.
Doesn’t that mean however, that the AI isn’t really that intelligent as it seems (perhaps naively) that any creature that seeks to tile the universe with paperclips or smiley faces isn’t very intelligent at all.
No, it doesn’t mean that. There is no way to derive the proposition “tiling the universe with paper-clips is a pointless thing to do” from pure logic and empirical observation. You and I only believe it because we are humans, and cannot see how such a thing would even slightly satisfy our human goals. If another being has “tile the universe with paper-clips” as its goal, that fact alone is no reflection on its intelligence at all. If it then turned out to be extremely good at tiling the universe with paper-clips (a task which will likely include outsmarting humanity) then I would be happy to call it super-intelligent.
Secondly: “Our local reason of the universe being strongly optimal according to that process’s utility function.”
That was a typo, I meant to write ‘our local region of the universe’. Hope that’s clearer now.