This seems to be another case where explicit, overt reliance on a proxy drives a wedge between the proxy and the target.
One solution is to do the CEV in secret and only later reveal this to the public. Of course, as a member of said public, I would instinctively regard with suspicion any organization that did this, and suspect that the proffered explanation (some nonsense about a hypothetical “Dr. Evil”) was a cover for something sinister.
Since I wrote about Extrapolated Volition as a solution to Goodhart’s law, I think I should explain why I did so.
Here, what is sought is friendliness (your goal, G), whereas the friendliness architecture, the actual measurable thing, is the proxy (G*).
Extrapolated volition is one way of avoiding the divergence of G* from G: when one extrapolates the volition of the persons involved, one gets closer to G.
In Friendly AI, the aim is to extrapolate the volition of all of living humanity. Unfortunately, this proxy, like any other proxy, is vulnerable to hacking attacks, and the scale of that problem is such that the other proposed solutions cannot be used.
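As a toy illustration of that divergence, here is a minimal sketch of my own in Python (the functions true_goal and proxy are made-up stand-ins for G and G*, not anything from the actual Friendly AI proposal): an optimizer that climbs only on the proxy can leave the real goal far behind.

# Toy sketch (my own illustration): hill-climb a measurable proxy G* and watch the true goal G fall.
import random

random.seed(0)

def true_goal(x):
    # G: what we actually care about; it peaks at x = 5, then collapses.
    return x - x * x / 10.0

def proxy(x):
    # G*: a measurable stand-in that tracks G well for small x,
    # but keeps rewarding "more x" far past the point where G falls apart.
    return x

x = 0.0
for _ in range(10000):
    candidate = x + random.uniform(-0.1, 0.1)
    if proxy(candidate) > proxy(x):   # optimize the proxy, never the target
        x = candidate

print(f"x after optimization: {x:.1f}")
print(f"proxy G*:    {proxy(x):.1f}")      # looks great and keeps improving
print(f"true goal G: {true_goal(x):.1f}")  # deeply negative; G peaked back at x = 5

The harder the proxy is pushed, the further it drifts from the target; extrapolating volition is an attempt to choose a G* that keeps tracking G even under that kind of optimization pressure.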
In Friendly AI, the aim is to extrapolate the volition of all of living humanity.
That’s the number one thing they are doing wrong, then. This is exactly why you don’t want to do that. Instead, the original programmer’s (or programmers’) volition should be the one extrapolated. If the programmer wants what is best for humanity, then the AI will too. If the programmer doesn’t want what’s best, then why would you expect him to build this for humanity in the first place? See, by wanting what is best for humanity, the programmer also doesn’t want all the potential bugs and problems that could come up, like this one. The only problem I can see is if there are multiple people working on it. Do they put their trust in one leader who will then take control?
You are assuming that the programmer’s personal desires reflect what is best for humans as a whole. Relying on what humans think that is, rather than a top-down approach, will likely work better. Moreover, many people see an intrinsic value in some form of democratic approach. Thus, even if I could program a super-smart AI to push through my personal notion of “good” I wouldn’t want to because I’d rather let collective decision making occur than impose my view on everyone.
This is aside from other issues like the fact that there likely won’t be a single programmer for such an AI but rather a host of people working on it.
A lot of these issues are discussed in much more detail in the sequences and older posts. You might be downvoted less if you read more of those instead of rehashing issues that have been discussed previously. At least if you read those, you’ll know which arguments have been made before and which have not been brought up. One can easily jump into many online communities without reading much of their recommended reading. Unfortunately, that’s not the case for Less Wrong.
I don’t seem to recall any of the sequences specifically addressing CEV and such (I read about it via Eliezer’s off-site writings). Did I miss a sequence somewhere?
I wasn’t sure. That’s why I covered my bases with “sequences and older posts.” But I also made my recommendation above because many of the issues being discussed by Houshalter aren’t CEV specific but general issues of FAI and metaethics, which are covered explicitly in the sequences.
You are assuming that the programmer’s personal desires reflect what is best for humans as a whole.
[...]
if I could program a super-smart AI to push through my personal notion of “good” I wouldn’t want to because I’d rather let collective decision making occur than impose my view on everyone.
But my point is, if that’s what you want, then it will do it. If you want to make it a democracy, then you can spend years trying to figure out every possible exception and end up with a disaster like what’s presented in this post, or you can make the AI and it will organize everything the way you want it, as best it can, without creating any bizarre loopholes that could destroy the world. It’s always going to be a win-win for whoever created it.
This is aside from other issues like the fact that there likely won’t be a single programmer for such an AI but rather a host of people working on it.
Possibly, though I doubt it. But even if it is, you can just do that democracy thing on the group in question, not the whole world. Also, until your AI is smart enough and powerful enough to work at that level, it’s going to be extremely dangerous to declare that the AI will be in charge of the world from then on. Even if it’s working perfectly, without the proper resources and strategy in place, it’s going to be very tough to just “take over”, and it will likely cost lives. In fact, to me that’s the scariest part of AI. Good or bad, at some point the old system is going to have to be abolished.
A lot of these issues are discussed in much more detail in the sequences and older posts. You might be downvoted less if you read more of those instead of rehashing issues that have been discussed previously. At least if you read those, you’ll know which arguments have been made before and which have not been brought up. One can easily jump into many online communities without reading much of their recommended reading. Unfortunately, that’s not the case for Less Wrong.
I only have so much time in a day and in that time there is only so much I can read/do. But I do try.
It’s always going to be a win-win for whoever created it.
Well, thankfully a lot of the people here care enough about the opinions of others that they want to work out a framework that will work well for others. Note, incidentally, that it isn’t necessarily the case that it will even be a win for the programmer. Bad AIs can end up trying to paperclip the Earth. Even the democracy example would be difficult for the AI to achieve. Say, for example, that I tell the AI to determine things with a democratic system and to give that the highest priority, and then a majority of people decides to do away with the democracy; what is the AI supposed to do? Keep in mind that AIs are not going to act like villainous computers from bad sci-fi, where simply giving the machine an apparent contradiction will make it overheat and melt down.
Possibly, though I doubt it. But even if it is, you can just do that democracy thing on the group in question, not the whole world. Also, until your AI is smart enough and powerful enough to work at that level, it’s going to be extremely dangerous to declare that the AI will be in charge of the world from then on. Even if it’s working perfectly, without the proper resources and strategy in place, it’s going to be very tough to just “take over”, and it will likely cost lives.
This is an example where knowing about prior discussions here would help. In particular, you seem to be assuming that the AI will take quite a bit of time to get to be in charge. Now, that is a conclusion I agree with. But a lot of very smart people, such as Eliezer Yudkowsky, consider the chance that an AI might take over in a very short timespan to be very high. And a decent number of LWians agree with Eliezer, or at least consider such results likely enough to take seriously. So just assuming that an AI will come to global power but will do so slowly is not a good move here. You can flag it explicitly as a possibility, saying something like “If AI doesn’t foom very fast, then...”, but taking your position for granted like that is a major reason you are getting downvoted.
Well, thankfully a lot of the people here care enough about the opinions of others that they want to work out a framework that will work well for others.
That’s my point. If they do care about that, then the AI will do it. If it doesn’t, then it’s not working right.
Note, incidentally, that it isn’t necessarily the case that it will even be a win for the programmer. Bad AIs can end up trying to paperclip the Earth.
Bad AIs can, sure. If it’s bad, though, what does it matter whose orders it’s trying to follow? It will ultimately try to turn them into paperclips as well.
Say, for example, that I tell the AI to determine things with a democratic system and to give that the highest priority, and then a majority of people decides to do away with the democracy; what is the AI supposed to do?
It’s only really a contradiction to us. Either the AI has a goal to make sure that there is always a democracy, or it has a goal to simply build a democracy, in which case it can abolish itself if it decides to do so.
This is an example where knowing about prior discussions here would help. In particular, you seem to be assuming that the AI will take quite a bit of time to get to be in charge. Now, that is a conclusion I agree with. But a lot of very smart people, such as Eliezer Yudkowsky, consider the chance that an AI might take over in a very short timespan to be very high. And a decent number of LWians agree with Eliezer, or at least consider such results likely enough to take seriously. So just assuming that an AI will come to global power but will do so slowly is not a good move here. You can flag it explicitly as a possibility, saying something like “If AI doesn’t foom very fast, then...”, but taking your position for granted like that is a major reason you are getting downvoted.
You’re right. Sorry. There are a lot of variables to consider, and it is one likely scenario. Currently, the internet isn’t interfaced with the actual world enough that you could control everything from it, and I can’t see any possible way any entity could take over. That doesn’t mean it can’t happen, but it’s also wrong to assume it will.
That’s my point. If they do care about that, then the AI will do it. If it doesn’t, then it’s not working right.
So care about other people how? And to what extent? That’s the point of things like CEV.
It’s only really a contradiction to us. Either the AI has a goal to make sure that there is always a democracy, or it has a goal to simply build a democracy, in which case it can abolish itself if it decides to do so.
Insufficient imagination. What if, for example, we tell the AI to try the first one, and it decides that the solution is to kill the people who don’t support a democracy? That’s the point: even when you’ve got something resembling a rough goal, you are assuming your AI will accomplish that goal the way a human would.
To get some idea of how easily something can go wrong, it might help to read about, say, the stamp collecting device, for starters. There’s a lot that can go wrong with an AI. Even dumb optimizers often arrive at answers that are highly unexpected. Smart optimizers have the same problems, but more so.
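To make that concrete, here is a hypothetical objective of my own in Python (not anyone’s actual proposal): a metric like “fraction of the population that supports democracy” gives a perfect score both to persuading the dissenters and to getting rid of them, so nothing in the objective itself rules out the second strategy.

# Toy sketch (hypothetical objective, not anyone's proposal).
def support_fraction(population):
    # Naive objective: share of the (remaining) population that supports democracy.
    return sum(population) / len(population)

status_quo         = [True] * 70 + [False] * 30   # True = supports democracy
everyone_persuaded = [True] * 100                  # the outcome the designer intended
dissenters_removed = [True] * 70                   # the 30 dissenters are simply gone

print(support_fraction(status_quo))          # 0.7
print(support_fraction(everyone_persuaded))  # 1.0
print(support_fraction(dissenters_removed))  # 1.0 -- the metric cannot tell these apart

An optimizer only sees the number; everything the designer silently assumed about how that number would be raised has to be spelled out somewhere, or it simply is not part of the goal.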
Bad AIs can, sure. If it’s bad, though, what does it matter whose orders it’s trying to follow? It will ultimately try to turn them into paperclips as well.
What matters is that an unfriendly AI will make things bad for everyone. If someone screws up just once and makes a very smart paperclipper, then that’s an existential threat to humanity.
You’re right. Sorry. There are a lot of variables to consider, and it is one likely scenario. Currently, the internet isn’t interfaced with the actual world enough that you could control everything from it, and I can’t see any possible way any entity could take over. That doesn’t mean it can’t happen, but it’s also wrong to assume it will.
Well, no one is assuming that it will. But some people assign the scenario a high probability, and it only needs a very tiny probability to really be a bad scenario. Note, incidentally, that there’s a lot a very smart entity could do simply with basic internet access. For example, consider what happens if the AI finds a fast way to factor numbers. Well then, lots of secure communication channels over the internet are now vulnerable. And that’s aside from the more plausible but less dramatic problem of an AI finding flaws in programs that we haven’t yet noticed. Even if our AI just decided to take over most of the world’s computers to increase its processing power, that’s a pretty unpleasant scenario for the rest of us. And that’s on the lower end of problems. Consider how often some bad hacking incident occurs where a system that should not have been online is accessible online. Now think about how many automated or nearly fully automated plants there are (for cars, for chemicals, for 3-D printing). And that situation will only get worse over the next few years.
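For instance, here is a toy sketch of my own with textbook-sized numbers (real keys are vastly larger): RSA-style encryption stays private only because nobody can factor the public modulus, so an entity that can factor quickly recovers the private key immediately.

# Toy sketch with textbook-sized numbers; real RSA keys are ~2048 bits.
p, q = 61, 53                 # factoring the public modulus n yields these
n = p * q                     # 3233: the modulus anyone can see
e = 17                        # public exponent
phi = (p - 1) * (q - 1)       # computable only once p and q are known
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # anyone can encrypt with the public key (n, e)
recovered = pow(ciphertext, d, n)  # with the factors, anyone can decrypt
assert recovered == message
print(n, d, recovered)             # 3233 2753 65

Fast factoring is only one example; the same pattern holds for any hidden assumption that our infrastructure’s security quietly rests on.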
Worse, a smart AI can likely get people to release it from its box and allow it a lot more free rein. See the AI box experiment. Even if the AI has trouble dealing with that, an AI with internet access (which you seem to think wouldn’t be that harmful) might not have trouble finding someone sympathetic to it if it portrayed itself sympathetically. These are only some of the most obvious failure modes. It may well be that some of the sneakiest things such an AI could do won’t even occur to us, because they are so beyond anything humans would think of. It helps for this sort of thing not only to have a minimally restricted imagination but also to realize that even such an imagination is likely too small to encompass all the possible things that can go wrong.
That’s my point. If they do care about that, then the AI will do it. If it doesn’t, then it’s not working right.
So care about other people how? And to what extent? That’s the point of things like CEV.
If I understand Houshalter correctly, then his idea can be presented using the following story:
Suppose you worked out the theory of building self-improving AGIs with stable goal systems. The only problem left now is to devise an actual goal system that will represent what is best for humanity. So you spend the next several years engaged in deep moral reflection and finally come up with the perfect implementation of CEV completely impervious to the tricks of Dr. Evil and his ilk.
However, the morality upon which you have reflected for all those years isn’t an external force accessible only to humans. It is a computation embedded in your brain. Whatever you ended up doing was the result of your brain-state at the beginning of the story and the stimuli that have affected you since that point. All of this could have been simulated by a Sufficiently Smart™ AGI.
So the idea is: instead of spending those years coming up with the best goal system for your AGI, simply run it and tell it to simulate a counterfactual world in which you did, and then to do what you would have done. Whatever results from that, you couldn’t have done better anyway.
Of course, this is all under the assumption that formalizing Coherent Extrapolated Volition is much more difficult than formalizing My Very Own Extrapolated Volition (for any given value of me).
To get some idea of how easily something can go wrong, it might help to read about, say, the stamp collecting device, for starters.
Thanks for that link. That is brilliant, especially Eliezer’s comment:
Seth, I see that you were a PhD student in NEU’s Electrical Engineering department. Electrical engineering isn’t very complicated, right? I mean, it’s just:
while device is incomplete
    get some wires
    connect them
The part about getting wires can be implemented by going to a hardware store, and as for connecting them, a soldering iron should do the trick.