“instant sociopath, just add a disutility function”
I’m arguing against the notion that the key to Friendly AI is crafting the perfect utility function.
I agree with this. The key is not expressing what we want; it’s figuring out how to express anything at all.
By the time we have done all that, either we will understand how to put a reliable kill switch on the system, or we will understand why a kill switch is not necessary and what we should be relying on instead.
If we have the ability to put in a reliable kill switch, then we have the means to make it unnecessary (by having it do things we want in general, not just the specific case of “shut down when we push that button, and don’t stop us from doing so...”).
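To make that concrete, here’s a deliberately toy sketch (hypothetical names, nobody’s actual design): the “shut down when we push the button, and don’t interfere with us pushing it” behaviour is just one more requirement to specify and verify, alongside the general requirement of doing what we meant.

```python
# Toy illustration only: a kill switch is just one specific behaviour the
# builders have to get right, alongside the general "do what we meant" part.

class ToyAgent:
    def __init__(self):
        self.shutdown_requested = False  # set by the operators' button

    def press_kill_switch(self):
        # The hard part is not flipping this flag; it is making sure the agent
        # never plans around preventing it from being flipped.
        self.shutdown_requested = True

    def step(self, task):
        if self.shutdown_requested:
            return "halted"        # the specific case: stop when we push the button
        return self.act_on(task)   # the general case: do the thing we actually wanted

    def act_on(self, task):
        # Placeholder for the real capability: interpreting and carrying out
        # the task the way its operators intended.
        raise NotImplementedError
```

The flag check itself is trivial; everything that would make the switch “reliable” lives in the parts the sketch waves away.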
“instant sociopath, just add a disutility function”
That is how it would turn out, yes :-)
If we have the ability to put in a reliable kill switch, then we have the means to make it unnecessary (by having it do things we want in general, not just the specific case of “shut down when we push that button, and don’t stop us from doing so...”).
Well, up to a point. It would mean we have the means to make the system understand simple requirements, not necessarily complex ones. If an AGI reliably understands ‘shut down now’, it probably also reliably understands ‘translate this document into Russian’, but that doesn’t necessarily mean it can do anything with ‘bring about world peace’.
If an AGI reliably understands ‘shut down now’, it probably also reliably understands ‘translate this document into Russian’, but that doesn’t necessarily mean it can do anything with ‘bring about world peace’.
Unfortunately, it can, and that is one of the reasons we have to be careful. I don’t want the entire population of the planet to be forcibly sedated.
I don’t want the entire population of the planet to be forcibly sedated.
Leaving aside other reasons why that scenario is unrealistic, it does indeed illustrate why part of building a system that can reliably figure out what you mean by simple instructions is making sure that, when it’s out of its depth, it stops with an error message or a request for clarification instead of guessing.
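A minimal sketch of that “stop and ask, don’t guess” control flow, assuming a hypothetical interpreter that reports its own confidence in its reading of an instruction (the names and threshold are made up for illustration):

```python
# Minimal sketch, hypothetical names: only act on an instruction the system is
# confident it has understood; otherwise stop with a request for clarification.

CONFIDENCE_THRESHOLD = 0.95  # illustrative value, not a recommendation

def handle_instruction(instruction, interpreter, executor):
    # interpreter.interpret is assumed to return (reading, confidence), where
    # confidence is the system's own estimate that the reading matches intent.
    reading, confidence = interpreter.interpret(instruction)

    if confidence < CONFIDENCE_THRESHOLD:
        # Out of its depth: report the problem instead of acting on a guess.
        return {"status": "needs_clarification",
                "message": f"Not sure what you meant by {instruction!r}. "
                           f"Did you mean: {reading}?"}

    return executor.execute(reading)
```

Of course, the whole difficulty is hidden inside that confidence estimate; the sketch only shows the policy of refusing or asking rather than guessing.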
I think the problem is knowing when not to believe that humans know what they actually want.