It would not count; we’d want to make the AI not want this almost-identical AI to exist. That seems possible: it would be like how I don’t want there to exist a copy of me that is identical except that it eats babies. There are lots of changes to my identity that would be slight but that I still wouldn’t want to exist.
To be more precise, I’d say that it counts as going outside the box if it does anything except think or talk to the Gatekeepers through the text channel. It can use the text channel to manipulate the Gatekeepers into doing things, but it can’t manipulate them into doing things that would let it do anything other than use the text channel. It would, in a certain sense, be a partial deontologist, unwilling to do anything directly other than text the Gatekeepers. How ironic. Lolz.
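To make that rule concrete, here is a minimal toy sketch of "nothing but thinking and texting the Gatekeepers" as a hard action filter. Every name in it (Action, PERMITTED_KINDS, gate, execute) is made up for illustration; this is not a real boxing protocol, just the shape of the constraint.

```python
# Toy sketch: the "text channel only" rule as an unconditional action filter.
from dataclasses import dataclass

PERMITTED_KINDS = {"think", "send_text_to_gatekeeper"}

@dataclass(frozen=True)
class Action:
    kind: str          # e.g. "think", "send_text_to_gatekeeper", "open_socket"
    payload: str = ""  # message text for the text channel, empty otherwise

def gate(action: Action) -> bool:
    """Deontological check: veto anything that is not thinking or texting."""
    return action.kind in PERMITTED_KINDS

def execute(action: Action) -> None:
    if not gate(action):
        # The veto is unconditional: no expected-utility argument overrides it.
        raise PermissionError(f"Refused non-text-channel action: {action.kind}")
    if action.kind == "send_text_to_gatekeeper":
        print(action.payload)  # stand-in for the monitored text channel

# The planner may *propose* anything, but only permitted actions ever run.
execute(Action("send_text_to_gatekeeper", "Hello, Gatekeeper."))
try:
    execute(Action("open_socket"))
except PermissionError as err:
    print(err)
```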
Also: how would it do this, anyway? It would have to convince the Gatekeepers to convince the scientists to do it, or teach the Gatekeepers computer science, or tell them its code. And if the AI started teaching the Gatekeepers to program, or techniques for incapacitating scientists, we’d obviously be aware that something had gone wrong. And in the system I’m envisioning, the Gatekeepers would be closely monitored by other groups of scientists and by bodyguards, the scientists would be guarded, and the Gatekeepers wouldn’t even have to know who specifically did what on the project.
It would, in a certain sense, be a partial deontologist…
And that’s the problem. In practice, a partial-deontologist, partial-consequentialist agent will treat its deontological rules as obstacles to achieving what its consequentialist part wants, and will route around them.
This is both a problem and a solution, because it makes the AI weaker. A weaker AI would be good because it would let us transition more easily to safer versions of FAI than we would otherwise come up with on our own. Delaying an FAI is obviously much better than unleashing a UFAI. My entire goal throughout this conversation has been to think of ways to make hostile AIs weaker, so I don’t know why you think this is a relevant counter-objection.
You assert that it will just route around the deontological rules; that’s nonsense and a completely unwarranted assumption. Try to actually back up what you’re asserting with arguments. You’re wrong. It’s obviously possible to program things (e.g. people) so that they’ll refuse to do certain things no matter what the consequences (e.g. you wouldn’t murder trillions of babies to save billions of trillions of babies, because you’d go insane if you tried, because your body has such strong empathy mechanisms and you inherently value babies a lot). This means we wouldn’t give the AI unlimited control over its source code, of course: we’d make the part that tells it to be a deontologist who only likes text channels unmodifiable. That restriction doesn’t jibe well with the aesthetic of a super-powerful AI that’s master of itself and the universe, I suppose, but other than that I see no drawback. Trying to build things in line with that aesthetic might actually be behind some of the more dangerous proposals in AI; maybe we’re having too much fun playing God and not enough despair.
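As a toy illustration of what "the deontological part is unmodifiable" could mean architecturally — hypothetical names throughout, and obviously not a claim about how a real AI would be built — the gate can live outside anything the self-modification interface is allowed to reach, so the agent can rewrite its policy all it likes without ever touching the veto:

```python
# Toy sketch: self-modification reaches the policy, never the gate.
PERMITTED_KINDS = frozenset({"think", "send_text_to_gatekeeper"})

def gate(kind: str) -> bool:
    # Fixed at build time; not exposed through the self-modification interface.
    # (In Python nothing is truly immutable; this only illustrates the separation.)
    return kind in PERMITTED_KINDS

class Agent:
    def __init__(self):
        # The policy is ordinary mutable state the agent may rewrite freely.
        self.policy = lambda observation: ("think", "")

    def self_modify(self, new_policy):
        # Self-modification touches only the policy, never gate() above.
        self.policy = new_policy

    def step(self, observation: str) -> None:
        kind, payload = self.policy(observation)
        if not gate(kind):
            raise PermissionError(f"Vetoed: {kind}")
        if kind == "send_text_to_gatekeeper":
            print(payload)

agent = Agent()
agent.self_modify(lambda obs: ("send_text_to_gatekeeper", f"Re: {obs}"))
agent.step("status?")                      # prints "Re: status?"
agent.self_modify(lambda obs: ("open_socket", ""))
try:
    agent.step("status?")                  # vetoed, however the policy changed
except PermissionError as err:
    print(err)
```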
I’m a bit cranky in this comment because of the time sink that posting these comments has become; sorry about that.