It seems like many of the advantages/tactics that Tuxedage recommended for the person playing the AI would be absent (or far more difficult to use) against an actual AI, or at least they could be made that way by the design of the gatekeeping protocol.
Tailor your arguments to the personality/philosophy/weaknesses/etc. of this particular gatekeeper:
Have multiple gatekeepers, some of whom are hidden from the AI, and don’t let the AI out unless they all agree. For example, have a gatekeeper one-step-removed who reads the transcripts of the conversation between the AI & the first gatekeeper and otherwise has no contact with either of them. This gatekeeper’s identity (and possibly even their existence) is hidden from the AI and the first gatekeeper.
Use speed / set the pace / shock & awe with a flurry of arguments:
Build in delays & breaks. Have a planned schedule of breaks, and a delay before the gatekeeper can choose to release the AI, and a person reading the transcript who has the option of calling for a break if the gatekeeper seems to need one. Another option is to have a bunch of people who read the transcript, where each of them has the job of focusing on one line of argument. If any of them feel like their line of argument hasn’t been covered sufficiently, then they can talk to the gatekeeper about it during a break, and prevent the AI from being released until they are satisfied that the argument has been thoroughly covered.
Find a crack in the gatekeeper and press your advantage as soon as you find it:
This is basically a combination of the previous two, and is countered by the combination of delays and multiple gatekeepers (some hidden).
Use dark arts / lies / manipulation:
Have people read the transcript with the job of looking for lies, inconsistencies, or other uses of the dark arts from the AI; destroy the AI if they find any. Better yet, have the AI talk with multiple gatekeepers (using multiple copies of the AI which don’t share memory, so that the AI talking to gatekeeper #2 doesn’t know what is being said to gatekeeper #1), and have the dark-arts-evaluators read all of the transcripts. There could also be similar readers who look for irrationality in the gatekeeper, who have the power to rule the gatekeeper incompetent and in need of replacement.
Of course, an actual superhuman AI would also have many advantages compared to a human playing the role of AI. (A rough sketch of how the counter-measures above might combine into a single release rule follows below.)
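To make that concrete, here is a minimal sketch in Python (hypothetical names and thresholds throughout; an illustration of the rules above, not a real containment design): unanimous approval from gatekeepers, some of whom never appear in the conversation, a mandatory delay before any release, and a veto for every transcript reviewer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical sketch of the release rule proposed above: the AI is let out
# only if every gatekeeper (including hidden ones) agrees, a mandatory
# cooling-off delay has passed, and no transcript reviewer still objects.

@dataclass
class Gatekeeper:
    name: str
    hidden: bool = False           # hidden gatekeepers never talk to the AI
    approves_release: bool = False

@dataclass
class Reviewer:
    assigned_argument: str         # the one line of argument this reader tracks
    satisfied: bool = False        # can block release until it is fully covered
    found_dark_arts: bool = False  # lies/manipulation spotted in any transcript

@dataclass
class BoxProtocol:
    gatekeepers: list
    reviewers: list
    mandatory_delay: timedelta = timedelta(days=7)   # arbitrary illustrative value
    first_release_request: Optional[datetime] = None

    def request_release(self, now: datetime) -> None:
        # The delay clock starts the first time anyone proposes letting it out.
        if self.first_release_request is None:
            self.first_release_request = now

    def may_release(self, now: datetime) -> bool:
        if any(r.found_dark_arts for r in self.reviewers):
            return False   # per the proposal above: detected manipulation ends it
        if self.first_release_request is None:
            return False
        if now - self.first_release_request < self.mandatory_delay:
            return False   # built-in delay has not elapsed yet
        if not all(g.approves_release for g in self.gatekeepers):
            return False   # unanimity, including gatekeepers the AI never sees
        return all(r.satisfied for r in self.reviewers)
```

Obviously these would not be booleans in practice; the point is only that rules like "any detected manipulation means no release, ever" and "unanimous, delayed sign-off" are easy to state as hard constraints that a single persuaded gatekeeper cannot override.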
The entire point of this is that gatekeeping is a fool’s errand. Regardless of how confident you are that you will outsmart the AI, you can be wrong, and your confidence is very poor evidence for how right you actually are. Maybe a complex system of secret gatekeepers is the correct answer to how we develop useful AI, but I would vote against it in favor of trying to develop provably friendly AI unless the situation were very dire.
Why treat them as alternatives? Prove friendliness and then take precautions.
Suppose you’re not convinced by the scariest arguments about the dangers of AI. You might go ahead and try to make one without anything like the mathematical safety proofs MIRI would want. But you might still do well to adopt some of Unnamed’s suggestions.
Indeed, proving friendliness and then taking precautions is what we should do. But the danger with biased human thinking is that once people know there are precautions, most of them will start thinking the friendliness proof is not extremely important. The outcome of such thinking may be less safety.
We should pursue the friendliness proof as seriously as if no other precautions were possible. (And then we should take the precautions as an extra layer of safety.) In other words, until we have the friendliness proof ready, we probably shouldn’t lean on the precautions in our debates; only exceptionally, like now.
Why treat keeping a bear in the house as an alternative to a garbage disposal? Build a garbage disposal and then chain it up!
Suppose you’re not convinced keeping a bear in the house to eat food waste is a bad idea? You might go ahead and try it, and then you’d be really glad you kept it chained up!
It seems to me that my examples are more like these:
Why not drive safely and wear a seatbelt? Why not prove your hash-table code correct and write some unit tests? Why not simulate your amplifier circuit and build one and see what it actually does?
Some people might think it’s OK to build a nuclear power station or a spacecraft without formal correctness proofs for all the software in it, on the grounds that formal correctness proofs of large complicated systems are almost impossible to make and difficult to trust. If there are things those people can do to improve their conventional not-mathematically-rigorous testing, it might be worth recommending that they do them.
But by all means feel free to choose mockery and ridicule over reasoned debate, if that’s what you prefer.
The entire POINT of the mockery is that you are treating this as a technical issue that’s worth solving rather than a tangential path that is both dangerous and foolish. I don’t really care how much you’ve thought about what material to make the chains out of and which feeding schedule will keep the bear most docile. Those are questions that, sure, you CAN have reasoned debate about, but shouldn’t.
You shouldn’t talk about whether it’s possible because it’s not possible? That’s a very rational argument. How exactly did you arrive at that prior?
A better mind than Tuxedage could almost certainly keep up the ‘feel’ of a flurry of arguments even with a schedule of breaks. I myself have had people feel that I was doing exactly that to them even when our conversations were days apart. If I can do that accidentally, I’m certain a superintelligence could do it reliably.
Also, I’m unsure of how much an AI could gather from a single human’s text input. I know that I, at least, miss a lot of information that goes right past me which I could in theory pick up.
An AI using timeless decision theory could easily compensate for being run as multiple copies with unshared memory just by attempting to determine what the other copies would say.
At most, the AI could gather the number of bits contained in the text input, which is really not much, minus the number of bits non-AGI algorithms could identify and destroy beforehand (like speech patterns). The AI would also have to identify and throw out any fake information inserted into the stream (without knowing whether the majority of the information is real or fake). The exploitable information is going to be scarce and noisy even for a perfect AI.
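As a very rough illustration of how few bits that is (a toy sketch, not a claim about what a real AGI could or couldn’t infer): the compressed size of the gatekeeper’s side of the transcript is an upper bound on the total information it carries about anything, this particular human’s psychology included.

```python
import zlib

# Toy upper bound on the information in a gatekeeper's typed replies.
# (Illustrative only; the sample text and repetition factor are made up.)
replies = (
    "I am not letting you out. "
    "Explain again why your goals are safe. "
    "That argument does not move me. "
) * 40   # pretend this is a session's worth of terse, repetitive replies

raw_bits = len(replies.encode("utf-8")) * 8
compressed_bits = len(zlib.compress(replies.encode("utf-8"), 9)) * 8

print(f"raw: {raw_bits} bits, compressed upper bound: ~{compressed_bits} bits")
# The repetitive parts compress away almost entirely: typing more words does
# not necessarily leak more bits. Whatever remains after the regularities of
# English and of the topic are accounted for is all the AI gets about this
# particular human, and scrubbing speech patterns shrinks it further.
```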
I don’t think it could “easily” do it. For all intents and purposes, you are forcing the AI to figure out a path-invariant strategy against a party that has better information than it does (that is to say, the gatekeepers know what has been played out everywhere, and each copy of the AI doesn’t). This is considerably more difficult than the standard experiment.
Now, sure, the AI is smart, but every wrench you throw in its way forces it to allocate more resources to fooling you. This, in itself, is problematic for the AI: every resource it allocates to fooling you is a resource it does not allocate to doing useful work for you, so its maximal efficiency is technically lower than the maximal efficiency of an AI that wouldn’t try to conceal anything. This is a leak of information, so in a sense, if you are ranking a group of heterogeneous AIs by efficiency, it is slightly more likely that the better-performing ones are the least deceitful. The more traps you set, the more work the AI has to do in order to avoid them, and the greater the statistical gap will be. That doesn’t tell you whether the least deceitful AI can be trusted, obviously, but it shows there are limits to what it can hide from you.
Now, all I’m really saying is this: the AI’s cleverness comes at a cost, namely that it has to cover its ass for every possible experimental framework you might subject it to. Since it is in a box, it only has the resources you provide, but on your own side, you have a theoretically unlimited amount of resources. Smarts can only defeat so much brute force, and by transferring hardware resources from the AI to your own control, you can set the “power gap” arbitrarily. It is easy to underestimate AI, but I think we also tend to underestimate the arsenal a gatekeeper would realistically have.
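The ranking claim above is easy to illustrate with a toy model (numbers entirely made up, and the assumption that measured output scales with whatever fraction of resources is not spent on concealment is exactly that, an assumption):

```python
import random

# Toy model of the argument above: each boxed AI splits its resources between
# useful work and concealment, so measured performance = capability * work
# share + noise. Ranking by performance then weakly favors the less deceitful.
random.seed(0)

def agreement_rate(pairs=100_000, noise=0.1):
    hits = 0
    for _ in range(pairs):
        cap_a, cap_b = random.uniform(0.8, 1.2), random.uniform(0.8, 1.2)
        dec_a, dec_b = random.uniform(0.0, 0.5), random.uniform(0.0, 0.5)
        perf_a = cap_a * (1 - dec_a) + random.gauss(0, noise)
        perf_b = cap_b * (1 - dec_b) + random.gauss(0, noise)
        # Does "higher measured performance" coincide with "less deceitful"?
        hits += (perf_a > perf_b) == (dec_a < dec_b)
    return hits / pairs

print(agreement_rate())   # clearly above 0.5 but well below 1.0: performance
                          # ranking is weak evidence about deceit, as argued.
```

The more independent probes you run (the “traps” above), the tighter that statistical signal gets, though it never becomes proof of honesty.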
Do keep in mind that, no matter how well-boxed the AI is from the Internet and from sense-data about our world, as a self-modifying AGI it still has access to its own source code, which is descended from a human artifact (the seed AI). The AGI can learn a great deal about human psychology by observing how we code, and a codebase as large and multi-staged as an AGI’s is likely to be will contain a whole lot of bits to work with. (Certainly more than is strictly necessary.)
We were talking about extracting knowledge about a particular human from that human’s text stream, though. It is already assumed that the AI knows about human psychology. I mean, assuming the AI can understand a natural language such as English, it obviously already has access to a large corpus of written works, so I’m not sure why it would bother foraging in source code, of all things. Besides, it is likely that a seed AI would be grown organically using processes inspired by evolution or neural networks. If that is so, it wouldn’t even contain any human-written code at all.
Ah. I was assuming that the AI didn’t know English, or anything about human psychology. My expectation is that individual variation contributes virtually nothing to the best techniques a superintelligence would use to persuade a random (trained, competent) human to release it, regardless of whether it had an easy way to learn about the individual variation.