Well, he demonstrated that it can sometimes get out. But my claim was that “getting out” isn’t the scary part; the scary part is “reshaping the world.” My brain can reshape the world just fine while remaining in my skull and communicating with my body only through slow chemical wires, so giving me the goal of “keep your brain in your skull” doesn’t materially reduce my ability or desire to reshape the world.
EY’s experiment is wholly irrelevant to this claim. Either you’re introducing irrelevant facts or morphing your position. I think you’re doing this without realizing it, and I think it’s probably due to motivated cognition (because morphing claims without noticing it correlates highly with motivated cognition in my experience). I really feel like we might have imposed a box-taboo on this site that is far too strong.
And so if you say “well, we’ll make the AI not want to reshape the world,” then the AI will be silent. If you say “we’ll make the AI not want to reshape the world without the consent of the gatekeepers,” then the gatekeepers might be tricked or make mistakes. If you say “we’ll make the AI not want to reshape the world without the informed consent of the gatekeepers / in ways which disagree with the values of the gatekeepers,” then you’re just saying we should build a Friendly AI, which I agree with!
You keep misunderstanding what I’m saying over and over and over again and it’s really frustrating and a big time sink. I’m going to need to end this conversation if it keeps happening because the utility of it is going down dramatically with each repetition.
I’m not proposing a system where the AI doesn’t interact with the outside world. I’m proposing a system where the AI is only ever willing to use a few appendages to affect the outside world, as opposed to potentially dozens. This dramatically reduces the degree of control the AI has, which is a good thing.
This is not FAI either; it is an additional constraint that we should use when putting early FAIs into action. I’m not saying that we merge the AI’s values with the values of the gatekeeper; I have no idea where you keep pulling that idea from.
It’s possible that I’m misunderstanding you, but I don’t know how that would be true specifically, because many of your objections just seem totally irrelevant to me and I can’t understand what you’re getting at. It seems more likely that you’re just not used to the idea of this version of boxing so you just regurgitate generic arguments against boxing, or something. You’re also coming up with more obscure arguments as we go farther into this conversation. I don’t really know what’s going on at your end, but I’m just annoyed at this point.
It’s easy to write a safe AI that can only answer one question. How do you get from point A to point B using the road system? Ask Google Maps, and besides some joke answers, you’ll get what you want. When people talk about AGI, though, they mean an AI that can write those safe AIs. If you ask it how to get from point A to point B using the road system, and it doesn’t know that Google Maps exists, it’ll invent a new Google Maps and then use it to answer that question. And so when we ask it to cure cancer, it’ll invent medicine-related AIs until it gets back a satisfactory answer. The trouble is that the combination of individually safe AIs is not a safe AI. If we have a driverless car that works fine with human-checked directions, and direction-generating software that works fine for human drivers, plugging them together might result in a car trying to swim across the Atlantic Ocean. (Google has disabled the swimming answers, so Google Maps no longer provides them.) The more general point is that software is very bad at doing sanity checks that humans don’t realize are hard, and if you write software that can do those sanity checks, it has to be a full AGI.
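The composition failure described above can be sketched in a toy example (all function names here are hypothetical, invented purely for illustration, not any real API): a route planner that is safe when a human checks its output, and a route executor that is safe when a human supplies its input, become unsafe when wired directly together, because the human was silently performing a sanity check neither module contains.

```python
# Toy sketch (hypothetical components): two modules that are each "safe"
# with a human in the loop become unsafe when composed directly, because
# the human was silently doing a sanity check that neither module does.

def plan_route(origin, destination):
    """Safe for human drivers: a human would laugh off the swim segment."""
    if origin == "New York" and destination == "London":
        return ["drive to the coast", "swim across the Atlantic", "drive to London"]
    return ["drive from {} to {}".format(origin, destination)]

def execute_route(route):
    """Safe with human-checked directions: it follows whatever it is given."""
    return ["executing: " + step for step in route]

# Composed directly, nothing rejects the absurd step:
actions = execute_route(plan_route("New York", "London"))

# The fix is an explicit version of the check humans perform without noticing:
def sane(step):
    return "swim" not in step

checked = [step for step in plan_route("New York", "London") if sane(step)]
```

The point of the sketch is that the `sane` check is trivial to write once someone realizes it is needed; the hard part, as the paragraph above argues, is realizing which checks a human was supplying for free.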
I don’t even understand how this clashes with my position. I understand that smashing simple AIs together is a dumb idea, but I never proposed that. I’m proposing using this special system for early FAIs, and asking them, very carefully, some very specific questions, along with other questions, so that we can be safe. I don’t want this AI to have any direct power, or even highly accurate input information.
Yes, obviously, this type of AI is a more limited AI. That’s the goal. Limiting our first attempt at FAI is a fantastic idea because existential risk is scary. We’ll get fewer benefits from the FAI, and it will take longer to get them. But it will be a good idea, because it seems really likely to me that we could mess up FAI without even knowing it.
A truism in software is that code is harder to read than to write, and often the interesting AIs are the nth-generation AIs, where you build an AI that builds an AI that builds an AI (and so on), and it turns out that one of those AIs decided all of the human-readability constraints were cruft (because the AI really does run faster and better without those restrictions).
Sure, it will be hard to read the AI’s mind. I see no reason why we shouldn’t even try, though.
You say that the AI will build an AI that will build an AI. But then you immediately jump to assuming that the final AI would leap beyond human comprehension. AIs are not gods, and we shouldn’t treat them like gods. If we could pause the AI and read its code, while slowing down its thought processes and devoting lots of resources to the project (as we should do, no matter what), then reading its mind seems doable. We could also use earlier AIs to help us interpret the thoughts of later AIs, if necessary.
Literally reading its mind would guarantee that it couldn’t trick us. Why would we not pursue this, even if it seems like it might be expensive?
The problem is that the AI could use its appendages to create and use tools that are more powerful than the appendages themselves.
I’ve already addressed this: the AI would still be entirely dependent on its appendages, and that’s a major advantage. So long as we watch the appendages and act to check any actions by them that seem suspicious, the AI would remain weak. The AI isn’t magic, and it’s not even beyond the scope of human cunning if we limit its input data. Keep in mind also that we’d watch the communications between the appendages and the AI, so we’d know immediately if it was trying to get them to make it any tools. The gatekeepers wouldn’t exist in a vacuum; they would be watched over and countered by us.
I think this conversation has run its course as well, though I intend to pursue a side issue in PMs.