It would be helpful if you could explain how you think the current implementation would fail, since I don’t find the failure obvious. The “safety device” in question is mainly the utility function, which should make it so that the AI doesn’t want to optimize everything, and in my view it is sufficient for this limited scenario (the big challenge, I think, is scaling it up and making it remotely implementable).
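(To make concrete the kind of thing I mean, here is a toy sketch of a capped utility function; the function name, the notion of “units produced,” and the target value are made up purely for illustration and are not the actual implementation.)

    # Toy illustration only: a hypothetical bounded utility function for an
    # agent with a single production target. The cap means that output beyond
    # the target yields no additional utility, so the agent has no incentive
    # to keep optimizing in pursuit of an ever-higher score.
    def bounded_utility(units_produced: int, target: int = 100) -> float:
        """Utility rises with output up to `target` and is flat beyond it."""
        return float(min(units_produced, target))

    # Past the cap, extra output is worthless to the agent:
    assert bounded_utility(100) == bounded_utility(10_000)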
It’s an incredibly complicated software and hardware system, maybe the most complex thing ever invented. There are bugs. There are unknown unknowns. There are people trying to steal the AI technology. There are people who already have AI technology and are pushing ahead without hindering themselves with these restrictions. None of this happens in a bubble; there’s context. This setup has to come from somewhere. The AI has to come from somewhere. It has to be implemented in an actual place. Somebody has to fund all of it, and do so without using that power to compromise the system. Obviously there are loads of surrounding social and technical problems that cannot be swept under the rug when the stakes are this high, even assuming the system itself is built absolutely perfectly. I don’t really want to go into specifics beyond that. This is a great attempt at one piece of the puzzle, but it’s only one piece. We will need a lot more.
At any rate, my point is that this, like anything else, falls under Hank’s law of AGI safety, which states something like: “Your AGI safety measures will not be good enough, even when taking into account Hank’s law of AGI safety...”
I agree.