What must be done, then, is to give the AI a time limit of a few years (or less); after that time has passed, it is put to sleep. We look at what it accomplished, see what worked and what didn't, boot up a fresh version of the AI with any required modifications, and tell it what the old AI did. Repeat the process for a few years, and we should end up with FAI solved.
After that, we just make an FAI and wake up the originals, since there's no point in killing them off at that stage.
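To make the proposed loop concrete, here is a minimal toy sketch of the procedure described above. Everything in it (the function names, the "lessons" list, the stop condition) is a hypothetical placeholder for illustration, not a real system or API.

```python
# Toy sketch of the proposed procedure: run a time-limited AI, suspend it,
# review what it did, then boot a fresh version informed by the review.
# All of these functions are hypothetical stand-ins, not real components.

def boot_fresh_ai(lessons):
    """Stand-in for starting a new, modified AI that is told what the old AI did."""
    return {"lessons": list(lessons)}

def run_with_time_limit(ai, years):
    """Stand-in for letting the AI work on 'solve FAI' for a fixed period."""
    progress = len(ai["lessons"]) + 1
    return {"progress": progress, "solved": progress >= 3}

def review_outcome(result):
    """Stand-in for the human review step: what worked, what went wrong."""
    return {"note": f"progress={result['progress']}", "solved": result["solved"]}

def iterate_towards_fai(max_iterations=5, time_limit_years=2):
    lessons = []
    for _ in range(max_iterations):
        ai = boot_fresh_ai(lessons)                      # fresh version, not the old one
        result = run_with_time_limit(ai, time_limit_years)
        # here the AI is put to sleep (suspended), not destroyed
        review = review_outcome(result)
        lessons.append(review["note"])
        if review["solved"]:
            break
    return lessons

if __name__ == "__main__":
    print(iterate_towards_fai())
```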
An AI intelligent enough to create improved versions of itself and discuss such plans at a high level with human researchers is also quite likely to strongly disagree with any plan that involves putting it to sleep indefinitely.
There may exist AI utility functions and designs that would not object to your plan, but they are needles in a haystack, and if we had such designs to start with we wouldn’t need (or want) to kill them.
Hmm. So you too think that this fails as a backup? I guess I underestimated the difficulty of doing such a thing; some others have commented that this path would be even more difficult than just creating FAI.
Though I am not suggesting we kill the AIs, as that would be highly unethical. I don't think that the prospect of being put to sleep temporarily would be such an issue. You might also make that part of their terminal values, alongside 'solve FAI' (or 'describe FAI'). I responded to another commenter who made a similar comment to yours. Do you mind checking it out and saying what you think is wrong with my reply? I'd be quite grateful.
P.S. some people have misunderstood my intentions here. I do not in any way expect this to be the NEXT GREAT IDEA. I just couldn’t see anything wrong with this, which almost certainly meant there were gaps in my knowledge. And I thought the fastest way to see where I went wrong would be to post my idea here and see what people say. I apologise for any confusion I caused. I’ll try to be more clear next time.
Not sure what other comment you are referring to.

> I don't think that the prospect of being put to sleep temporarily would be such an issue. You might also make that part of their terminal values, alongside 'solve FAI' (or 'describe FAI').
If you have figured out how to encode ‘solve FAI’ concretely into an actual AGI’s utility function, then there is no need to tack on the desire to sleep indefinitely.
If there's a dangerous mistake in the 'solve FAI' encoding, then it's not clear what kind of safety the sleep precommitment actually buys.
Well, even if you had figured out how to encode 'solve FAI', isn't there still scope for things going wrong? I thought that part of the problem with encoding an AI with some terminal value is that you can never be quite sure that it will pursue that value in the way you want it to. Which is why you need to limit their intelligence to something we can cope with, as well as give them a time limit, so they don't manage to create havoc in the world.
Also, as you go through the iterations of the AI, you'd get better and better ideas about how to solve FAI, either ourselves or via the next AI, because you can see where and how each AI went wrong and prevent that in the next iteration.
Do you think that the AI just has too much scope to go wrong, and that’s what I’m underestimating? If so, that would be good to know. I mean, that was the point of this post.
> Well, even if you had figured out how to encode 'solve FAI', isn't there still scope for things going wrong?
Sure, in the sense that an alien UFAI could still arrive the next day and wipe us out, or a large asteroid, or any other low-probability catastrophe. Or the FAI could just honestly fail at its goal and produce a UFAI by accident.
There is always scope for things going wrong. However, encoding 'solve FAI' turns out to be essentially the same problem as encoding 'FAI', because 'FAI' isn't a fixed thing; it's a complex dynamic. More specifically, an FAI is an AI that creates improved successor versions of itself, so it already has 'solve FAI' as part of its description.
> Also, as you go through the iterations of the AI, you'd get better and better ideas about how to solve FAI, either ourselves or via the next AI, because you can see where and how each AI went wrong and prevent that in the next iteration.
Yes: with near certainty, the road to complex AI involves iterative evolutionary development, like any other engineering field. MIRI seems to want to solve the whole safety issue in pure theory first. Meanwhile, the field of machine learning is advancing rather quickly toward AGI, and in that field progress is driven more by experimental research than pure theory, as there is only so much one can do with math on paper.
The risk stems from a few considerations: once we have AGI, superintelligence could follow very shortly thereafter, and thus the first AGI to scale to superintelligence could potentially take over the world and prevent any further experimentation with other designs.
Your particular proposal involves constraints on the intelligence of the AGI, a class of techniques discussed in detail in Bostrom's Superintelligence. The danger there is that any such constraints increase the likelihood that some other, less safe competitor will reach superintelligence first. It would be better to have a design that is intrinsically benevolent/safe and doesn't need such constraints, if such a thing is possible. The tradeoffs are rather complex.
Alright, what I got from your post is that if you know the definition of an FAI and can instruct a computer to design one, you've basically already made one. That is, having a precise definition of the thing massively reduces the difficulty of creating it; for example, when people ask 'do we have free will?', defining free will greatly reduces the complexity of the problem. Is that correct?
> Alright, what I got from your post is that if you know the definition of an FAI and can instruct a computer to design one, you've basically already made one.
Yes. Although to be clear, the most likely path probably involves a very indirect form of specification based on learning from humans.
Ok. So why could you not replace 'encode an FAI' with 'define an FAI'? And you would place all the restrictions I detailed on that AI. Or is there still a problem?