I don’t think that the prospect of being put to sleep temporarily would be such an issue. You might also make that part of its terminal value, alongside ‘solve FAI’ or ‘describe FAI’.
If you have figured out how to encode ‘solve FAI’ concretely into an actual AGI’s utility function, then there is no need to tack on the desire to sleep indefinitely.
If there’s a dangerous mistake in the ‘solve FAI’ encoding, then it’s not clear what kind of safety the sleep precommitment actually buys.
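To make that concrete, here is a minimal toy sketch (in Python, with entirely hypothetical names such as solve_fai_score) of a utility function that tacks a ‘sleep after the deadline’ clause onto a possibly mis-specified main objective. It is only an illustration of the point above, not anyone’s actual design:

```python
# Toy illustration only, not a real AGI design; every name here is hypothetical.

def solve_fai_score(world_state):
    """Placeholder for the hard-to-specify 'solve FAI' objective.
    If this encodes the wrong thing, the agent optimizes the wrong thing."""
    return world_state.get("fai_progress", 0.0)

def utility(world_state, action, timestep, deadline=1000):
    """Main objective before the deadline, plus a 'sleep' term afterwards."""
    if timestep < deadline:
        # The sleep clause below never constrains behaviour in this branch,
        # so a dangerous mistake in solve_fai_score is still dangerous.
        return solve_fai_score(world_state)
    return 1.0 if action == "sleep" else 0.0
```

Whatever damage a mis-specified solve_fai_score motivates happens in the first branch, before the sleep term ever applies.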
Well, even if you had figured out how to encode ‘solve FAI’, isn’t there still scope for things going wrong? I thought that part of the problem with encoding an AI with some terminal value is that you can never be quite sure it will pursue that value in the way you want it to. That’s why you need to limit its intelligence to something we can cope with, and give it a time limit, so it can’t create havoc in the world.
Also, as you go through the iterations of the AI, you’d get better and better ideas about how to solve FAI, whether ourselves or via the next AI, because you can see where and how the AI has gone wrong and prevent that in the next iteration.
Do you think that the AI just has too much scope to go wrong, and that’s what I’m underestimating? If so, that would be good to know. I mean, that was the point of this post.
Well, even if you had figured out how to encode ‘solve FAI’, isn’t there still scope for things going wrong?
Sure, in the sense that an alien UFAI could still arrive the next day and wipe us out, or a large asteroid, or any other low-probability catastrophe. Or the FAI could just honestly fail at its goal, and produce a UFAI by accident.
There is always scope for things going wrong. However, encoding ‘solve FAI’ turns out to be essentially the same problem as encoding ‘FAI’, because ‘FAI’ isn’t a fixed thing; it’s a complex dynamic. More specifically, an FAI is an AI that creates improved successor versions of itself, so it already has ‘solve FAI’ as part of its description.
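One way to see the circularity: any workable description of an FAI already contains ‘build a friendly successor’ as a clause, so writing down the ‘solve FAI’ goal requires writing down ‘FAI’ itself. A rough sketch, purely illustrative and with hypothetical names:

```python
# Purely illustrative sketch of the self-referential structure; not a design.

class FAI:
    def __init__(self, friendliness_spec):
        # The hard part: friendliness_spec is the thing we don't know how to write.
        self.friendliness_spec = friendliness_spec

    def act(self):
        self.pursue(self.friendliness_spec)
        # 'Solve FAI' appears inside FAI's own job description:
        return self.build_successor()

    def build_successor(self):
        # The successor must satisfy the same specification, so encoding
        # 'solve FAI' and encoding 'FAI' are the same specification problem.
        return FAI(self.friendliness_spec)

    def pursue(self, spec):
        pass  # placeholder for ordinary goal pursuit
```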
Also, as you go through the iterations of the AI, you’d get better and better ideas about how to solve FAI, whether ourselves or via the next AI, because you can see where and how the AI has gone wrong and prevent that in the next iteration.
Yes: with near certainty, the road to complex AI involves iterative, evolutionary development, like any other engineering field. MIRI seems to want to solve the whole safety issue in pure theory first. Meanwhile, the field of machine learning is advancing rather quickly toward AGI, and in that field progress is driven more by experimental research than by pure theory, as there is only so much one can do with math on paper.
The risk stems from a few considerations: once we have AGI, superintelligence could follow very shortly thereafter, and thus the first AGI to scale to superintelligence could potentially take over the world and prevent any further experimentation with other designs.
Your particular proposal involves constraints on the intelligence of the AGI, a class of techniques discussed in detail in Bostrom’s Superintelligence. The danger there is that any such constraints increase the likelihood that some other, less safe competitor will reach superintelligence first. It would be better to have a design that is intrinsically benevolent/safe and doesn’t need such constraints, if such a thing is possible. The tradeoffs are rather complex.
Alright, what I got from your post is that if you know the definition of an FAI and can instruct a computer to design one, you’ve basically already made one. That is, having the precise definition of the thing massively reduces the difficulty of creating it; for example, when people ask ‘do we have free will?’, defining free will greatly reduces the complexity of the problem. Is that correct?
Alright, what I got from your post is that if you know the definition of an FAI and can instruct a computer to design one, you’ve basically already made one.
Yes. Although to be clear, the most likely path probably involves a very indirect form of specification based on learning from humans.
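As a minimal sketch of what ‘indirect specification based on learning from humans’ could look like in toy form, the snippet below fits a simple linear value model to human pairwise preference judgments. The function names and the two features (honesty, harm) are made up for illustration; the real proposal would be far richer:

```python
# Toy preference learning: fit weights to human pairwise judgments.
# Everything here (the features, the simulated 'human') is hypothetical.
import math
import random

def human_prefers(a, b):
    # Stand-in for a human judgment; the learner never sees these weights.
    score = lambda o: 2.0 * o["honesty"] - 3.0 * o["harm"]
    return score(a) > score(b)

def learn_value_weights(features, rounds=5000, lr=0.05):
    """Logistic update on pairwise comparisons of random options."""
    w = {f: 0.0 for f in features}
    for _ in range(rounds):
        a = {f: random.uniform(-1, 1) for f in features}
        b = {f: random.uniform(-1, 1) for f in features}
        label = 1.0 if human_prefers(a, b) else 0.0
        margin = sum(w[f] * (a[f] - b[f]) for f in features)
        pred = 1.0 / (1.0 + math.exp(-margin))
        for f in features:
            w[f] += lr * (label - pred) * (a[f] - b[f])
    return w

# The learned weights roughly recover the ratio the 'human' uses,
# without that objective ever being written down by hand.
print(learn_value_weights(["honesty", "harm"]))
```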
Ok. So why could you not replace ‘encode an FAI’ with ‘define an FAI’, and then place all the restrictions I detailed on that AI? Or is there still a problem?
Not sure what other comment you are referring to.