P6 is not necessary for fooming. Whether or not the researchers gave the intelligence a strict utility function should not necessarily alter whether or not it fooms.
Even the creation of paperclips is a much more complex goal than telling an AI to compute as many digits of Pi as possible.
Yet both are about as unpleasant for humans.
There’s also some tension between P5 and P6. If the AI doesn’t have well-defined goals, it won’t necessarily have an issue with self-improvement altering apparent “goals”.
I am curious, do you agree that “AI going FOOM” contains many implicit predictions about topics like the nature of intelligence, as well as various unproven and unstated conjectures in fields as diverse as complexity theory, economics and rationality (e.g. the consequences of utility maximization)?
I was thinking about writing to some top experts about recursive self-improvement, as I already did about the SIAI itself. You seem to know something about complexity theory and higher mathematics in general, if I remember right. Could you help me formulate an inquiry about “recursive self-improvement” in a concise and precise way?
Or do you think it is a bad idea? I also thought about doing the same for MWI. But maybe you think the opinion of some actual physicists, AI researchers and complexity theorists is completely worthless compared to what Eliezer Yudkowsky thinks. I am not sure...
This idea stems from my perception that quite a few people here on LW talk about “AI going FOOM” as if they knew for certain that any skepticism about it must be bullshit, “because for obvious reasons an AI can go FOOM, it’s written into the laws of physics...”
There’s also some tension between P5 and P6.
As far as I can tell, but maybe some AGI researcher can correct me on this, recursive self-improvement demands goal-stability (P5); otherwise a rational agent wouldn’t self-improve, as doing so would not be instrumental to its current goals. For example, if a paperclip maximizer couldn’t tell whether it would still be paperclip-friendly after dramatically improving its intelligence, it wouldn’t risk self-improvement until it was able to prove paperclip-friendliness. This means it would be unable to benefit from its ability to self-improve when working out goal-stability and paperclip-friendliness in the first place.
Further, to compute a cost-benefit analysis of self-improvement and to measure its success, an AGI will need highly specific goal-parameters (P6), i.e. a well-defined utility function. If, for example, you tell an AGI to calculate 10 digits of Pi rather than 10^100, its cost-benefit analysis wouldn’t suggest that it was instrumental to turn the universe into computronium. If you think this is wrong, I’d like to hear your arguments. Why would a rational agent with imprecise optimization parameters, e.g. paperclips whose tolerance is far larger than a nanometer, conclude that it was economical to take over the whole planet to figure out how to design such paperclips?
The arguments I often hear are along the lines of “it will try to do it as fast as possible” or “it will be instrumental to kill all humans so that they can’t destroy its precious paperclips”. Well, if it wasn’t told to care about how quickly the paperclips are to be produced, why wouldn’t it just decide that it might as well do it slowly? If it wasn’t told to care about the destruction of paperclips, why would it care about possible risks from humans?
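As a toy illustration of the cost-benefit asymmetry argued for above, here is a minimal sketch; the function name, cost figure and digit counts are invented for illustration, and it only assumes that a rational agent weighs the marginal value of extra resources against their cost:

```python
# Toy sketch: an agent pursues an instrumental action ("acquire far more
# compute") only if the expected gain toward its goal exceeds the cost.
# All numbers are invented for illustration.

def marginal_value_of_more_compute(digits_required, digits_reachable_now):
    """Marginal value of extra compute for the goal 'calculate N digits of Pi'."""
    if digits_required <= digits_reachable_now:
        return 0.0  # goal already achievable with current resources
    return 1.0      # goal not yet achievable, so extra resources still help

COST_OF_TAKING_OVER_THE_PLANET = 0.5  # any strictly positive cost will do

for n_digits in (10, 10**100):
    gain = marginal_value_of_more_compute(n_digits, digits_reachable_now=10**15)
    print(f"N = {n_digits}: grab more resources? {gain > COST_OF_TAKING_OVER_THE_PLANET}")
```

On this picture the bounded 10-digit goal never justifies the instrumental land-grab, while the 10^100-digit goal does.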
recursive self-improvement demands goal-stability (P5); otherwise a rational agent wouldn’t self-improve, as doing so would not be instrumental to its current goals. For example, if a paperclip maximizer couldn’t tell whether it would still be paperclip-friendly after dramatically improving its intelligence, it wouldn’t risk self-improvement until it was able to prove paperclip-friendliness.
It “wouldn’t risk it”? And yet one might think there’s some reward that someone would be willing to take a chance for: if you currently generate 10 utility per day and have a chance to increase that to 100, you should take the gamble whenever your chance of success is better than 1/10 (if the other 9/10 of outcomes are 0 utility per day).
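The arithmetic behind that threshold is straightforward expected value; here is a minimal sketch, assuming the same payoffs as above (10 utility/day now, 100 on success, 0 on failure):

```python
# Expected-utility check for the risky self-modification described above.
CURRENT, SUCCESS, FAILURE = 10, 100, 0  # assumed payoffs in utility per day

def expected_utility_if_modifying(p_success):
    return p_success * SUCCESS + (1 - p_success) * FAILURE

for p in (0.05, 0.10, 0.15):
    attempt = expected_utility_if_modifying(p) > CURRENT
    print(f"p = {p:.2f}: attempt self-modification? {attempt}")

# Break-even at p = CURRENT / SUCCESS = 0.1, i.e. anything better than a
# 1/10 chance of success makes the gamble worthwhile in expectation.
```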
Further, to compute a cost-benefit analysis of self-improvement and to measure its success, an AGI will need highly specific goal-parameters (P6), i.e. a well-defined utility function.
The AI could have any decision-choosing system it wants. It could calculate utilities precisely and evaluate them against a thoroughly specified utility function, or it could instead follow a list of a few thousand rules as best it could, weighting the rules by a time-inconsistent method like priming. If the question is “are there practical (though not necessarily safe) decision-choosing systems other than utility?”, I’d say the answer is yes.
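For concreteness, here is a minimal sketch of such a non-utility decision-choosing system; the rule names, weights and the priming mechanism are invented for illustration, and the only assumption is that rule weights drift with recent exposure rather than being derived from a fixed utility function:

```python
# A rule-based agent whose rule weights are nudged by recent exposure
# ("priming"), making its choices time-inconsistent.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    applies: Callable[[Dict], bool]  # situation -> does this rule fire?
    action: str
    weight: float                    # drifts over time; not a fixed utility

class RuleBasedAgent:
    def __init__(self, rules: List[Rule]):
        self.rules = rules

    def decide(self, situation: Dict) -> str:
        candidates = [r for r in self.rules if r.applies(situation)]
        if not candidates:
            return "do nothing"
        return max(candidates, key=lambda r: r.weight).action

    def prime(self, rule_name: str, boost: float = 0.5) -> None:
        # Recent exposure temporarily boosts a rule, so the same situation
        # can produce a different decision later on.
        for r in self.rules:
            if r.name == rule_name:
                r.weight += boost

agent = RuleBasedAgent([
    Rule("make paperclips", lambda s: s["wire"] > 0, "bend wire", 1.0),
    Rule("conserve wire", lambda s: s["wire"] < 5, "wait", 1.2),
])
print(agent.decide({"wire": 3}))   # -> "wait" (the conserve rule outweighs)
agent.prime("make paperclips")
print(agent.decide({"wire": 3}))   # -> "bend wire" (priming shifted the weights)
```

Nothing here maximizes a single utility function; the weights are just heuristics that shift over time.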
I am curious, do you agree that “AI going FOOM” contains many implicit predictions about topics like the nature of intelligence, as well as various unproven and unstated conjectures in fields as diverse as complexity theory, economics and rationality (e.g. the consequences of utility maximization)?
To some extent yes, but I’m certainly not an expert on this. It seems that there are many different proposed pathways leading to a foom-like situation. So, while each of them involves a fair number of premises, it is hard to tell whether the end result is likely or not.
I was thinking about writing to some top experts about recursive self-improvement, as I already did about the SIAI itself. You seem to know something about complexity theory and higher mathematics in general, if I remember right. Could you help me formulate an inquiry about “recursive self-improvement” in a concise and precise way?
I’m not sure there are any real experts on recursive self-improvement out there. The closest I’m aware of is something like compiler experts, but even that doesn’t really recursively self-improve: if you run an efficient compiler on its own source code, you might end up with a faster compiler, but it will still give the same output. Being an expert on recursive self-improvement sounds to me a bit like being a xenobiologist. There’s probably a field there, but there’s a massive lack of data.
Or do you think it is a bad idea? I also thought about doing the same for MWI. But maybe you think the opinion of some actual physicists, AI researchers and complexity theorists is completely worthless compared to what Eliezer Yudkowsky thinks. I am not sure...
I think here and in the paragraph above you are coming across as a bit less diplomatic than you need to be. If this is directed at me, at least, I think that Eliezer probably overestimates what can likely be done in terms of software improvements and doesn’t appreciate how complexity issues can be a barrier. However, my own area of expertise is actually number theory, not complexity theory. But at least as far as MWI is concerned, that isn’t a position unique to Yudkowsky: a large fraction of practicing physicists support MWI. In all these cases, Eliezer has laid out his arguments, so it isn’t necessary to trust him in any way when evaluating them. It is possible that there are additional thought processes behind his conclusions that he hasn’t spelled out.
That said, I do think that there’s a large fraction of LWians who take fooming as an almost definite result. This confuses me not because I consider a foom event to be intrinsically unlikely but because it seems to imply extreme certainty about events that by their very nature we will have trouble understanding and haven’t happened yet. One would think that this would strongly push confidence about foom estimates lower, but it doesn’t seem to do that. Yet, at the same time, it isn’t that relevant: A 1% chance of fooming would still make a fooming AI one of the most likely existential risk events.
Manfred below seems to have addressed some of the P5 concerns, but I’d like to offer a more concrete counterexample. As humans learn and grow, their priorities change. Most humans don’t go out of their way to avoid learning, even though it will result in changing priorities.
I’m not sure there are any real experts on recursive self-improvement out there. The closest I’m aware of is something like compiler experts, but even that doesn’t really recursively self-improve: if you run an efficient compiler on its own source code, you might end up with a faster compiler, but it will still give the same output. Being an expert on recursive self-improvement sounds to me a bit like being a xenobiologist. There’s probably a field there, but there’s a massive lack of data.
Only if you fail to consider history so far. Classical biological systems self-improve. We have a fair bit of information about them. Cultural systems self-improve too, and we have a lot of information about them as well. In both cases the self-improvement extends to increases in collective intelligence, and in the latter case the process even involves deliberate intelligent design.
Corporations like Google pretty literally rewire their own e-brains—to increase their own intelligence.
If you ignore all of that data, then you may not have much data left. However, that data is rather obviously highly relevant. If you propose ignoring it, there need to be good reasons for doing that.
The whole idea that self-modifying intelligent computer programs are a never-before seen phenomenon that changes the rules completely is a big crock of nonsense.