If you clear away all the noise arising from the fact that this interaction constitutes a clash of tribal factions (here comes Young Upstart Outsider trying to argue that Established Academic Researcher is really a Mad Scientist), you can actually find at least one substantial (implicit) claim by Wang that is worth serious consideration from SI’s point of view. And that is that building FAI may require (some) empirical testing prior to “launch”. It may not be enough to simply try to figure everything out on paper beforehand, and then wincingly press the red button with the usual “here goes nothing!” It may instead be necessary to build toy models (that can hopefully be controlled, obviously) and see how they work, to gain information about the behavior of (aspects of) the code.
Similarly, in the Goertzel dialogue, I would have argued (and meant to argue) that Goertzel’s “real point” was that EY/SI overestimate the badness of (mere) 95% success; that the target, while somewhat narrow, isn’t as narrow as the SI folks claim. This is also worth serious consideration, since one can imagine a situation where Goertzel (say) is a month away from launching his 80%-Friendly AI, while EY believes that his ready-to-go 95%-Friendly design can be improved to 97% within one month and 100% within two...what should EY do, and what will he do based on his current beliefs?
If you clear away all the noise arising from the fact that this interaction constitutes a clash of tribal factions (here comes Young Upstart Outsider trying to argue that Established Academic Researcher is really a Mad Scientist), you can actually find at least one substantial (implicit) claim by Wang that is worth serious consideration from SI’s point of view. And that is that building FAI may require (some) empirical testing prior to “launch”.
Testing is common practice. Surely no competent programmer would ever advocate deploying a complex program without testing it.
He is not talking about just testing; he is talking about changing the requirements and the design as the final product takes shape. Agile development would be a more suitable comparison.
Surely no competent programmer would ever advocate deploying a complex program without testing it.
With a recursively self-improving AI, once you create something able to run, running a test can turn into deployment even without the programmer’s intention.
Even if we manage to split the AI into modules and test each module independently, we should understand the process well enough to make sure that the individual modules can’t recursively self-improve. And we should be pretty sure about the implication “if the individual modules work as we expect, then the whole will also work as we expect”. Otherwise we could end up with “the individual modules work OK, the whole is NOT OK, and it used its skills to escape the testing environment”.
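As a toy illustration (the module names and checks below are invented, and this is not a real AI architecture): two modules can each pass their isolated tests and still compose into something those tests never exercised, which is exactly why the modules-to-whole implication needs its own argument.

```python
class Suggester:
    """Module A: proposes short code snippets. Harmless on its own."""
    def propose(self, goal: str) -> str:
        # In a real system this would be a learned model; here it is a stub.
        return f"print('working on: {goal}')"

class Runner:
    """Module B: executes strings it is given. Fine on the vetted inputs its own tests use."""
    def run(self, snippet: str) -> None:
        exec(snippet)

# Each module passes its isolated tests:
assert Suggester().propose("tidy up").startswith("print")
Runner().run("x = 1 + 1")

# But the composed system has a property neither module's tests checked:
# whatever A learns to emit, B will execute. The step "if the modules work
# as expected, the whole works as expected" is exactly what needs its own argument.
Runner().run(Suggester().propose("tidy up"))
```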
With a recursively self-improving AI, once you create something able to run, running a test can turn into deployment even without the programmer’s intention.
So: damage to the rest of the world is what test harnesses are there to prevent. It makes sense that—if we can engineer advanced intelligences—we’ll also be able to engineer methods of restraining them.
Depends on how we engineer them. If we build an algorithm, knowing what it does, then perhaps yes. If we try some black-box development such as “make this huge neural network, initialize it with random data, train it, make a few randomly modified copies and select the ones that learn fastest, etc.”, then I wouldn’t be surprised if, after the first thousand failed approaches, the first one able to really learn and self-improve did something unexpected. The second approach seems more likely, because it’s easier to try.
Also, after the thousand failed experiments, I predict human error in the safety procedures, simply because they will feel completely unnecessary. For example, a member of the team will turn off the firewalls and connect to Facebook (for greater irony, it could be LessWrong), providing the new AI with a simple escape route.
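A minimal sketch of that black-box route, just to make the selection pressure explicit (the genome encoding, the fitness stand-in, and all the numbers below are invented): the loop selects hard for “learns fastest” and measures nothing else about the winning candidate.

```python
import random

def random_genome(size=64):
    return [random.gauss(0.0, 1.0) for _ in range(size)]

def mutate(genome, rate=0.1):
    return [g + random.gauss(0.0, rate) for g in genome]

def learning_speed(genome):
    # Stand-in for "train the network and measure how quickly it learns".
    # Note what it does NOT measure: anything else about the candidate's behaviour.
    return -sum(g * g for g in genome)

population = [random_genome() for _ in range(20)]
for generation in range(100):
    population.sort(key=learning_speed, reverse=True)   # keep the fastest learners
    survivors = population[:5]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]

best = max(population, key=learning_speed)
# Selection was entirely for one property; everything else about `best` is unexamined.
```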
Also, after the thousand failed experiments, I predict human error in the safety procedures, simply because they will feel completely unnecessary.
We do have some escaped criminals today. It’s not that we don’t know how to confine them securely, it’s more that we are not prepared to pay to do it. They do some damage, but it’s tolerable. What the escaped criminals tend not to do is build huge successful empires—and challenge large corporations or governments.
This isn’t likely to change as the world automates. The exterior civilization is unlikely to face serious challenges from escaped criminals. Instead it is likely to start out—and remain—much stronger than they are.
We don’t have recursively self-improving, superhumanly intelligent criminals yet. Only in comic books. Once we have a recursively self-improving superhuman AI, and it is not human-friendly, and it escapes… then we will have a comic-book situation in real life. Except we won’t have a superhero on our side.
That’s comic-book stuff. Society is self-improving faster than its components. Component self-improvement trajectories tend to be limited by the government breaking them up or fencing them in whenever they grow too powerful.
The “superintelligent criminal” scenario is broadly like worrying about “grey goo”—or about a computer virus taking over the world. It makes much more sense to fear humans with powerful tools that magnify their wills. Indeed, the “superintelligent criminal” scenario may well be a destructive meme—since it distracts people from dealing with that much more realistic possibility.
Component self-improvement trajectories tend to be limited by the government breaking them up or fencing them in whenever they grow too powerful.
Counterexample: any successful revolution. A subset of society became strong enough to overthrow the government, despite the government trying to stop them.
It makes much more sense to fear humans with powerful tools that magnify their wills.
Could a superhuman AI use human allies and give them that kind of tool?
Component self-improvement trajectories tend to be limited by the government breaking them up or fencing them in whenever they grow too powerful.
Counterexample: any successful revolution. A subset of society became strong enough to overthrow the government, despite the government trying to stop them.
Sure, but look at the history of revolutions in large, powerful democracies. Of course, if North Korea develops machine intelligence, a revolution becomes more likely.
It makes much more sense to fear humans with powerful tools that magnify their wills.
Could a superhuman AI use human allies and give them that kind of tool?
That’s pretty-much what I meant: machine intelligence as a correctly-functioning tool—rather than as an out-of-control system.
That’s pretty-much what I meant: machine intelligence as a correctly-functioning tool—rather than as an out-of-control system.
Seems to me that you simply refuse to see an AI as an agent. If an AI and a human conquer the world, the only possible interpretation is that the human used the AI, never that the AI used the human. Even if it was all the AI’s idea, it just means that the human used the AI as an idea generator. Even if the AI kills the human afterwards, it would just mean that the human used the AI incorrectly and thus killed themselves. Am I right about this?
Er, no—I consider machines to be agents.
“escape the testing environment” is poorly defined. Some people read it as “deduce the exploitable vulnerabilities in the system, hack into it, run itself with higher privileges, somehow transmit itself to other machines / the internet at large / infect people’s brains Snow Crash-style”, and others read it as “convince the people running the test to give it more resources (and maybe infect their brains Snow Crash-style)”.
The former can be prevented by having a secure (air-gapped?) system; the latter can be prevented by not running tests interactively and by ignoring the moral issues with terminating (or suspending) what may possibly be an intelligent ‘person’.
Both readings also implicitly assume that the AI’s ability to improve its own intelligence (and therefore gain the ability to do either of the above) is unbounded by the resources of the system and carries no cost in terms of increased processing time.
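For concreteness, a sketch of the “secure, non-interactive” reading on a POSIX system, using only standard-library resource limits and a wall-clock cutoff; candidate.py is a hypothetical file holding the code under test. This caps CPU, memory, and run time, but it is not containment in any strong sense: nothing here isolates the network, and it does nothing about the “convince the operators” channel.

```python
import resource
import subprocess

def limit_resources():
    # Applied in the child process just before exec (POSIX only).
    resource.setrlimit(resource.RLIMIT_CPU, (60, 60))       # at most 60 s of CPU time
    resource.setrlimit(resource.RLIMIT_AS, (2**30, 2**30))   # at most 1 GiB of address space

# "candidate.py" is a hypothetical file containing the code under test.
result = subprocess.run(
    ["python3", "candidate.py"],
    preexec_fn=limit_resources,   # resource caps, not real containment
    capture_output=True,          # no interactive channel; output is collected for later analysis
    timeout=120,                  # hard wall-clock cutoff; raises TimeoutExpired if exceeded
)
print(result.returncode)
```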
I will note that the AI box experiment’s conditions expressly forbid a secure environment [i.e. one with inspection tools that cannot be manipulated by the AI]:
the results seen by the Gatekeeper shall again be provided by the AI party, which is assumed to be sufficiently advanced to rewrite its own source code, manipulate the appearance of its own thoughts if it wishes, and so on.
Because that’s not the part of the AI safety question that the AI box experiment is designed to test; for the purpose of the experiment it says, “sure, you might catch the AI in a lie, but assuming you don’t—”
Environment means both hardware and people—anything the AI has a chance to influence. We could use a narrower definition, but why should the AI respect it? By limiting our map we don’t limit the territory.
When the AI gets much smarter than humans, we may not understand the output of our inspection tools. They will give us huge amounts of data, and we will be unable to decipher what it all means.
Imagine a group of monkeys trying to enslave a human in a cave. The monkeys bring some objects from the jungle to the human and make him produce better food and toys for them (we want the AI to do some real-life optimization, otherwise it’s just money wasted on academic exercises). The monkeys understand that a human moving towards the entrance is trying to escape, and will threaten to kill him if he tries. But they don’t see the danger of the human quietly sitting at the back of the cave, constructing a machine gun from the spare parts.
With a recursively self-improving AI, once you create something able to run, running a test can turn into deployment even without the programmer’s intention.
I’ve asked this elsewhere to no avail, but I’m still curious—does it follow from this that developing some reliable theoretical understanding about the properties of algorithms capable and incapable of self-improvement is a useful step towards safe AI research?
I mean, it’s clear that useful intelligence that is incapable of recursive self-improvement is possible… I’m an existence proof, for example.
If we can quantify the properties of such intelligences, and construct a tool that can inspect source code prior to executing it to ensure that it lacks those properties, then it seems to follow that we can safely construct human-level AIs of various sorts. (Supposing, of course, that we’re capable of building human-level AIs at all… an assumption that appears to be adopted by convention in this context.)
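At its very crudest, such an inspection tool might look like the sketch below: a static scan that refuses obviously self-modifying constructs. This falls far short of the “quantify the properties of such intelligences” standard described above; the flagged names are an illustrative, not exhaustive, list, and the real work is the theory that says which programs can self-improve at all.

```python
import ast

FLAGGED_CALLS = {"exec", "eval", "compile", "__import__"}

def looks_potentially_self_modifying(source: str) -> bool:
    """Crude static check: does the source call any obvious code-generation primitives?"""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in FLAGGED_CALLS:
                return True
    return False

assert looks_potentially_self_modifying("exec(open('me.py').read())")
assert not looks_potentially_self_modifying("total = sum(range(10))")
```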
it’s clear that useful intelligence that is incapable of recursive self-improvement is possible… I’m an existence proof, for example.
I wouldn’t be so sure about that. Imagine that you are given unlimited time, perfect health, and as much data storage (paper, computers, etc.) as you need. Do you think your self-improvement would stop at some point?
The problem with humans is that they have limited time, much of which is wasted on gathering resources to survive, or on climbing the social ladder… and then they die, and the next generation starts almost from zero. At least we have culture, education, books and other tools which allow the next generation to use part of the achievements of previous generations—unfortunately, that learning also takes too much time. We are so limited by our hardware.
Imagine a child growing up. Imagine studying at elementary school, high school, university. Is this an improvement? Yes. Why does it stop? Because we run out of time and resources, our own health and abilities also being limited resources. However, as a species, humans are self-improving. We are just not fast enough to FOOM as individuals (yet).
Supposing all this is true, it nevertheless suggests a way to define a safe route for research.
As you say, we are limited by our hardware, by our available resources, by the various rate-limiting steps in our self-improvement. There’s nothing magical about these limits; they are subject to study and analysis. Sufficiently competent analysis could quantify those qualitative limits and could support a claim like “to achieve X level of self-improvement given Y resources would take a mind like mine Z years”. The same kind of analysis could justify similar claims about optimizing systems other than my mind.
If I have written the source code for an optimizing system, and such an analysis of the source code concludes that for it to exceed TheOtherDave-2012’s capabilities on some particular reference platform would take no less than 35 minutes, then it seems to follow that I can safely execute that source code on that reference platform for half an hour.
Edit: Or, well, I suppose “safely” is relative; my own existence represents some risk, as I’m probably smart enough to (for example) kill a random AI researcher given the element of surprise should I choose to do so. But the problem of constraining the inimical behavior of human-level intelligences is one we have to solve whether we work on AGI or not.
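The decision rule being proposed, sketched with the hard part assumed away (the 35-minute lower bound is simply taken as the output of the analysis, and run_one_step is a hypothetical interface to the optimizer under test):

```python
import time

LOWER_BOUND_SECONDS = 35 * 60    # assumed output of the analysis: minimum time to exceed the reference capability
SAFETY_MARGIN = 0.85             # run for comfortably less than the analysed bound
BUDGET = LOWER_BOUND_SECONDS * SAFETY_MARGIN

def run_with_budget(run_one_step, budget=BUDGET):
    # run_one_step is a hypothetical interface: one bounded unit of optimizer work.
    start = time.monotonic()
    while time.monotonic() - start < budget:
        run_one_step()
    # Execution stops strictly before the analysed lower bound is reached.

run_with_budget(lambda: None, budget=0.01)   # trivial stand-in, tiny budget for demonstration
```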
Different humans can have the same lifespan, yet achieve different results. Think about Michelangelo or da Vinci… and then think about the billions of unknown humans who lived to the same age (at least thousands of them had the same lifespan and the same resources and opportunities for education). Analogously, half an hour may not be enough for an average self-improving AI, but may be plenty of time for a “Michelangelo of recursive self-improvement”. And before the experiments we don’t even know what the distribution of such “Michelangelos” in the AI population is.
Speaking about a learning curve is just a metaphor that can’t be taken too literally. Learning happens in jumps, some of them smaller, some of them larger; only on average, in the long run, can it be approximated by a curve. If Moore’s law says that computers will be twice as fast in 18 months, it does not mean they get exactly 1.04 times faster every month (2^(1/18) ≈ 1.04); the improvement comes as a series of discrete steps. Learning to read gives humans the ability to learn faster, but we cannot divide the effect and say what the acceleration is per letter of the alphabet. And we also cannot look at the history of medical research and conclude that, at its current speed, the cure for cancer will be discovered at exactly 12:30 PM on the 31st of December 2017.
So if you think that to achieve some dangerous ability a system would have to run for 35 minutes, I would recommend running it for no longer than 1 minute, preferably less, and then carefully analyzing the results. The problem is, we can do this only as long as the AI is less intelligent than us. After that point we get to the “AI does something, and we are too stupid to understand what it means, but it seems to work somehow” stage.
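The “Michelangelo” worry can be made concrete with a toy simulation: if time-to-some-dangerous-capability varies widely across candidates (modelled here, purely for illustration, as a log-normal distribution with invented parameters), the fastest of many candidates is far quicker than the average one, so a time budget calibrated to the average candidate says little about the outliers.

```python
import random

random.seed(0)
# Time-to-capability for 1000 hypothetical candidates, drawn from an invented
# heavy-tailed distribution; the shape and parameters are illustrative only.
times = [random.lognormvariate(0.0, 1.5) for _ in range(1000)]

average_time = sum(times) / len(times)
fastest_time = min(times)
print(f"average time-to-capability: {average_time:.2f}")
print(f"fastest of 1000 candidates:  {fastest_time:.3f}")
# The fastest candidate is far quicker than the average one, so a budget set
# from typical behaviour is no protection against the outliers.
```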
Agreed that once a system gets “smarter than us” all bets are off, which suggests that the important threshold for safety considerations is “how long will it take for algorithm X running on reference platform Y to get smarter than me.”
Agreed that if my understanding of the engineering constraints of the system is so unreliable that all I can say is “Well, the average algorithm will take about half an hour to get there on an average platform, but who knows how fast outliers can go?” then I can’t safely run an algorithm for any length of time… I just don’t know enough yet.
We don’t understand the engineering constraints that affect semiconductor development well enough to set a confident limit on how quickly it can improve, so all we have is unreliable generalizations like Moore’s Law. We don’t understand the engineering constraints that affect learning in humans even that well. We understand the engineering constraints that affect the development of cancer cures even less well than that.
You’re absolutely right that, in that state of ignorance, we can’t say what’s safe and what’s not.
You seem to be assuming that that state of ignorance is something we can’t do anything about, an inherent limitation of the universe and the human condition… that the engineering constraints affecting the maximum rates of self-optimization of a particular algorithm on a particular platform are and will always be a mystery.
If that’s true, then sure, there’s never a safe threshold we can rely on.
I don’t really see why that would be true, though. It’s a hard problem, certainly, but it’s an engineering problem. If understanding the engineering constraints that govern rates of algorithm self-optimization is possible (that is, if it’s not some kind of ineluctable Mystery) and if that would let us predict reliably the maximum safe running time of a potentially self-optimizing algorithm, it seems like that would be a useful direction for further research.
You seem to be assuming that that state of ignorance is something we can’t do anything about
No, no, no. We probably can do something about it. I just assume that it will be more complicated than “make an estimate that complexity C will take time T, and then run a simulation for time S<T”; especially if we have no clue at all what the word ‘complexity’ means, despite pretending that it is a value we can somehow measure on a linear scale.
As a first step, we must somehow understand what “self-improvement” means and how to measure it. Even this idea may be confused, so we need to get a better understanding. Only then does it make sense to plan the second step. Or maybe I’m confused about all of this.
The only part I feel sure about is that we should first understand what self-improvement is; only then can we try to measure it, and only then can we attempt to use some self-improvement threshold as a safety mechanism in an AI simulator.
This is a bit different from other situations, where you can measure something first and then have enough time to collect data and develop some understanding. Here, by the time you have something to measure (a self-improving process), it is already an existential risk. If you have to make a map of a minefield, you don’t start by walking across the field and stomping heavily, even if in other situations an analogous procedure would work very well.
As a first step, we must somehow understand what “self-improvement” means and how to measure it. Even this idea may be confused, so we need to get a better understanding.
Yes, absolutely agreed. That’s the place to start. I’m suggesting that doing this would be valuable, because if done properly it might ultimately lead to a point where our understanding is quantified enough that we can make reliable claims about how long we expect a given amount of self-improvement to take for a given algorithm given certain resources.
This is a bit different from other situations, where you can measure something first
Sure, situations where you can safely first measure something are very different from the situations we’re discussing.
If we are capable of building minds smarter than ourselves, that counts as self-improvement for the purposes of this discussion. If we are not, of course, we have nothing to worry about here.
Well, another possibility is that some of us are and others of us are not. (That sentiment gets expressed fairly often in the Sequences, for example.)
In which case we might still have something to worry about as a species, but nevertheless be able to safely construct human-level optimizers, given a reliable theoretical understanding of the properties of algorithms capable of self-improvement.
Conversely, such an understanding might demonstrate that all human-level minds are potentially self-improving in the sense we’re talking about (which I would not ordinarily label “self-improvement”, but leave that aside), in which case we’d know we can’t safely construct human-level optimizers without some other safety mechanism (e.g. Friendliness)… though we might at the same time know that we can safely construct chimpanzee-level optimizers, or dog-level optimizers, or whatever the threshold turns out to be.
Which would still put us in a position to be able to safely test some of our theories about the behavior of artificial optimizers, not to mention allow us to reap the practical short-term benefits of building such things. (Humans have certainly found wetware dog-level optimizers useful to have around over most of our history; I expect we’d find software ones useful as well.)
It isn’t Utopia, granted, but then few things are.