Learning how to create even a simple recommendation engine whose output is constrained by the values of its creators would be a large step forward and would help society today.
I think something showing how to do value learning on a small scale like this would be on topic. It might help to expose the advantages and disadvantages of algorithms like inverse reinforcement learning.
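To make this concrete, here is a minimal, entirely hypothetical sketch of small-scale value learning via Bayesian inverse reinforcement learning: given an expert’s observed actions in a toy gridworld, score candidate reward functions by how well a Boltzmann-rational policy under each one explains the data. All names, parameters, and the setup are illustrative, not a proposed design:

```python
import math

N_STATES = 5          # a one-dimensional gridworld, states 0..4
ACTIONS = (-1, +1)    # step left / step right
GAMMA = 0.9           # discount factor
BETA = 5.0            # assumed Boltzmann rationality of the observed expert

def reward_and_next(s, a, goal):
    """Candidate reward hypothesis: 1 for arriving at `goal`, 0 elsewhere."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == goal else 0.0), s2

def q_values(goal):
    """Value iteration for one candidate reward function."""
    v = [0.0] * N_STATES
    for _ in range(100):
        v = [max(r + GAMMA * v[s2]
                 for a in ACTIONS
                 for r, s2 in [reward_and_next(s, a, goal)])
             for s in range(N_STATES)]
    return {(s, a): r + GAMMA * v[s2]
            for s in range(N_STATES) for a in ACTIONS
            for r, s2 in [reward_and_next(s, a, goal)]}

def log_likelihood(trajectory, goal):
    """How well a Boltzmann-rational policy under this reward explains the data."""
    q = q_values(goal)
    ll = 0.0
    for s, a in trajectory:
        z = sum(math.exp(BETA * q[(s, b)]) for b in ACTIONS)
        ll += BETA * q[(s, a)] - math.log(z)
    return ll

# The expert repeatedly steps right: behaviour consistent with valuing state 4.
expert = [(0, +1), (1, +1), (2, +1), (3, +1)]
inferred = max(range(N_STATES), key=lambda g: log_likelihood(expert, g))
print(inferred)  # → 4: the reward hypothesis that best explains the behaviour
```

Even a toy like this exposes the characteristic weaknesses of IRL: the answer depends on the assumed rationality model (BETA) and on the hypothesis space of rewards, both of which the designer must choose.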
I also agree that, if there are more practical applications of AI safety ideas, this will increase interest and resources devoted to AI safety. I don’t really see those applications yet, but I will look out for them. Thanks for bringing this to my attention.
it is demonstrably not the case in history that the fastest way to develop a solution is to ignore all practicalities and work from theory backwards
I don’t have a great understanding of the history of engineering, but I get the impression that working from the theory backwards can often be helpful. For example, Turing developed the basics of computer science before sufficiently general computers existed.
My current impression is that solving FAI with a hypercomputer is a fundamentally easier problem than solving it with a bounded computer, and it’s hard to say much about the second problem if we haven’t made steps towards solving the first one. On the other hand, I do think that concepts developed in the AI field (such as statistical learning theory) can be helpful even for creating unbounded solutions.
AIXI showed that all the complexity of AGI lies in the practicalities, because the pure uncomputable theory is dead simple but utterly divorced from practice.
I would really like it if the pure uncomputable theory of Friendly AI were dead simple!
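For reference, and with the caveat that this is my own simplified rendering of Hutter’s formulation, the AIXI decision rule really does fit on one line. Here $U$ is a universal Turing machine, $q$ ranges over environment programs, and $\ell(q)$ is the length of $q$:

```latex
a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \big(r_t + \cdots + r_m\big)
      \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

All the difficulty hides in the uncomputable sum over programs; a Friendly AI analogue would additionally need the rewards replaced by something value-laden, which is presumably where the simplicity breaks down.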
Anyway, AIXI has been used to develop more practical algorithms. I definitely approach many FAI problems with the mindset that we’re going to eventually need to scale this down, and this makes issues like logical uncertainty a lot more difficult. In fact, Paul Christiano has written about tractable logical uncertainty algorithms, which is a form of “scaling down an intractable theory”. But it helped to have the theory in the first place before developing this.
an ignore-all-practicalities theory-first approach is useless until it nears completion
Solutions that seem to work for practical systems might fail for superintelligence. For example, perhaps induction can yield acceptable practical solutions for weak AIs, but does not necessarily translate to new contexts that a superintelligence might find itself in (where it has to make pivotal decisions without training data for these types of decisions). But I do think working on these is still useful.
My current trajectory places the first AGI at 10 to 15 years out, and the first self-improving superintelligence shortly thereafter. Will MIRI have practical results in that time frame?
I consider AGI in the next 10-15 years fairly unlikely, but it might be worth having FAI half-solutions by then, just in case. Unfortunately I don’t really know a good way to make half-solutions. I would like to hear if you have a plan for making these.
I don’t have a great understanding of the history of engineering, but I get the impression that working from the theory backwards can often be helpful. For example, Turing developed the basics of computer science before sufficiently general computers existed.
The first computer was designed by Babbage, who was mostly interested in practical applications (although admittedly it was never built). A hundred years later, Konrad Zuse developed the first working computer, also for practical purposes. I’m not sure he was even aware of Turing’s work.
Not that Turing didn’t contribute anything to the development of computers, but I’m not sure if it’s a good example of theory preceding practice.
In AI in general, practice seems to lead theory. Neural networks have been around forever, but they keep making progress every time computers get a bit faster. For the most part it’s not that scientists have invented good algorithms and are waiting around for computers to get fast enough to run them. Rather, the computers get a bit faster, and that drives a new wave of progress and lets researchers experiment with new stuff.
Anyway, AIXI has been used to develop more practical algorithms.
Forgive me if I’m mistaken, but is AIXI really that novel? From a theoretician’s point of view, maybe, but from the practical side of AI it’s just a reformulation of reinforcement learning. MC-AIXI is impressive because it works at all, not because there aren’t other algorithms that can learn to play Pac-Man.
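For comparison, the kind of standard algorithm being gestured at here — tabular Q-learning with a random exploration policy — fits in a few lines on a toy chain environment (numbers and setup purely illustrative):

```python
import random

random.seed(0)

N, GOAL = 6, 5                    # six states in a chain; reward at the last
ALPHA, GAMMA = 0.5, 0.95          # learning rate, discount factor
ACTIONS = (-1, +1)                # step left / step right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

# Behaviour policy is uniformly random; Q-learning is off-policy,
# so the greedy policy is learned anyway.
for _ in range(200):
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

greedy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N - 1)]
print(greedy)  # → [1, 1, 1, 1, 1]: step right from every non-goal state
```

Nothing here required the AIXI formalism; whether the formalism adds anything beyond what the reinforcement-learning framing already gives is exactly the question.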
I don’t have a great understanding of the history of engineering, but I get the impression that working from the theory backwards can often be helpful. For example, Turing developed the basics of computer science before sufficiently general computers existed.
One way to fix the lack of historical perspective is to actively involve engineers and their projects in the MIRI research agenda, rather than specifically excluding them.
Regarding your example, Turing hardly invented computing. If anything that honor probably goes to Charles Babbage who nearly a century earlier designed the first general computation devices, or to the various business equipment corporations that had been building and marketing special purpose computers for decades after Babbage and prior to the work of Church and Turing. It is far, far easier to provide theoretical backing to a broad category of devices which are already known to work than to invent out of whole cloth a field with absolutely no experimental validation.
My current impression is that solving FAI with a hypercomputer is a fundamentally easier problem than solving it with a bounded computer, and it’s hard to say much about the second problem if we haven’t made steps towards solving the first one.
The first statement is trivially true: everything is easier on a hypercomputer. But who cares? We don’t have hypercomputers.
The second statement is the real meat of the argument—that “it’s hard to say much about the [tractable FAI] if we haven’t made steps towards solving the [uncomputable FAI].” While on the surface that seems like a sensible statement, I’m afraid your intuition fails you here.
Experience with artificial intelligence has shown that there does not seem to be any single category of tractable algorithms which provides general intelligence. Rather we are faced with a dizzying array of special purpose intelligences which in no way resemble general models like AIXI, and the first superintelligences are likely to be some hodge-podge integration of multiple techniques. What we’ve learned from neuroscience and modern psychology basically backs this up: the human mind at least achieves its generality from a variety of techniques, not some easy-to-analyze general principle.
It’s looking more and more likely that the tricks we will use to actually achieve general intelligence will not resemble in the slightest the simple unbounded models for general intelligence that MIRI currently plays with. It’s not unreasonable to wonder then whether an unbounded FAI proof would have any relevance to an AGI architecture which must be built on entirely different principles.
I consider AGI in the next 10-15 years fairly unlikely, but it might be worth having FAI half-solutions by then, just in case. Unfortunately I don’t really know a good way to make half-solutions. I would like to hear if you have a plan for making these.
The goal is to achieve a positive singularity, not friendly AI. The easiest way to do that on a short timescale is to not require friendliness at all: use idiot-savant superintelligence only to solve the practical engineering challenges which prevent us from directly augmenting human intelligence, then push a large group of human beings through cognitive enhancement programmes in lockstep.
What does that mean in terms of a MIRI research agenda? Revisit boxing. Evaluate experimental setups that allow for a presumed-unfriendly machine intelligence but nevertheless have incentive structures or physical limitations which prevent it from going haywire. Devise traps, boxes, and tests for classifying how dangerous a machine intelligence is, and containment protocols. Develop categories of intelligences which lack foundational social skills critical to manipulating their operators. Etc., etc.
Anyway, AIXI has been used to develop more practical algorithms.
In section 8.2 of the very document you linked to, it is pointed out why stochastic AIXI will not scale to problems of real world complexity or useful planning horizons.
Thanks for the response. I should note that we don’t seem to disagree that a significant portion of AI safety research should be informed by practical considerations, including current algorithms. I’m currently getting a master’s degree in AI while doing work for MIRI, and a substantial portion of my work at MIRI is informed by my experience with more practical systems (including machine learning and probabilistic programming). The disagreement is more that you think unbounded solutions are almost entirely useless, while I think they are quite useful.
Rather we are faced with a dizzying array of special purpose intelligences
which in no way resemble general models like AIXI, and the first
superintelligences are likely to be some hodge-podge integration of multiple
techniques.
My intuition is that if you are saying that these techniques (or a hodgepodge of them) work, you are referring to some kind of criteria they perform well on in different situations (e.g. ability to do supervised learning). Sometimes we can prove that the algorithms perform well (as in statistical learning theory); other times, we can guess that they will perform well on future data based on how they performed on past data (while being wary of context shifts). We can try to find ways of turning things that satisfy these criteria into components of a Friendly AI (or a safe utility satisficer, etc.), without knowing exactly how these criteria are satisfied.
Like, this seems similar to other ways of separating interface from implementation. We can define a machine learning algorithm without paying too much attention to what programming language it is written in, or how exactly the code gets compiled. We might even start from pure probability theory and then add independence assumptions when they increase performance. Some of the abstractions are leaky (for example, we might optimize our machine learning algorithm for good cache performance), but we don’t need to get bogged down in the details most of the time. We shouldn’t completely ignore the hardware, but we can still usefully abstract it.
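The interface/implementation analogy can be made concrete with a minimal sketch (all names here are invented for illustration, not from any particular library): code written against a learner interface neither knows nor cares which implementation satisfies it, just as the argument above only relies on a performance criterion being met, not on how it is met.

```python
from abc import ABC, abstractmethod

class Learner(ABC):
    """Interface: what a supervised learner does, not how it does it."""

    @abstractmethod
    def fit(self, xs: list[float], ys: list[float]) -> None: ...

    @abstractmethod
    def predict(self, x: float) -> float: ...

class MeanPredictor(Learner):
    """Trivial implementation: always predict the mean training label."""
    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)
    def predict(self, x):
        return self.mean

class NearestNeighbour(Learner):
    """A different implementation behind the same interface."""
    def fit(self, xs, ys):
        self.data = list(zip(xs, ys))
    def predict(self, x):
        return min(self.data, key=lambda p: abs(p[0] - x))[1]

def held_out_error(model: Learner, train, test):
    """Written against the interface; works with any implementation."""
    model.fit(*train)
    xs, ys = test
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train = ([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
test = ([1.5, 2.5], [1.5, 2.5])
e1 = held_out_error(MeanPredictor(), train, test)
e2 = held_out_error(NearestNeighbour(), train, test)
print(e1, e2)  # → 0.5 0.25
```

`held_out_error` is the analogue of the criterion: it ranks implementations without inspecting their internals, which is the kind of abstraction boundary I have in mind.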
What does that mean in terms of a MIRI research agenda? Revisit boxing.
Evaluate experimental setups that allow for a presumed-unfriendly machine
intelligence but nevertheless have incentive structures or physical limitations
which prevent it from going haywire. Devise traps, boxes, and tests for
classifying how dangerous a machine intelligence is, and containment
protocols. Develop categories of intelligences which lack foundational social
skills critical to manipulating their operators. Etc., etc.
I think this stuff is probably useful. Stuart Armstrong is working on some of these problems on the forum. I have thought about the “create a safe genie, use it to prevent existential risks, and have human researchers think about the full FAI problem over a long period of time” route, and I find it appealing sometimes. But there are quite a lot of theoretical issues in creating a safe genie!
I have thought about the “create a safe genie, use it to prevent existential risks, and have human researchers think about the full FAI problem over a long period of time” route, and I find it appealing sometimes. But there are quite a lot of theoretical issues in creating a safe genie!
That is absolutely not a route I would consider. If that’s what you took away from my suggestion, please re-read it! My suggestion is that MIRI should consider pathways to leveraging superintelligence which don’t involve agent-y processes (genies) at all: processes which are incapable of taking action themselves, and whose internal workings are real-time audited and programmatically constrained to make deception detectable. Tools used as cognitive enhancers, not stand-alone cognitive artifacts with their own in-built goals.
SIAI spent a decade building up awareness of the problems that arise from superintelligent machine agents. MIRI has presumed from the start that the way to counteract this threat is to build a provably-safe agent. I have argued that this is the wrong lesson to draw—the better path forward is to not create non-human agents of any type, at all!
For one, even a ‘tool’ could return a catastrophic solution that humans might unwittingly implement. Secondly, it’s conceivable that ‘tool AIs’ can ‘spontaneously agentize’, and you might as well try to build an agent on purpose for the sake of greater predictability and transparency. That is, as soon as you talk about leveraging ‘superintelligence’ rather than ‘intelligence’, you’re talking about software with qualitatively different algorithms; software that not only searches for solutions but goes about planning how to do it. (You might say, “Ah, but that’s where your mistake begins! We shouldn’t let it plan! That’s too agent-y!” Then it ceases to be superintelligence. Those are the cognitive tasks that we would be outsourcing.) It seems that at a certain point on a scale of intelligence, tool AIs move quickly from ‘not unprecedentedly useful’ to ‘just as dangerous as agents’, and thus are not worth pursuing.
There’s a more nuanced approach to what I’ve said above. I’ve really never understood all of the fuss about whether we should use tools, oracles, genies, or sovereigns. The differences seem irrelevant. ‘Don’t design it such that it has goal-directed behavior,’ or ‘design it such that it must demonstrate solutions instead of performing them,’ or ‘design it such that it can only act on our command,’ seem like they’re in a similar class of mistake as ‘design the AI so that it values our happiness’ or some such; like it’s the sort of solution that you propose when you haven’t thought about the problem in enough technical detail and you’ve only talked about it in natural language.

I’ve always thought of ‘agent’ as a term of convenience. Powerful optimization processes happen to produce effects similar to the effects produced by the things to which we refer when we discuss ‘agents’ in natural language. Natural language is convenient, but imprecise; ultimately, we’re talking about optimization processes in every case.

Those are all ad hoc safety procedures. Far be it from me to speak for them, but I don’t interpret MIRI as advocating agents over everything else per se, so much as advocating formally verified optimization processes over optimization processes constrained by ad hoc safety procedures, and speaking of ‘agents’ is the most accurate way to state one facet of that advocacy in natural language.
To summarize: The difference between tool AIs and agents is the difference between a human perceiving an optimization process in non-teleological and teleological terms, respectively. If the optimization process itself is provably safe, then the ad hoc safety procedures (‘no explicitly goal-directed behavior,’ ‘demonstrations only; no actions,’ etc.) will be unnecessary; if the optimization process is not safe, then the ad hoc safety procedures will be insufficient; given these points, conceiving of AGIs as tools is a distraction from other work.
EDIT: I’ve been looking around since I wrote this, and I’m highly encouraged that Vladimir_Nesov and Eliezer have made similar points about tools, and Eliezer has also made a similar point about oracles. My point generalizes their points: Optimization power is what makes AGI useful and what makes it dangerous. Optimization processes hit low probability targets in large search spaces, and the target is a ‘goal.’ Tools aren’t AIs ‘without’ goals, as if that would mean anything; they’re AIs with implicit, unspecified goals. You’re not making them Not-Goal-Directed; you’re unnecessarily leaving the goals up for grabs.
SIAI spent a decade building up awareness of the problems that arise from superintelligent machine agents. MIRI has presumed from the start that the way to counteract this threat is to build a provably-safe agent. I have argued that this is the wrong lesson to draw—the better path forward is to not create non-human agents of any type, at all!
How would you prevent others from building agent-type AIs, though?