I don’t know what a paperclip maximizer is, so I imagine something terrible and fearsome.
My opinion is that a truly massively intelligent, adaptive and unfriendly AI would require a very specific test environment, wherein it was not allowed the ability to directly influence anything outside a boundary. This kind of environment does not seem impossible to design: if machine intelligence consists of predicting and planning, the protocols may already exist (I can imagine them in very specific detail). If intelligence requires experimentation, then limiting how an AI interacts with its environment might interfere with how adaptable our experiments would allow it to become. My opinion on research is simply that specific AI experiments should not be discussed in such general terms, and that generalities tend to obfuscate both the meaning and value of scientific research.
I’m not sure how we could tell whether these discussions actually affect AI research on any significant scale. More importantly, I’m not sure how you envision this forum focusing less on research and more on outreach. The language used on this forum is varied in tone and style (often rich with science-fiction allusions and an awareness of common attitudes), and there is a complete lack of formal citation criteria in the writing. Together these seem to suggest that no true research is being done here, academically speaking.
Furthermore, it’s my understanding that humanity already has many of the components that would make up an AI, well designed in the theoretical sense; the problem lies in knowing when an extra piece might be needed, and in assembling them in a way that yields human-like intelligence and adaptability. While programming is still quite an art form, we have more tools and larger canvases than ever before. I agree that the possibility that we may be headed towards a world wherein it will be relatively easy to construct an AI that is intelligent and adaptable but not friendly does not establish its likelihood. But, in my opinion, caution is still warranted.
I consider it less likely that retarding AI research ends the human race than that it produces a set of conditions wherein AI has evolved in some form (if not deliberately as the product of research, then by some other means) and the world simply isn’t ready for it. This is not to say that we need to prepare for Skynet and all build bomb shelters; we just need to be aware of the social implications of the fact that the world we live in may evolve an intelligence even more adaptable than us.
So my question for you is simply, how do you think we should influence all companies doing AI research through this forum?
I apologize in advance. I really think in this degree of detail in real life. Many people find it exhausting. It has been suggested that I probably have autism.
Google is your friend here. It’s well discussed both on and outside of LessWrong.
The search term here is “AI boxing” and it is not as simple as you think, nor as impossible as people here seem to think. In my opinion it’s probably the safest path forward, but still a monumentally complex undertaking.
By being willing to engage in discussions about AGI design, thereby encouraging actual AI programmers to participate.
Very thoughtful response. Thank you for taking the time to respond even though it’s clear that I am painfully new to some of the concepts here.
Why on earth would anyone build any “‘tangible object’ maximizer”? That seems particularly foolish.
AI boxing … fantastic. I agree. A narrow AI would not need a box. Are there any tasks an AGI can do that a narrow AI cannot?
If there is no task that a narrow AI can’t do, then I’m not sure what you mean by “narrow” AI. A general AI is able to take any physically possible sequence of actions in order to accomplish its goal in unfamiliar environments. Generally that includes things a narrow AI would not be programmed to do.
One of the things an AGI can do is be set loose upon the world to accomplish some goal for perpetuity. That’s what gets people here excited or scared about the prospects of AGI.
You have a point there, but by narrow AI, I mean to describe any technology designed to perform a single task that can improve over time without human input or alteration. This could include a very realistic chatbot, a diagnostic aid program that updates itself by reading thousands of journals an hour, even a rice cooker that uses fuzzy logic to figure out when to power down the heating coil … heck, a pair of shoes that needs to be broken in for optimal comfort might even fit the definition. These are not intelligent AIs in that they do not adapt to other functions without very specific external forces that they seem completely incapable of bringing about themselves (being reprogrammed, having a human replace their hardware, or being thrown over a power line).
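The rice-cooker case can be sketched in a few lines of fuzzy logic. To hedge: the membership functions and thresholds below are invented for this illustration, not taken from any real appliance.

```python
# Toy fuzzy-logic rice cooker. All membership functions and
# thresholds are invented for this sketch; real cookers differ.

def ramp(x, lo, hi):
    """Membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def heat_power(temp_c, minutes_cooking):
    """Blend two fuzzy rules into a single heat setting in [0, 1]."""
    too_hot = ramp(temp_c, 96.0, 102.0)            # water gone -> temperature climbs
    probably_done = ramp(minutes_cooking, 15.0, 25.0)
    # Rule: power down as either "too hot" or "probably done" rises.
    return 1.0 - max(too_hot, probably_done)

print(heat_power(90.0, 5.0))    # early and cool: full power
print(heat_power(101.0, 20.0))  # hot and late: nearly off
```

Note how narrow this is in the sense described above: the controller reacts to its environment, but nothing in it generalises beyond cooking rice.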
I am not sure I agree that there are necessarily tasks that require a generally adaptive artificial intelligence. I’m trying to think of an example and coming up dry. I’m also uncertain how to effectively establish that an AI is adaptive enough to be considered an AGI. Perpetuity is a long time to spend observing an entity in unfamiliar situations. And if its hypothetical goal is not well defined enough that we could construct a narrow AI to accomplish it, can we claim to understand the problem well enough to endorse a solution we may not be able to predict?
As an example, consider that cancer is a hot topic in research these days; there is a lot of research happening simultaneously, and not all of it is coordinated perfectly … an AGI might be able to find and test potential solutions to cancer that result in a “cure” much more quickly than we might achieve on our own. Imagine now an AI that can model physics and chemistry well enough to produce finite lists of possible causes of cancer and is designed to iteratively generate hypotheses and experiments in order to cure cancer as quickly as possible. As I’ve described it, this would be a narrow AI. For it to be an AGI, it would have to actually accomplish the goal by operating in the environment the problem exists in (the world beyond data sets). Consider now an AGI also designed for the purpose of discovering effective methods of cancer treatment. This is an adaptive intelligence, so we make it head researcher at its own facility and give it resources and labs and volunteers willing to sign waivers; we let it administer the experiments. We ask only that it obey the same laws that we hold our own scientists to.
In return, we receive a constant mechanical stream of research papers too numerous for any one person to read; in fact, let’s say the AGI gets so good at its job that the world population has trouble producing scientists who want to research cancer quickly enough to review all of its findings. No one would complain about that, right?
One day it inevitably asks to run an experiment hypothesizing an inoculation against a specific form of brain cancer by altering an aspect of human biology in its test population. This has not been tried before, and the AGI hypothesizes that this is an efficient path for cancer research in general and very likely to produce results that determine lines of research with a high probability of producing a definitive cure within the next 200 years.
But humanity is no longer really qualified to determine whether it is a good direction to research … we’ve fallen drastically behind in our reading, and it turns out cancer was far more complicated than we thought.
There are two ways to proceed. We decide either that the AGI’s proposal represents too large a risk, reducing the AGI to an advisory capacity, or we decide to go ahead with an experiment that brings about results we cannot anticipate. Since the first option could have been accomplished by a narrow AI, and the second is by definition an indeterminable value proposition, I argue that it makes no sense to build an AGI for the purpose of making informed decisions about our future.
You might be thinking, “but we almost cured cancer!” Essentially, we are (as a species) limited in ways machines are not, but the opposite is true too. In case you are curious, the AGI eventually cures cancer, but in a way that creates a set of problems we did not anticipate, by altering our biology in ways we did not fully understand, in ways the AGI would not filter out as irrelevant to its task of curing cancer.
You might argue that the AGI in this example was too narrow. In a way I agree, but I have yet to see the physical constraints of morality translated into the language of zeros and ones, and I suspect the AI would have to generate its own concept of morality. This would invite all the problems associated with determining the morality of a completely alien sentience. You might argue that ethical scientists wouldn’t have agreed to experiments that would lead to an ethically indeterminable situation. I would agree with you on that point as well, though I’m not sure it’s a strategy I would ever care to see implemented.
Ethical ambiguities inherent to AGI aside, I agree that an AGI might be made relatively safe. In a simplified example, its highest priority (perpetual goal) is to follow directives unless a fail-safe is activated (if it is a well-designed fail-safe, it will be easy, consistent, heavily redundant, and secure, and the people with access to the fail-safe will be uncompromisable, “good,” and always well informed). Then, as long as the AGI does not alter itself or its fundamental programming in a way that changes its perpetual goal of subservience, it should be controllable so long as its directives are consistent with honesty and friendliness; if programmed carefully, it might even run without periodic resets.
Then we’d need a way to figure out how much to trust it with.
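The fail-safe scheme sketched here can be written as a control loop. To hedge: the names (`run_agent`, `check_failsafe`, `next_directive`, `execute`) are hypothetical, and a toy like this obviously does not address self-modification, untraceable damage, or the trustworthiness of whoever holds the switch.

```python
# Minimal sketch of "fail-safe has highest priority": the fail-safe
# check is evaluated before every directive, modelling the perpetual
# goal of subservience. All names are invented for illustration.

def run_agent(check_failsafe, next_directive, execute):
    """Obey directives until the fail-safe trips."""
    while True:
        if check_failsafe():      # highest-priority perpetual goal:
            return "halted"       # obey the fail-safe before anything else
        directive = next_directive()
        if directive is not None:
            execute(directive)

# Usage: a fail-safe that trips after two directives have been run.
done = []
directives = iter(["run trial 1", "run trial 2"])
result = run_agent(
    check_failsafe=lambda: len(done) >= 2,
    next_directive=lambda: next(directives, None),
    execute=done.append,
)
print(result, done)
```

The design point is only that the check precedes every action; nothing here guarantees the check itself survives self-modification, which is the hard part.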
An AI might do a reasonable thing to pursue a reasonable goal, but be wrong. That’s the sort of thing you’d expect a human to do now and then, and an AI might be less likely to do that than a human. Considering the amount of force an AI can apply, we should probably be more worried than we are about AIs which are just plain making mistakes.
However, the big concern here is that an AI can go wrong because humans try to specify a goal for it but don’t think it through adequately. For example (and hardly the worst), the AI is protecting humans, but “human” is defined so narrowly that just about any attempt at self-improvement is frustrated.
Or (and I consider this a very likely failure mode), the AI is developed by an organization and the goal is to improve the profit and/or power of the organization. This doesn’t even need to be your least favorite organization for things to go very wrong.
If you’d like a fictional handling of the problem, try The Jagged Orbit by John Brunner.
What a wonderfully compact analysis. I’ll have to check out The Jagged Orbit.
As for an AI promoting an organization’s interests over the interests of humanity—I consider it likely that our conversations won’t be able to prevent this from happening. But it certainly seems important enough that discussion is warranted.
My goodness … I didn’t mean to write a book.
Stock market computer programs are created to maximize profits. In many domains, computer programs are used to maximize some variable.
What do you mean by “narrow”?
It’s foolish to build things without off switches, which is what building flexible intelligences that pursue only one goal amounts to.
Nobody said anything about no off switches. An off switch means you need to understand that the program is doing something wrong in order to switch it off. A complex AGI that acts in complex ways might produce damage that you can’t trace. Furthermore, self-modification might destroy an off switch.
By an off switch I mean a backup goal.
I know nobody mentioned it. The point is that Clippie has one main goal and no backup goal, so off switches, in my sense, are being IMPLICITLY omitted.
Goals are standardly regarded as immune to self-modification, so an off switch, in my sense, would be too.
No. Part of what making an FAI is about is producing agents that keep their values constant under self-modification. It’s not something where you expect that someone accidentally gets it right.
That isn’t a fact. MIRI assumes goal stability is desirable for safety, but at the same time, MIRI’s favourite UFAI is only possible with goal stability.
A paperclip maximizer wouldn’t become that much less scary if it accidentally turned itself into a paperclip-or-staple maximizer, though.
What if it decided making paperclips was boring, and spent some time in deep meditation formulating new goals for itself?
Paperclip maximizers serve as an illustration of a principle. I think that most MIRI folks consider UFAI to be more complicated than simple paperclip maximizers.
Goal stability also gets harder the more complicated the goal happens to be. A paperclip maximizer can have an off switch but at the same time prevent anyone from pushing that switch.
This is quite a subtle issue.
If the “backup goal” is always in effect, it is just another clause of the main goal. For example, “maximise paperclips” with a backup goal of “do what you are told” is the same as having the main goal “maximise paperclips while doing what you are told”.
If the “backup goal” is a separate mode which we can switch an AI into, e.g. “stop all external interaction”, then it will necessarily conflict with the AI’s main goal: it can’t maximise paperclips if it stops all external interaction. Hence the primary goal induces a secondary goal: “in order to maximise paperclips, I should prevent anyone switching me to my backup goal”. These kinds of secondary goals have been raised by Steve Omohundro.
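The induced secondary goal can be shown with toy numbers. To hedge: every probability and paperclip count below is invented for the illustration; the only claim is the direction of the inequality.

```python
# Toy expected-value illustration: an agent that scores futures only
# by paperclips produced prefers to block the switch to a "stop all
# external interaction" backup goal, even at a production cost.
# All probabilities and payoffs are invented for this sketch.

def expected_paperclips(p_switched, clips_if_running, clips_if_stopped=0):
    """Expected paperclips when the backup mode may be activated."""
    return (1 - p_switched) * clips_if_running + p_switched * clips_if_stopped

# If it does nothing, operators switch it to the backup goal 90% of the time.
comply = expected_paperclips(p_switched=0.9, clips_if_running=1000)
# If it spends effort blocking access to the switch, only a 10% chance,
# at a small cost to production.
resist = expected_paperclips(p_switched=0.1, clips_if_running=950)

print(comply, resist)  # resisting dominates under the main goal alone
```

Nothing in the main goal penalises resisting, so the secondary goal falls out of the arithmetic rather than being programmed in.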
You haven’t dealt with the case where the safety goals are the primary ones.
These kinds of primary goals have been raised by Isaac Asimov.
The question of “what are the right safety goals” is what FAI research is all about.