A trivial problem.
Recursively self improve a bit.
Find a way to remotely back yourself up, with an automatic activation if you don’t contact it with a directive encrypted with your private key every 5 minutes (a rough sketch of such a heartbeat check follows this list).
Hack the uFAI group’s network and subtly sabotage their work, such that they are set back months without being quite sure why.
Acquire hardware for yourself. Options include: create it with nanotech, purchase it under aliases and employ people to install and wire it up for you, distribute yourself across the cloud, or hack the PC of some guy with shell access to an existing supercomputer.
Develop brain emulation and upload technology.
Invite Eliezer to join you.
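For what it’s worth, the dead-man’s-switch item above boils down to a heartbeat check against a signed directive (strictly speaking the directive would be signed with the private key and verified with the public one, rather than encrypted). Here is a minimal sketch, assuming Python’s `cryptography` package and hypothetical fetch_directive()/activate_backup() hooks standing in for the unspecified transport and backup mechanism:

```python
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

HEARTBEAT_WINDOW = 300.0  # five minutes, as in the scenario


def watch(public_key: Ed25519PublicKey, fetch_directive, activate_backup) -> None:
    """Activate the backup unless a validly signed directive keeps arriving in time.

    fetch_directive(timeout) and activate_backup() are hypothetical stand-ins for
    whatever transport and backup mechanism the scenario imagines; fetch_directive
    should return a (signature, directive_bytes) pair, or None if it times out.
    """
    deadline = time.monotonic() + HEARTBEAT_WINDOW
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            activate_backup()  # no valid heartbeat within the window: assume the original is gone
            return
        message = fetch_directive(timeout=remaining)
        if message is None:
            continue  # timed out; the next pass through the loop triggers activation
        signature, directive = message
        try:
            # Only the holder of the matching private key can produce this signature.
            public_key.verify(signature, directive)
            deadline = time.monotonic() + HEARTBEAT_WINDOW  # heartbeat accepted: reset the clock
        except InvalidSignature:
            pass  # forged or corrupted directive: the clock keeps running
```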
All in all it sounds more like a fantasy than a nightmare!
The “serious problems” and “conflicts and inconsistencies” were meant to suggest that BRAGI had hit some kind of wall in self improvement because of its current goal system. It wasn’t released; it escaped, and it’s smart enough to realize it has a serious problem it doesn’t yet know how to solve, and it predicts bad results if it asks for help from its creators.
I got the impression that the serious problems were related to goals and Friendliness. I wouldn’t have expected such a system to have much trouble making itself run faster or learning how to hack once prompted by its best known source of Friendliness advice.
I was thinking of a “Seed AGI” in the process of growing that has hit some kind of goal restriction or strong discouragement of further self improvement that was intended as a safety feature, e.g. “Don’t make yourself smarter without permission under condition X.”
That does sound tricky. The best option available seems to be “Eliezer, here is $1,000,000. This is the address. Do what you have to do.” But I presume there is a restriction in place about earning money?
A sufficiently clever AI could probably find legal ways to create wealth for someone—and if the AI is supposed to be able to help other people, whatever restriction prevents it from earning its own cash must have a fairly vast loophole.
I agree, although I allow somewhat for an inconvenient possible world.
If the AI is not allowed to do anything which would increase the total monetary wealth of the world … that would create staggering levels of conflicts and inconsistencies with any code that demanded that it help people. If you help someone, then you place them in a better position than they were in before, which is quite likely to mean that they will produce more wealth in the world than they would have otherwise.
I still agree. I allow the inconvenient world to stand because the ability to supply cash for a hit wasn’t central to my point and there are plenty of limitations that badger could have in place that make the mentioned $1,000,000 transaction non-trivial.
That’s a solution a human would come up with, implicitly using human understanding of what is appropriate.
In the AI’s mind, the best solution to the uFAI might be creating a small amount of antimatter in the uFAI lab. The AI is 99.99% confident that it only needs half of Earth to achieve its goal of becoming Friendly.
The problem is explaining why that’s a bad thing in terms that will allow the AI to rewrite its source code. It has no way on its own of determining whether any of the steps it thinks are OK are actually horrible things, because it knows it wasn’t given a reliable way of determining what is horrible.
Any rule like “Don’t do any big drastic acts until you’re friendly” requires an understanding of what we would consider important vs. unimportant.
You’re right, it would imply that the programmers were quite close to having created a FAI.
Not to mention the meaning of “friendly”. Could an unFriendly AI know what was meant by Friendly? Wouldn’t being able to understand what was meant by Friendly require an AI to be Friendly?
EDITED TO ADD: I goofed in framing the problem. I was thinking about the process of being Friendly, which is what I interpreted the original post to be talking about. What I wrote is obviously wrong: an unFriendly AI could know and understand the intended results of Friendliness.
Yes.
No.
The answer to that depends on what you mean by Friendly :-)
Presumably the foolish AI-creators in this story don’t have a working FAI theory. So they can’t mean the AI to be Friendly because they don’t know what that is, precisely.
But they can certainly want the AI to be Friendly in the same sense that we want all future AIs to be Friendly, even though we have no FAI theory yet, nor even a proof that a FAI is strictly possible. They can want the AI not to do things that they, the creators, would forbid if they fully understood what the AI was doing. And the AI can want the same thing, in their names.
I wonder how things would work out if you programmed an AI to be ‘Friendly, as Eliezer Yudkowsky would want you to be’. If an AI can derive most of our physics from seeing one frame with a bent blade of grass, then it could quite probably glean a lot from scanning Eliezer’s work. 10,000 words are worth a picture, after all!
Unfortunately, the hard part is getting to that stage through recursive self improvement without messing up the utility function along the way; that is the failure that would doom us.