(1) I’ve butted heads with you on timelines before. We’re about a single decade away from AGI, if reasonable and appropriate resources are allocated to such a project. FAI in the sense that MIRI defines the term (provably friendly) may or may not be possible, and just finding that out is likely to take more time than we have left. I’m glad you made this post, because if your estimate for an FAI theory timeline is correct, then MIRI is on entirely the wrong track, and alternatives, or hybrid alternatives involving IA, need to be considered. This is a discussion which needs to happen, and in public. (Aside: this is why I have not donated, and continue to refuse to donate, to MIRI. You’re solving the wrong problem, albeit with the best of intentions, and my money and time are better spent elsewhere.)
(2) Secrecy rarely has the intended outcome, is too easily undone, and is itself a self-destructive battle that would introduce severe risks. Achieving and maintaining operational security is a major entropy-fighting effort which distracts from the project goals, often drives away participants, and amplifies power dynamics among project leaders. That’s a potent and very bad mix.
(3-5) These seem mostly right, or wrong without negative consequences. I don’t think there are any specific reasons in there not to take an IA route.
Take, for example, a UFAI (not actively unfriendly, just MIRI’s definition of not-provably-FAI) tasked with the singular goal of augmenting the intelligence of humanity. This would be much safer than the Scary Idea strawman that MIRI usually paints, as it would in practice be engineering its own demise through an explicit goal of creating runaway intelligence in humans.
If you were to actually implement this, the goal would need some sharpening (a toy sketch of how the pieces might fit together follows these points):
Instead of “humanity” you may need to explicitly specify a group of humans (pre-chosen by the community or a committee for their history of moral action and rational decision making), as well as constraints ensuring that they all advance / are augmented at approximately the same rate.
The AI should be penalized for any route it takes which results in the AI being even temporarily smarter than the humans. Presumably there is a tradeoff and an approximation here, as the AI needs to be at least somewhat superhuman in order to start the augmentation process, and needs to continue to improve itself in order to come up with even better augmentations. But it should favor plans which require smaller AI/human intelligence differentials.
To prevent weird outcomes, the utility of future states should be weighted by an exponential decay. The AI should be focused on getting existing humans augmented in the near term, and not worry itself over what it thinks outcomes would be millennia from now; that’s for the augmented humans to worry about.
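To make those three refinements concrete, here is a minimal toy sketch of how a plan-scoring function might combine them. Everything in it (the half-life, the weights, the idea of scoring a predicted trajectory of time / human-level / AI-level snapshots) is an illustrative assumption of mine, not a worked-out proposal:

```python
# Toy sketch, not a real proposal: score a candidate plan for the
# "augment a designated group of humans" goal under the three
# refinements above. All names and numbers are illustrative assumptions.

DISCOUNT_HALF_LIFE_YEARS = 10.0  # assumed horizon: near-term outcomes dominate
PARITY_WEIGHT = 5.0              # penalty for letting group members diverge
DIFFERENTIAL_WEIGHT = 3.0        # penalty for the AI exceeding the humans

def plan_score(trajectory):
    """trajectory: list of (t_years, human_levels, ai_level) snapshots
    describing the predicted outcome of a candidate plan."""
    total = 0.0
    for t_years, human_levels, ai_level in trajectory:
        # Base utility: how far the designated group has been augmented.
        base = sum(human_levels) / len(human_levels)

        # Parity constraint: group members should advance at about the same rate.
        spread = max(human_levels) - min(human_levels)

        # Intelligence-differential penalty: only the amount by which the AI
        # exceeds the least-augmented human counts against the plan.
        differential = max(0.0, ai_level - min(human_levels))

        # Exponential time discount: outcomes millennia away contribute ~nothing.
        discount = 0.5 ** (t_years / DISCOUNT_HALF_LIFE_YEARS)

        total += discount * (base
                             - PARITY_WEIGHT * spread
                             - DIFFERENTIAL_WEIGHT * differential)
    return total
```

The particular numbers don’t matter; the shape does: near-term augmentation of the designated group dominates the score, and any plan that widens the AI/human gap or lets group members diverge scores worse.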
And I’m sure there are literally hundreds of other potential problems and small protective tweaks required. I would rather that MIRI spent its time and money working on scenarios like this and formulating the various risks and countermeasures, rather than obsessing over Löbian obstacles (a near-complete waste of time).
This is similar to how things are done in computer security. We have a well-understood repertoire of attacks and general countermeasures. Cryptographers then design specific protocols which, through their construction, are not vulnerable to the known attacks, and auditors make sure that implementations are free of side-channel vulnerabilities and the like. How many security systems are provably secure? Very few, and none if you consider that those which have proofs rest on underlying assumptions which are not universally true. Nevertheless the process works, and with each iterative design we move the ball forward towards the end goal of a system that is, in practice, fully secure.
I’m not interested in an airtight mathematical proof of an AGI design which, by your own estimate, would take an order of magnitude longer to develop than an unfriendly AGI. Money and time spent towards that are better directed towards other projects. I’d much rather see effort towards the evaluation of existing designs for self-modifying AGI, such as the GOLEM architecture[1], along with accompanying “safe” goal systems implementing hybrid IA like I outlined above, or an AGI nanny architecture, etc.
EDIT: See Wei Dai’s post[2] for a similar argument.
EDIT2: If you want to down-vote, that’s fine. But please explain why in a reply.
Let’s suppose that nanotechnology capable of recording and manipulating brains at a sub-neuronal level exists, to such a degree that duplicating people is straightforward. Let’s also assume that everyone working on this project has the same goal function, and that they aren’t too intrinsically concerned about modifying themselves. The problem you are setting this AI is: given a full brain state, modify it to be much smarter but otherwise the same “person”. Same person implies the same goal function, the same memories, and the same personality quirks. So it would be strictly easier to tell your AI to make a new “person” that has the same goals, where you don’t care whether it has the same memories. Remove a few restrictions about making it psychologically humanoid, and you are asking it to solve friendly AI; that won’t be easy.
If there were a simple drug that made humans FAR smarter while leaving our goal functions intact, the AI could find it. However, given my understanding of the human mind, making large intelligence increases while mangling the goal function seems strictly easier than making large intelligence increases while preserving the goal function. The latter would also seem to require a technical definition of the human goal function, a major component of friendly AI.
[1] https://www.youtube.com/watch?v=XDf4uT70W-U
[2] http://lesswrong.com/lw/7n5/wanted_backup_plans_for_seed_ai_turns_out_to_be/