Any algorithm that gets stuck in a local optimum so easily will not be very intelligent or very useful. Humans have, at least to some extent, the ability to notice that there should be a good plan in this region, and then to find and execute that plan successfully. We don’t get stuck in local optima as much as current RL algorithms do.
AIXI would be very good at making complex plans and doing well on the first try. You could tell it the rules of chess and it would play perfect chess in its first game. It does not need lots of examples to work from. Give it whatever data you happen to have available, and it will become very competent, able to carry out complex novel tasks on the first attempt.
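For reference (my gloss, not the parent’s): AIXI is Hutter’s incomputable ideal agent. It chooses actions by expectimax over every computable environment, weighted by a Solomonoff prior. A standard statement of the action rule, with U a universal Turing machine, q ranging over programs, and ℓ(q) the length of q, is:

```latex
a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \left( r_t + \cdots + r_m \right)
      \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum over all programs is what makes AIXI incomputable, and it is also why AIXI needs no training examples: every hypothesis, including the rules of chess once they have been stated to it, is already contained in the prior.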
Current reinforcement learning algorithms aren’t very good at breaking out of boxes, because they follow the local incentive gradient. (I say not very good at it, because a few algorithms have exploited glitches in a way that’s a bit “break out of the box”-ish.) In some simple domains, it’s possible to follow the incentive gradient all the way to the optimum. In other environments, human actions already form a good starting point, and following the incentive gradient from there can make the solution a bit better.
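As a toy illustration of the local-incentive-gradient point, here is a minimal sketch (the reward landscape and every name in it are hypothetical) in which pure gradient-following from a bad starting point settles on a small local peak and never finds the much better one:

```python
import numpy as np

def reward(x):
    # Hypothetical reward landscape: a small peak near x = -1 and a
    # much better, narrow peak near x = 2.
    return np.exp(-(x + 1) ** 2) + 3 * np.exp(-((x - 2) ** 2) / 0.1)

def follow_gradient(x, lr=0.05, steps=2000, eps=1e-4):
    # Follow the local incentive gradient (estimated numerically),
    # as a pure hill-climber would.
    for _ in range(steps):
        grad = (reward(x + eps) - reward(x - eps)) / (2 * eps)
        x += lr * grad
    return x

x = follow_gradient(-1.5)
print(x, reward(x))  # settles at the small peak near -1; the better
                     # peak near 2 is never found from this start
```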
I agree that most of the really dangerous box-breaking behaviour probably can’t be reached by local gradient descent from a non-adversarial starting point. (I do not want to have to rely on this.)
I agree that you can attach loads of sensors to, say, postmen, and train a big neural net to control a humanoid robot to deliver letters, given millions of training examples. You can probably also automate much of the training and weight fiddling currently done by “grad student descent” to make big neural nets work.
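As a minimal sketch of what automating that fiddling might look like, plain random search already captures the idea (train_and_evaluate and its scoring formula are hypothetical stand-ins for the real, hours-long training job):

```python
import random

def train_and_evaluate(config):
    # Hypothetical stand-in: the real version would train the
    # robot-control net with this config and return a validation
    # score. The formula below is made up so the sketch runs.
    return -abs(config["learning_rate"] - 1e-3) - 0.01 * config["depth"]

def random_search(n_trials=50):
    # Automate the fiddling a grad student would do by hand: sample
    # configurations at random and keep whichever scores best.
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = {
            "learning_rate": 10 ** random.uniform(-5, -2),
            "batch_size": random.choice([32, 64, 128, 256]),
            "depth": random.randint(2, 8),
        }
        score = train_and_evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config

print(random_search())
```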
I agree that this could be quite useful economically, as a significant proportion of economic activity could be automated.
What I am saying is that this form of AI is sufficiently limited that there are still large incentives to build AGI, and CAIS can’t protect us from building an unfriendly AGI.
I’m also not sure how strong the self-improvement can be when the service-making service is only making little tweaks to existing algorithms rather than designing strange new algorithms. I suspect you would end up at a local optimum: a reinforcement learning algorithm producing very slight variations of itself. This might be quite powerful, but nowhere near the limit of a self-improving AGI.
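To make that worry concrete, here is a minimal sketch of a service-making service restricted to little tweaks (benchmark and the parameter names are hypothetical). It is hill-climbing in algorithm space, so it inherits exactly the local-optimum failure mode described above:

```python
import random

def benchmark(params):
    # Hypothetical benchmark score for an RL algorithm with these
    # settings; made up so the sketch runs. Its single peak stands
    # in for "the best slight variation of the current algorithm".
    return -sum((v - 1.0) ** 2 for v in params.values())

def self_improve(params, generations=200, step=0.05):
    # Perturb the current algorithm slightly and keep the tweak if
    # it scores better. Nothing here can propose a strange new
    # design; the loop stops at a local optimum in algorithm space.
    score = benchmark(params)
    for _ in range(generations):
        candidate = {k: v * (1 + random.uniform(-step, step))
                     for k, v in params.items()}
        if benchmark(candidate) > score:
            params, score = candidate, benchmark(candidate)
    return params

print(self_improve({"learning_rate": 0.5, "discount": 0.8}))
```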
> AIXI would be very good at making complex plans and doing well on the first try.
Agreed. I claim we have no clue how to make anything remotely like AIXI in the real world.
> Humans have, at least to some extent, the ability to notice that there should be a good plan in this region, and then to find and execute that plan successfully.
Agreed. In a CAIS world, the system of interacting services would probably notice the plan but not execute it, because of some service meant to prevent it from doing crazy things that humans would not want.
> What I am saying is that this form of AI is sufficiently limited that there are still large incentives to build AGI, and CAIS can’t protect us from building an unfriendly AGI.
This definitely seems like the crux for many people. I’m quite unsure about this point; it seems plausible to me that CAIS could in fact do most things, so that there aren’t very large incentives, especially if the Factored Cognition hypothesis is true.
> I’m also not sure how strong the self-improvement can be when the service-making service is only making little tweaks to existing algorithms rather than designing strange new algorithms.
I don’t see why it would have to be little tweaks to existing algorithms; it seems plausible to have the R&D services consider entirely new algorithms as well.