- No human labor: just compute the function.
- Fast experiment loop: computers are faster than humans.
- Reproducible: share the code for your function with others.
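To make those three points concrete, here is a minimal sketch of an experiment loop built around a purely programmatic objective; the objective, candidates, and search method are all invented for illustration:

```python
import random

def objective(candidate):
    """A purely programmatic objective: no human in the loop, so every
    evaluation is cheap and anyone with this code can reproduce a result."""
    # Hypothetical target; the "task" is just to get close to it.
    target = [0.3, -1.2, 0.8]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def random_search(n_trials=10_000, seed=0):
    """Fast experiment loop: thousands of evaluations with no human labor."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = [rng.uniform(-2.0, 2.0) for _ in range(3)]
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    print(random_search())
```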
I’m talking about interactive training.
I think for a sufficiently advanced AI system, assuming it’s well put together, active learning can beat this sort of interactive training: the AI will be better than humans at identifying & fixing potential weaknesses in its own models.
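For concreteness, here is a toy sketch of the kind of active learning I have in mind (uncertainty sampling): the model itself picks the unlabeled points it is least sure about, rather than a human deciding what to probe. The pool, model, and uncertainty measure below are all placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of unlabeled 2-D points and a tiny logistic model
# (pretend the weights were fit on a small labeled set).
pool = rng.normal(size=(200, 2))
w, b = np.array([1.0, -0.5]), 0.1

def predict_proba(x):
    """Probability of class 1 under the toy logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def query_most_uncertain(pool, k=5):
    """Uncertainty sampling: ask for labels where p is closest to 0.5,
    i.e. where the model expects to learn the most about its own weaknesses."""
    p = predict_proba(pool)
    uncertainty = -np.abs(p - 0.5)      # higher = less certain
    return pool[np.argsort(uncertainty)[-k:]]

print(query_most_uncertain(pool))
```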
Adversarial examples suggest we should be worried that apparently similar concepts will actually be wildly different in non-obvious ways.
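To make that worry concrete, here is a toy fast-gradient-sign-style perturbation against a linear model; the weights, input, and budget are made up, and real attacks target deep nets, but the mechanism is the same:

```python
import numpy as np

# Toy linear "classifier": score > 0 means class 1. Weights are made up.
w = np.array([2.0, -3.0, 1.0, 0.5])

def score(x):
    return float(x @ w)

x = np.array([0.1, -0.05, 0.2, 0.0])   # original input, scored as class 1
eps = 0.1                               # small per-coordinate perturbation budget

# FGSM-style step: nudge each coordinate against the sign of the gradient of
# the score (for a linear model the gradient is just w).
x_adv = x - eps * np.sign(w)

# Inputs differ by at most 0.1 per coordinate, yet the classification flips.
print(score(x), score(x_adv))
```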
I think the problem with adversarial examples is that deep neural nets don’t have the right inductive biases. I expect meta-learning approaches which identify & acquire new inductive biases (in order to determine “how to think” about a particular domain) will solve this problem and will also be necessary for AGI anyway.
BTW, different human brains appear to learn different representations (previous discussion), and yet we are capable of delegating tasks to each other.
I’m cautiously optimistic, since this could make things a lot easier.
Huh?
My problem with that argument is that it seems like we will have so many chances to fuck up that we would need either 1) AI systems to be extremely reliable, or 2) catastrophic mistakes to be rare and minor mistakes to be transient or detectable. (2) seems plausible to me in many applications, but probably not all of the applications where people will want to use SOTA AI.
Maybe. But my intuition is that if you can create a superintelligent system, you can make one which is “superhumanly reliable” even in domains which are novel to it. I think the core problems for reliable AI are very similar to the core problems for AI in general. An example is the fact that solving adversarial examples and improving classification accuracy seem intimately related.
I think the significant features of RL here are: 1) having the goal of understanding the world and how to influence it, and 2) doing (possibly implicit) planning.
In what sense does RL try to understand the world? It seems very much not focused on that. You essentially have to hand it a reasonably accurate simulation of the world (i.e. a world that is already fully understood, in the sense that we have a great model for it) for it to do anything interesting.
If the planning is only “implicit”, RL sounds like overkill and probably not a great fit. RL seems relatively good at long sequences of actions for a stateful system we have a great model of. If most of the value can be obtained by planning 1 step in advance, RL seems like a solution to a problem you don’t have. It is likely to make your system less safe, since planning many steps in advance could let it plot some kind of treacherous turn. But I also don’t think you will gain much through using it. So luckily, I don’t think there is a big capabilities vs safety tradeoff here.
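Here is a toy sketch of the one-step vs. multi-step point, on an invented, fully-known environment: the greedy chooser grabs the nearby small reward, while a multi-step planner heads for the distant large one:

```python
import itertools

# Toy deterministic environment: states 0..4, actions move left/right,
# reward depends only on the state you land in.
REWARD = {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 10.0}

def step(state, action):               # action in {-1, +1}
    return max(0, min(4, state + action))

def greedy_one_step(state):
    """Plan one step ahead: pick the action with the best immediate reward."""
    return max((-1, +1), key=lambda a: REWARD[step(state, a)])

def plan_n_steps(state, horizon=4):
    """Multi-step planning: search over action sequences, as RL/planning would."""
    best = max(itertools.product((-1, +1), repeat=horizon),
               key=lambda seq: sum(REWARD[s] for s in rollout(state, seq)))
    return best[0]

def rollout(state, seq):
    states = []
    for a in seq:
        state = step(state, a)
        states.append(state)
    return states

# From state 2, one-step planning grabs the small reward at state 1,
# while multi-step planning heads toward the large reward at state 4.
print(greedy_one_step(2), plan_n_steps(2))
```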
I think having general knowledge will be very valuable, and hard to replicate with a network of narrow systems.
Agreed. But general knowledge is also not RL, and is handled much more naturally in other frameworks such as transfer learning, IMO.
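As a sketch of what I mean by transfer learning here: reuse a representation learned elsewhere and fit only a small new head on the target task. The "pretrained" features below are just a stand-in (a fixed random projection), not a real pretrained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained feature extractor: a fixed random projection
# playing the role of representations learned on a large source task.
W_pretrained = rng.normal(size=(32, 8))

def features(x):
    """Frozen 'backbone': reused as-is, never retrained on the target task."""
    return np.tanh(x @ W_pretrained)

# Tiny target task: 100 labeled examples, far too few to learn from scratch.
X = rng.normal(size=(100, 32))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)

# Transfer step: fit only a linear head on the frozen features (least squares).
Phi = features(X)
head, *_ = np.linalg.lstsq(Phi, y, rcond=None)

preds = (Phi @ head > 0.5).astype(float)
print("train accuracy:", (preds == y).mean())
```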
So basically I think daemons/inner optimizers/whatever you want to call them are going to be the main safety problem.