Giving an artificial intelligence good values may be a particularly important challenge, and one where we need to be correct the first time.
This view is probably completely mistaken, for two separate reasons:

1. We can test AI architectures at different levels of scaling. A human brain is just a scaled-up primate brain, which suggests that all the important features of how value acquisition works (empathy, altruism, value alignment, whatever) can be tested first in AGI that is near human level.

2. We have already encountered numerous large-scale 'one-shot' engineering challenges, and there is already an extremely effective general solution: if you have a problem that you have to get right on the first try, you turn it into an iterative problem by creating a simulation framework (see the sketch below). Doing that for AGI may involve creating the Matrix, more or less, but that isn't necessarily any more complex than creating AGI in the first place.
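To make the simulation idea concrete, here is a minimal sketch of the "turn one-shot into iterative" pattern in Python. Everything in it (the candidate system, the scenario generator, the acceptance check) is a hypothetical placeholder standing in for a vastly richer simulated world, not a proposal for an actual AGI testbed:

```python
import random

def passes_simulation(candidate, seed):
    """One simulated trial: does the candidate behave acceptably in a
    randomly generated scenario? The scenario and the acceptance check
    are placeholders for a far richer simulated environment."""
    scenario = random.Random(seed).random()
    return candidate(scenario)

def develop_iteratively(candidate, revise, trials=10_000):
    """The one-shot problem becomes iterative: keep revising the
    candidate against simulated failures, and only 'go live' (return)
    once it passes every simulated trial."""
    while True:
        failures = [s for s in range(trials)
                    if not passes_simulation(candidate, s)]
        if not failures:
            return candidate  # the first real-world attempt happens only now
        candidate = revise(candidate, failures)
```

The point is just the loop structure: the real world only ever sees the final artifact, while all of the iteration happens against the simulator.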
To me these look like (pretty good) strategies for getting something right the first time, not in opposition to the idea that this would be needed.
They do suggest that an environment which is richer than just “submit perfect code without testing” might be a better training ground.
To clarify, I was not critiquing the idea that we need to get "superintelligence unleashed on the world" correct on the first try—that, of course, I do agree with. I was critiquing the more specific claim that we need to get AGI morality/safety correct on the first try.
One could compare this to ICBM missile defense systems. The US (and other nations) have developed that tech, and it's a case where you have to get the deployed product "right the first try". You can't test it in the real world, but you absolutely can do iterative development in simulation, and that really is the only sensible way to develop such tech. Formal verification is about as useful for AGI safety as it is for testing ICBM defense—not much use at all.
I’m not sure how much we are disagreeing here. I’m not proposing anything like formal verification. I think development in simulation is likely to be an important tool in getting it right the first time you go “live”, but I also think there may be other useful general techniques/tools, and that it could be worth investigating them well in advance of need.
Agreed. In particular I think IRL (Inverse Reinforcement Learning) is likely to turn out to be very important. Also, it is likely that the brain has some clever mechanisms for things like value acquisition or IRL, as well as empathy/altruism, and figuring out those mechanisms could be useful.
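For concreteness, here is a minimal IRL sketch in Python, using a toy chain MDP and a simple feature-matching update (in the spirit of apprenticeship learning). The environment, constants, and update rule are all illustrative choices, not any particular published algorithm, and the recovered reward is only pinned down up to whatever induces the expert's behavior:

```python
import numpy as np

# Toy deterministic chain MDP: states 0..4, actions 0 = left, 1 = right.
# The true reward (hidden from the learner) is +1 in the rightmost state.
N_STATES, GAMMA, HORIZON = 5, 0.9, 10
TRUE_REWARD = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

def step(s, a):
    return min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)

def greedy_policy(reward, iters=100):
    """Value iteration; returns the greedy action for each state."""
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = np.array([[reward[step(s, a)] + GAMMA * V[step(s, a)]
                       for a in (0, 1)] for s in range(N_STATES)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def visitation(policy, start=0):
    """Discounted state-visitation counts from rolling the policy forward."""
    counts, s = np.zeros(N_STATES), start
    for t in range(HORIZON):
        counts[s] += GAMMA ** t
        s = step(s, policy[s])
    return counts

# "Expert" statistics come from the optimal policy under the true reward.
expert = visitation(greedy_policy(TRUE_REWARD))

# IRL loop: with one-hot state features, the feature-matching gradient is
# just (expert visitation - learner visitation). Nudge the reward estimate
# in that direction until the learner's behavior matches the expert's.
reward_hat = np.zeros(N_STATES)
for _ in range(50):
    learner = visitation(greedy_policy(reward_hat))
    reward_hat += 0.1 * (expert - learner)

print("recovered reward weights:", np.round(reward_hat, 2))
print("recovered policy:", greedy_policy(reward_hat))  # all 1s: go right, like the expert
```

Scaling anything like this to human values is of course the hard part; the sketch only shows the shape of the problem (infer the reward from observed behavior rather than specify it by hand).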