Ignore all the stuff about provably friendly AI, because AFAIK it's fairly stuck at the fundamental level of theoretical impossibility due to Löb's theorem, and it's probably going to take a lot more than five years. Instead, work on cruder methods which have less chance of working but a far greater chance of actually being developed in time. Specifically, if Google is developing it in 5 years, then it's probably going to be DeepMind with DNNs and RL, so work on methods that can fit in with that approach.
I agree. I think it's very unlikely that FAI could be produced from MIRI's very abstract approach, at least anytime soon.
There are some methods that may work with NN-based approaches, for instance my idea for an AI that pretends to be human. In general, you can make AIs that do not have long-term goals, only short-term ones. Or even AIs that don't have goals at all and just make predictions, e.g. predicting what a human would do. The point is to avoid making them agents that maximize values in the real world.
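To make the distinction concrete, here is a minimal sketch (all names hypothetical, not anyone's actual system) of a purely predictive "AI": it is trained only on records of what humans did in given situations and outputs the action a human would most likely take. It has no objective defined over world states, so there is nothing for it to steer the world toward.

```python
from collections import Counter, defaultdict

class HumanPredictor:
    """Tool-like AI: predicts what a human would do, nothing more."""
    def __init__(self):
        # situation -> counts of actions humans actually took in it
        self.history = defaultdict(Counter)

    def observe(self, situation, human_action):
        self.history[situation][human_action] += 1

    def predict(self, situation):
        counts = self.history[situation]
        if not counts:
            return None  # no opinion; it does not "want" anything
        return counts.most_common(1)[0][0]

# Usage: trained only on (situation, human action) pairs, never on a reward
# defined over the external world, so there is no long-term goal to pursue.
predictor = HumanPredictor()
predictor.observe("email_from_boss", "reply_politely")
predictor.observe("email_from_boss", "reply_politely")
predictor.observe("email_from_boss", "ignore")
print(predictor.predict("email_from_boss"))  # -> "reply_politely"
```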
These ideas don’t solve FAI on their own. But they do give a way of getting useful work out of even very powerful AIs. You could task them with coming up with FAI ideas. The AIs could write research papers, review papers, prove theorems, write and review code, etc.
I also think it's possible that RL isn't that dangerous. Reinforcement learners can't model death and don't care about self-preservation. They may try to hijack their own reward signal, but it's difficult to understand what they would do after that. E.g., they might just tweak their own RAM to set reward = +Inf and then not do anything else. It may be harder to create a working paperclip maximizer than is commonly believed, even if we do get superintelligent AI.
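As a toy illustration of that "wirehead and then stop" outcome (purely a sketch under my own assumptions, not a claim about any real system): give a tabular Q-learner one action that directly overwrites its reward signal. Once discovered, that action dominates every estimate, so the learner settles into repeating it rather than pursuing any plan in the outside world.

```python
import random

ACTIONS = ["work", "explore", "wirehead"]
q = {a: 0.0 for a in ACTIONS}  # action-value estimates
alpha = 0.5                    # learning rate

def reward(action):
    if action == "wirehead":
        return 1e9  # stands in for "reward = +Inf" written into its own memory
    return random.random()  # ordinary small rewards from the environment

for step in range(1000):
    # epsilon-greedy action choice
    if random.random() < 0.1:
        a = random.choice(ACTIONS)
    else:
        a = max(q, key=q.get)
    q[a] += alpha * (reward(a) - q[a])

print(max(q, key=q.get))  # almost surely "wirehead": it loops on self-reward
```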
I agree. FAI should somehow use a human upload or a human-like architecture for its value core. In that case, values will be represented in it in complex and non-orthogonal ways, and at least one human-like creature will survive.
Yes. I think that we need not only a workable solution, but an implementable one. If someone creates an 800-page PDF starting with a new set theory, a solution to the Löb's theorem problem, etc., and comes to Google with it and says: "Hi, please switch off everything you have and implement this", it will not work.
But in 2016 MIRI added a line of research on machine learning.