Same as with the GAN thing. You condition it on producing a correct answer (or whatever the goal is). So if you are building a question-answering AI, you have it model a probability distribution something like P(human types this character | human correctly answers the question). This could be done simply by only feeding it examples of correctly answered questions as its training set. Or you could have it predict what a human might respond if they had n days to think about it.
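The "condition by filtering" idea can be sketched very simply: train a character-level model only on transcripts where the human answered correctly, so its learned P(next character | context) is implicitly conditioned on correctness. This is a toy illustration, not a real pipeline; the transcript format and the `is_correct` field are hypothetical.

```python
# Minimal sketch: conditioning a generative model on "correct answer"
# by filtering the training set down to correctly answered transcripts.
# The data format and the is_correct flag are assumptions for illustration.

def filter_correct(transcripts):
    """Keep only the text of transcripts marked as correctly answered."""
    return [t["text"] for t in transcripts if t["is_correct"]]

transcripts = [
    {"text": "Q: 2+2? A: 4", "is_correct": True},
    {"text": "Q: 2+2? A: 5", "is_correct": False},
]

training_set = filter_correct(transcripts)
# training_set now contains only the correctly answered example,
# so any model fit on it learns P(text | correct answer).
```

Whatever sequence model you then fit on `training_set` (Markov chain, RNN, etc.) samples from the conditional distribution, without any explicit conditioning machinery.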
Though even that may not be necessary. What I had in mind was just having the AI read MIRI papers and produce new ones just like them. Like a superintelligent version of what people do today with Markov chains or RNNs to produce writing in the style of an author.
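For concreteness, the Markov-chain version of "writing in the style of an author" is just: record which characters follow each short context in the corpus, then sample forward one character at a time. A toy sketch (the corpus, order, and seed are purely illustrative):

```python
import random
from collections import defaultdict

def build_markov_model(text, order=2):
    """Map each length-`order` character context to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model

def generate(model, seed, length=40):
    """Sample characters one at a time, each conditioned on the preceding context."""
    out = seed
    order = len(seed)
    for _ in range(length):
        context = out[-order:]
        choices = model.get(context)
        if not choices:
            break  # dead end: context never seen in the corpus
        out += random.choice(choices)
    return out

corpus = "the cat sat on the mat. the cat ate the rat."
model = build_markov_model(corpus, order=3)
print(generate(model, "the", length=30))
```

An RNN replaces the lookup table with a learned conditional distribution, but the sampling loop is conceptually the same.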
Yes, these methods do limit the AI's ability a lot. It can't do anything a human couldn't do, in principle. But it can automate the work of humans and potentially do our jobs much faster. And if human ability isn't enough to build an FAI, well, you could always set it to do intelligence-augmentation research instead.
I see that working. But we still have the problem that if the number of answers is too large, somewhere there is going to be an answer X such that the most likely behaviour for a human who answers X is to write something dangerous. Now, that's OK if the AI has two clearly defined processes: first find the top answer, independently of how it's written up, then write it up as a human would. If those goals are mixed, it will go awry.