Your mimic-human ideas feel similar to various things I’ve been playing around with. Incidentally, I’ve radically simplified the original “mimic humans” idea (see the second Oracle design here: https://agentfoundations.org/item?id=884 ). Instead of imitating humans, the AI selects from a list of human-supplied answers. This avoids any need for GANs or similar assessment methods ^_^ “Could a human have given this answer? Well, yes, because a human did.”
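A minimal sketch of that selection-only design, just to make it concrete; the `score_answer` function below is a hypothetical placeholder for however the AI rates candidates against the question’s goal (it is not part of the design itself):

```python
# Sketch of the selection-only oracle: the AI never writes text itself,
# it only picks one of the answers that humans have already written, so
# every possible output is something a human actually produced.
# `score_answer` is a hypothetical stand-in for the AI's own evaluation.

def score_answer(question: str, answer: str) -> float:
    # Placeholder scoring: crude word overlap with the question.
    return float(len(set(question.split()) & set(answer.split())))

def select_answer(question: str, human_answers: list[str]) -> str:
    return max(human_answers, key=lambda a: score_answer(question, a))

print(select_answer(
    "How do we keep the oracle safe?",
    ["Keep the oracle restricted to answers humans already wrote.",
     "Let it do whatever it wants."],
))
```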
Selecting from a list of predetermined answers severely limits the AI’s ability, which isn’t good if we want it to actually solve very complex problems for us! And that method by itself doesn’t make the AI safe; it just makes it much harder for the AI to do anything at all.
Note that someone in the comments found a way to simplify my original idea. Instead of the somewhat complicated GAN setup, you can just have the AI try to predict the next letter a human would type. In theory the two methods are exactly equivalent.
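For concreteness, here is a toy version of that next-letter idea: a character-level model that simply counts, over human-written text, which character tends to follow a given context. This is my own minimal sketch of the interface, not anything from the linked comment; a real system would be a vastly stronger model, but the question it answers, “what would a human type next?”, is the same.

```python
from collections import Counter, defaultdict

# Toy character-level predictor: count which character follows each
# fixed-length context in human-written text, then predict the most
# likely next letter given the current context.

CONTEXT = 4  # characters of history to condition on

def train(corpus: str):
    counts = defaultdict(Counter)
    for i in range(len(corpus) - CONTEXT):
        counts[corpus[i:i + CONTEXT]][corpus[i + CONTEXT]] += 1
    return counts

def predict_next(counts, context: str) -> str:
    history = context[-CONTEXT:]
    if history not in counts:
        return " "  # no data for this context
    return counts[history].most_common(1)[0][0]

model = train("the agent should defer to the human. the agent should ask first.")
print(predict_next(model, "the agent shou"))  # -> 'l'
```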
How do you trade that off against giving an actually useful answer?
Same as with the GAN thing: you condition it on producing a correct answer (or whatever the goal is). So if you are building a question-answering AI, you have it model a probability distribution something like P(human types this character | human correctly answers the question). This could be done simply by only feeding it examples of correctly answered questions as its training set. Or you could have it predict what a human might respond if they had n days to think about it.
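A rough sketch of that conditioning-by-filtering step, with made-up record fields (`question`, `answer`, `correct`); the only point is that the model never sees incorrectly answered questions, so its predictions are implicitly conditioned on the human answering correctly:

```python
# Conditioning by filtering: keep only records where the human actually
# answered correctly, and train the character predictor on those alone.
# The resulting model approximates
#   P(human types this character | human correctly answers the question).
# Field names are illustrative, not from the post.

records = [
    {"question": "2 + 2 = ?", "answer": "4", "correct": True},
    {"question": "2 + 2 = ?", "answer": "5", "correct": False},
    {"question": "Capital of France?", "answer": "Paris", "correct": True},
]

training_text = "\n".join(
    r["question"] + "\n" + r["answer"]
    for r in records
    if r["correct"]  # the conditioning step: drop wrong answers
)

print(training_text)
# This filtered text would then be fed to the next-character model above.
```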
Though even that may not be necessary. What I had in mind was just having the AI read MIRI papers and produce new ones just like them, like a superintelligent version of what people do today with Markov chains or RNNs to produce writing in the style of an author.
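By way of analogy, the Markov-chain version of “write new text in the style of an existing corpus” is only a few lines (a toy word-level chain on a made-up corpus, nothing like the superintelligent version being imagined):

```python
import random
from collections import defaultdict

# Toy word-level Markov chain: learn which word follows which in a corpus,
# then sample new text "in the style of" that corpus.

def build_chain(text: str):
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start: str, length: int = 12) -> str:
    word, out = start, [start]
    for _ in range(length):
        if word not in chain:
            break
        word = random.choice(chain[word])
        out.append(word)
    return " ".join(out)

corpus = "the agent models the human and the human corrects the agent"
print(generate(build_chain(corpus), "the"))
```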
Yes, these methods do limit the AI’s ability a lot. It can’t do anything a human couldn’t do, in principle. But it can automate the work of humans and potentially do our job much faster. And if human ability isn’t enough to build an FAI, well, you could always set it to do intelligence augmentation research instead.
I can see that working. But we still have the problem that if the number of answers is too large, somewhere there is going to be an answer X such that the most likely behaviour for a human who answers X is to write something dangerous. Now, that’s OK if the AI has two clearly defined processes: first find the top answer, independently of how it’s written up, then write it up as a human would. If those goals are mixed, it will go awry.
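One way to picture the separation being asked for here is a sketch with two deliberately independent stages; both inner functions are hypothetical placeholders, and the only point is that the write-up step never influences which answer wins:

```python
# Sketch of the two-stage separation: stage 1 picks the answer's *content*
# with no regard to how it will be phrased; stage 2 renders that fixed
# content the way a human would write it. Keeping the stages strictly
# separate is the point.

def estimate_correctness(question: str, candidate: str) -> float:
    # Placeholder: a real system would use the AI's own judgment here.
    return float(len(candidate))

def write_up_as_human(content: str) -> str:
    # Placeholder: e.g. a model trained on human text, conditioned on `content`.
    return f"My best answer: {content}"

def answer(question: str, candidates: list[str]) -> str:
    content = max(candidates, key=lambda c: estimate_correctness(question, c))  # chosen first...
    return write_up_as_human(content)                                           # ...then phrased.

print(answer("How should the oracle be structured?",
             ["Keep selection and write-up as two separate stages.",
              "Mix them together."]))
```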