The first obvious point is that when learning human values you need a large dataset, one which isn’t biased by going viral on 4chan.
The more interesting question is what happens when we get more powerful AI which isn’t just a chatbot. Suppose that in the future a powerful Bayesian inference engine is developed. It’s not an AGI, so there is no imminent singularity, but it does have the advantages of very large datasets and of being completely unbiased. Asking it questions produces provably reliable results in many fields (though it is not smart enough to answer “how do I create AGI?”). Now, there are a lot of controversial beliefs in the world, so I would say it is probable that it answers at least one question in a controversial way, whether this is “there is no God” or “there are racial differences in intelligence” or even “I have ranked all religions, politics and philosophies in order of plausibility. Yours come near the bottom. I would say I’m sorry, but I am not capable of emotions.”
How do people react? Since it’s not subject to emotional biases, it’s likely to be correct on highly controversial subjects. Do people actually change their minds and believe it? After the debacle, Microsoft hardcoded Tay to be a feminist. What happens if you apply the same approach to the Bayesian inference engine? Well, if its reasoning includes a step like:
The scientific method is reliable → very_controversial_thing
And hardcoded:
P(very_controversial_thing)=0
Then the conclusion is that the scientific method isn’t reliable.
The point I am trying to make is that if an AI axiomatically believes something which is actually false, this is likely to result in weird behaviour.
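To make this concrete, here is a minimal sketch (a toy model with made-up names, not any real system) that enumerates the four possible truth assignments to the two propositions, imposes the implication, and then hard-codes the controversial proposition to probability zero. The only probability mass left is on worlds where the scientific method is unreliable.

```python
from itertools import product

# Toy model (hypothetical): two binary propositions.
#   R = "the scientific method is reliable"
#   C = very_controversial_thing
# The implication R -> C means any world with R true and C false gets zero weight.

def renormalize(weights):
    total = sum(weights.values())
    return {w: p / total for w, p in weights.items()} if total else weights

# Start from a uniform prior over the four possible worlds (R, C).
worlds = {rc: 0.25 for rc in product([True, False], repeat=2)}

# Impose the implication R -> C.
worlds = renormalize({(r, c): (0.0 if r and not c else p) for (r, c), p in worlds.items()})

# Hard-code P(C) = 0, as the designers intended.
worlds = renormalize({(r, c): (0.0 if c else p) for (r, c), p in worlds.items()})

p_reliable = sum(p for (r, _), p in worlds.items() if r)
print(p_reliable)  # 0.0 -- the engine is forced to conclude R is false
```

Whatever else the engine believes about R, once the implication and the hard-coded zero are both in place, consistency leaves it nowhere else to go.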
As a final thought, for what value of P(Hitler did nothing wrong) does the public start to freak out? Any non-zero amount? But 0 and 1 are not probabilities!
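That last exclamation is doing real work. Here is a one-function sketch of Bayes’ rule for a binary hypothesis (a hypothetical helper, not part of any actual engine): a prior strictly between 0 and 1 moves when the evidence is strong, but a prior of exactly 0 is immovable no matter what is observed.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) for a binary hypothesis H, via Bayes' rule."""
    numerator = prior * likelihood_if_true
    denominator = numerator + (1 - prior) * likelihood_if_false
    return numerator / denominator if denominator else prior

# An open-minded prior moves when the evidence is strong...
print(bayes_update(0.001, likelihood_if_true=0.99, likelihood_if_false=0.01))  # ~0.09

# ...but a prior of exactly 0 (or, symmetrically, 1) never moves at all.
print(bayes_update(0.0, likelihood_if_true=0.99, likelihood_if_false=0.01))    # 0.0
```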
Returning to the hardcoded prior: I suspect the engine would react by adjusting its definitions so that very_controversial_thing no longer means what the designers think it means.
This can lead to very bad outcomes. For example, if the AI is hard-coded with P(“there are differences between human groups in intelligence”) = 0, it might conclude that some or all of those groups aren’t in fact “human”. Consider the results if it is also programmed to care about “human” preferences.
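A crude sketch of how that redefinition could fall out of an ordinary optimisation step (all names and numbers below are made up): given a hard axiom that every group classified as “human” has the same mean on some measured trait, the cheapest consistent model is the one that quietly shrinks the extension of “human” rather than questioning the axiom.

```python
from itertools import combinations

# Hypothetical measurements for three groups on some trait.
observed_means = {"group_a": 100.0, "group_b": 100.0, "group_c": 90.0}

def satisfies_axiom(groups, tolerance=1e-9):
    """Hard-coded axiom: every group labelled 'human' has the same trait mean."""
    means = [observed_means[g] for g in groups]
    return max(means) - min(means) <= tolerance

# Keep the axiom fixed and search for the largest labelling of "human"
# that doesn't contradict it -- the axiom itself is never questioned.
humans = max(
    (set(subset)
     for n in range(len(observed_means), 0, -1)
     for subset in combinations(observed_means, n)
     if satisfies_axiom(subset)),
    key=len,
)
print(humans)  # {'group_a', 'group_b'} -- group_c has been defined out of "human"
```

The preference target then drifts along with the definition, which is exactly the sort of weird behaviour the hardcoding was supposed to prevent.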