Preferences and biases, the information argument
I’ve recently thought of a possibly simpler way of expressing the argument from the Occam’s razor paper. Namely:
Human biases and human preferences together contain more information than human behaviour does, and more than the full human policy does.
Thus, in order to deduce human biases and preferences, we need more information than the human policy carries.
This extra information is contained in the “normative assumptions”: the assumptions we need to add, so that an AI can learn human preferences from human behaviour.
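One rough way to formalise this (my own gloss in algorithmic-information terms; the paper itself frames the point as a no-free-lunch result): write $\pi$ for the human policy, $R$ for the preferences, and $B$ for the biases, i.e. the bounded and possibly irrational planner that turns preferences into behaviour. Since the policy is generated by the pair, $\pi = B(R)$, we have, up to an additive constant,

$$K(\pi) \le K(B, R) + O(1),$$

and the claim above is that in fact $K(B, R) > K(\pi)$. An observer who only sees behaviour can extract at most $K(\pi)$ bits, so the remaining $K(B, R) - K(\pi)$ bits have to come from somewhere else: that is the information content of the normative assumptions.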
We’d ideally want to do this with as few extra assumptions as possible. If the AI is well-grounded and understands what human concepts mean, we might be able to get away with a simple reference: “look through this collection of psychology research and take it as roughly true” could be enough to point the AI at all the assumptions it would need.
But is that true? Human behavior has a lot of information. We normally say that this extra information is irrelevant to the human’s beliefs and preferences (i.e. the agential model of humans is a simplification), but it’s still there.
See the linked paper for more details (https://arxiv.org/abs/1712.05812).
Basically “humans are always fully rational and always take the action they want to” is a full explanation of all of human behaviour, one that is strictly simpler than any explanation that includes human biases and bounded rationality.
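As a toy illustration of that point, here is a small sketch (my own construction in Python, not code from the paper; the states, actions and planner names are all made up) showing two (planner, reward) decompositions of the same observed policy. The degenerate “fully rational” decomposition, whose reward simply mirrors the policy, reproduces the behaviour exactly; so does the “biased planner plus intuitive reward” decomposition, and nothing in the behavioural data distinguishes them.

```python
# Toy illustration (my construction, not code from the paper): the same
# observed policy admits multiple (planner, reward) decompositions.

STATES = ["hungry", "tired"]
ACTIONS = ["eat", "sleep", "work"]

# The observed human policy: what the person actually does in each state.
observed_policy = {"hungry": "eat", "tired": "work"}

# Decomposition 1: a "fully rational" planner paired with a reward that
# simply rewards doing whatever the policy does.
def rational_planner(reward):
    return {s: max(ACTIONS, key=lambda a: reward(s, a)) for s in STATES}

def policy_mirroring_reward(s, a):
    return 1.0 if observed_policy[s] == a else 0.0

# Decomposition 2: a biased planner (overvalues working when tired) paired
# with a more "intuitive" reward that actually values rest.
def workaholic_planner(reward):
    def distorted(s, a):
        bias = 10.0 if (s == "tired" and a == "work") else 0.0
        return reward(s, a) + bias
    return {s: max(ACTIONS, key=lambda a: distorted(s, a)) for s in STATES}

def intuitive_reward(s, a):
    return {"hungry": {"eat": 1.0}, "tired": {"sleep": 1.0}}[s].get(a, 0.0)

# Both decompositions reproduce the observed behaviour exactly.
assert rational_planner(policy_mirroring_reward) == observed_policy
assert workaholic_planner(intuitive_reward) == observed_policy

# Behaviour alone cannot tell them apart; only extra normative assumptions
# can say that the second pair, not the first, reflects the person's real
# preferences and biases.
print("Both (planner, reward) pairs fit the same policy.")
```

The two assert lines are the point: the behaviour is consistent with both stories, so choosing the second one (the one we actually believe) requires normative assumptions about which biases humans have, not more behavioural data.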
Well, you are an intelligence that “is well-grounded and understands what human concepts mean”. Do you think that the above approach would lead you to distill the right assumptions?
No. But I expect that it would be much more in the right ballpark than other approaches, and I think it might be refined to be correct.
I suppose the question is whether we can predict the “hidden inner mind” through some purely statistical model, as opposed to requiring the AI to have some deeper understanding of human psychology. I’m not sure that a typical psychologist would claim to be able to predict behaviour through their training, whereas we have seen cases where even simple statistical predictive systems can know more about you than you know about yourself [1].
There’s also the idea that social intelligence is the ability to simulate other people, so perhaps that is something that an AI would need to do in order to understand other consciousnesses—running shallow simulations of those other minds.