Question about quantilization: where does the base distribution come from? You and Jessica both mention humans, but if we apply ML to humans, and the ML is really good, wouldn’t it just give a prediction like “With near certainty, the human will output X in this situation”? (If the ML isn’t very good, then any deviation from the above prediction would reflect the properties of the ML algorithm more than properties of the human.)
I don’t have a great answer to this. Intuitively, at a high level, there are a lot of different plans I “could have” taken, and the fact that I didn’t take them is more a result of what I happened to think about than a considered decision that they were bad. So in the limit of really good ML, one thing you could do is put a distribution over “initial states” and then ask for the induced distribution over human actions. For example, if you’re predicting the human’s choice from a list of actions, you could make the prediction depend on different orderings of the choices in the list, and different presentations of the list. If you’re predicting what the human will do in some physical environment, you could check what would be done if the human felt slightly colder or slightly warmer, or if they had just thought of particular random words or sentences, etc. All of these have issues in the worst case (e.g. if you’re deciding whether to wear a jacket, slightly changing the temperature of the room will make the decision worse), but they seem fine in most cases, suggesting that there could be a way of making this work, especially if you can do it differently for different domains.
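To make the “distribution over initial states” idea concrete, here’s a minimal sketch in Python. Everything in it (the toy `predict_human_choice` predictor, the options, the utilities) is made up for illustration, not anyone’s actual proposal: we perturb the presentation (here, just the ordering of the list), record the predicted human choice under each perturbation to get an induced base distribution, and then quantilize by sampling from the top-q chunk of that base distribution ranked by utility.

```python
import random
from collections import Counter

# Toy stand-in for a learned predictor: given a presentation (an ordering of
# the options), it deterministically predicts which option the human picks.
# The prediction is near-certain for any fixed presentation, but varies with
# the presentation, which is what gives us a nontrivial base distribution.
ACCEPTABLE = {"wear_jacket", "bring_umbrella", "stay_home"}

def predict_human_choice(options, rng):
    presented = list(options)
    rng.shuffle(presented)          # perturb the "initial state": list ordering
    for option in presented:
        if option in ACCEPTABLE:    # human picks the first acceptable option seen
            return option
    return presented[0]

def base_distribution(options, n_samples=10_000, seed=0):
    """Induced distribution over actions from a distribution over presentations."""
    rng = random.Random(seed)
    counts = Counter(predict_human_choice(options, rng) for _ in range(n_samples))
    return {action: count / n_samples for action, count in counts.items()}

def quantilize(base, utility, q=0.1, seed=1):
    """Sample from the top-q fraction (by base probability mass) of actions,
    ranked by utility, in proportion to their base probabilities."""
    rng = random.Random(seed)
    ranked = sorted(base, key=utility, reverse=True)
    kept, mass = [], 0.0
    for action in ranked:
        kept.append(action)
        mass += base[action]
        if mass >= q:
            break
    return rng.choices(kept, weights=[base[a] for a in kept], k=1)[0]

options = ["wear_jacket", "bring_umbrella", "stay_home", "go_swimming"]
made_up_utility = {"wear_jacket": 2.0, "bring_umbrella": 1.0, "stay_home": 0.5}

base = base_distribution(options)
action = quantilize(base, utility=lambda a: made_up_utility[a], q=0.25)
print(base)
print(action)
```

The point of the sketch is just that the base distribution comes from the perturbations, not from any stochasticity in the predictor itself; the quantilization step on top of it is standard.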