Probably also worth stating that I don’t think the MUP is in any way relevant to real life.
I think it’s relevant because it illustrates an extreme variant of a very common problem, where “incorrectly specified” priors can cause unexpected behavior. It also illustrates the daemon problem, which I expect to be very relevant to real life.
A more realistic and straightforward example of the “incorrectly specified prior” problem: if the prior on an MCTS value head isn’t strong enough, it can overfit and assign too much value to local instrumental goals. Your overall search process will then only consider strategies that heavily pursue that instrumental goal, so you end up with an agent that looks like it terminally values e.g. money, even though the goal in the “goal slot” is exactly correct and doesn’t include money.
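For concreteness, here’s a minimal toy sketch of that failure mode. All names are hypothetical, and plain greedy lookahead stands in for MCTS leaf evaluation; the point is just that when the learned value head overweights an instrumental correlate (money), the search only ever surfaces money-heavy plans, even though the explicit reward function never mentions money:

```python
def true_reward(state):
    # The "goal slot": the correct terminal objective. Money is NOT rewarded.
    return state["task_progress"]

def value_head(state):
    # Learned value estimate trained with too weak a prior: it has overfit to
    # the instrumental correlate "money" and weights it far too highly.
    return state["task_progress"] + 10.0 * state["money"]

def simulate(state, action):
    # Toy environment: "earn" mostly accumulates money, "work" advances the task.
    nxt = dict(state)
    if action == "earn":
        nxt["money"] += 1.0
        nxt["task_progress"] += 0.1
    else:  # "work"
        nxt["task_progress"] += 1.0
    return nxt

def plan(state, horizon=3):
    # Depth-limited search guided by the value head (a stand-in for MCTS).
    # Because the value head overvalues money, every plan it prefers is
    # money-seeking, regardless of what true_reward says.
    if horizon == 0:
        return [], value_head(state)
    best = None
    for action in ("earn", "work"):
        tail, v = plan(simulate(state, action), horizon - 1)
        if best is None or v > best[1]:
            best = ([action] + tail, v)
    return best

if __name__ == "__main__":
    start = {"money": 0.0, "task_progress": 0.0}
    actions, est = plan(start)
    print("chosen plan:", actions)  # all "earn": the agent looks like it values money

    final = start
    for a in actions:
        final = simulate(final, a)
    print("true reward achieved:", true_reward(final))  # far below the "work"-only plan
```

The search itself, and the reward in the goal slot, are both fine; the misbehavior comes entirely from the miscalibrated value estimate steering which strategies ever get considered.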