I think the Armstrong and Mindermann no-free-lunch (NFL) result is very weak. Obviously, inferring values requires assuming that the planning algorithm is trying to maximize those values in some sense, and they make no such assumption. IMO my AIT definition of intelligence shows a clear path to solving this. That said, I'm not at all sure this is enough to get alignment without full access to the human policy.
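To spell out why I call it weak (sketching their setup from memory, so the notation here is mine): the human policy is decomposed as $\pi = p(R)$ for a "planner" $p$ and a reward function $R$, and the NFL result says that even with a simplicity prior, observing $\pi$ doesn't identify the true pair, because degenerate pairs such as

$$(p_{\mathrm{rational}},\,R),\qquad (p_{\mathrm{anti\text{-}rational}},\,-R),\qquad (p_{\mathrm{indifferent}},\,R_{\mathrm{arbitrary}})$$

all reproduce the same $\pi$ at comparable or lower complexity. But if nothing constrains $p$ to be optimizing $R$ to any degree, then of course $R$ is a free parameter and no amount of data about $\pi$ will pin it down. The interesting question is what happens once you do impose such a rationality-flavored assumption, which is where I think the AIT definition of intelligence comes in.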