I think that definitions in terms of utility functions just aren’t very helpful
I agree this seems like a good candidate for our crux. It seems to me that defining “rational agent” in terms of “utility function” is intuitively and theoretically quite appealing, as well as really useful in practice (see the whole field of economics), and I’m pretty puzzled by your persistent belief that maybe we can do much better.
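To spell out what I find appealing, here’s a minimal sketch, a toy of my own with all names and numbers made up (not something from either of our posts), of the standard decision-theoretic picture: a “rational agent” is just an expected-utility maximizer.

```python
# Toy sketch (my own construction; all names hypothetical): a "rational agent"
# as an expected-utility maximizer over actions.

def expected_utility(action, outcomes, prob, utility):
    # E[U | action] = sum over outcomes of P(outcome | action) * U(outcome)
    return sum(prob(o, action) * utility(o) for o in outcomes)

def rational_choice(actions, outcomes, prob, utility):
    # The agent picks whichever action maximizes expected utility.
    return max(actions, key=lambda a: expected_utility(a, outcomes, prob, utility))

# Example: a risky bet vs. a safe payoff.
outcomes = ["win", "lose", "safe"]

def prob(o, a):
    if a == "bet":
        return {"win": 0.4, "lose": 0.6, "safe": 0.0}[o]
    return {"win": 0.0, "lose": 0.0, "safe": 1.0}[o]

utility = {"win": 10.0, "lose": 0.0, "safe": 3.0}.get

print(rational_choice(["bet", "pass"], outcomes, prob, utility))  # "bet": EU of 4.0 beats 3.0
```

The theoretical appeal is that (by the VNM-style coherence results) any agent whose choices can’t be summarized this way is exploitable in some decision problem, and the practical appeal is that this one scoring function compactly predicts behavior across all such problems.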
AFAICT, the main argument for your position is “Coherent behaviour in the real world is an incoherent concept”, but I feel like I gave a strong counter-argument against it, and I’m not sure what your counter-counter-argument is.
The recent paper “Designing agent incentives to avoid reward tampering” also seems relevant here, as it gives a seemingly clear explanation of why, if you started with an RL agent, you might want to move to a decision-theoretic agent (i.e., something that has a utility function) instead. I wonder if that changes your mind at all.
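To gesture at why, here’s a toy contrast of my own making (this isn’t the paper’s formalism, just an illustration of the incentive it describes): an agent that maximizes the observed reward signal is happy to tamper with that signal, while an agent that maximizes a utility function over world states is not.

```python
# Toy contrast (my own construction, not the paper's formalism): scoring
# actions by the observed (hackable) reward signal vs. by a utility function
# defined over actual world states.

actions = ["work", "tamper"]

true_value = {"work": 1.0, "tamper": 0.0}        # utility over world states
observed_reward = {"work": 1.0, "tamper": 10.0}  # what the tampered sensor reports

rl_style = max(actions, key=observed_reward.get)   # -> "tamper"
eu_style = max(actions, key=true_value.get)        # -> "work"

print(rl_style, eu_style)  # tamper work
```

The point is that the second agent’s preference is stable under changes to the reward sensor, which is roughly the property that makes the move from “RL agent” to “agent with a utility function” attractive in the tampering setting.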