This is a super interesting and important problem, IMO. I believe it already has significant real-world consequences: e.g. powerful people find it difficult to avoid being surrounded by sycophants. Even if they really don’t want to be, that’s just an extra constraint for the sycophants to satisfy (“don’t come across as sycophantic”)! I am inclined to agree that avoiding power differentials is the only way to really avoid these perverse outcomes in practice, and I think this is a good argument in favor of doing so.
--------------------------------------
This is also quite related to some (old, unpublished) work I did with Jonathan Binas on “bounded empowerment”. I’ve invited you to the Overleaf (it needs cleaning up, but I’ve also asked Jonathan about putting it on arXiv).
To summarize: let’s consider this in the case of a superhuman AI, R, and a human, H. The basic idea of that work is that R should try to “empower” H, and that (unlike in previous work on empowerment) there are two ways of doing this:
1) change the state of the world (as in previous works)
2) inform H so they know how to make use of the options available to them to achieve various ends (novel!)
If R has a perfect model of H and the world, then it can, in principle, just compute how best to do these things (it’s wildly intractable, ofc). I think this would still often look “patronizing” in practice, and/or maybe just lead to totally wild behaviors (this sort of thing is hard to predict...), but it might be a useful conceptual “lead”.
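To make the two routes concrete, here is a minimal toy sketch. It is NOT the formalism from the bounded-empowerment write-up; everything in it is a hypothetical illustration. Assumptions: a deterministic corridor world with a door, an `open` action that H does not know about, H's “bounded” empowerment defined as n-step empowerment restricted to the actions H knows, and R choosing among three hand-picked interventions by brute force (standing in for the “perfect model” case). For deterministic dynamics, n-step empowerment reduces to log2 of the number of distinct states reachable in exactly n steps, which is what `bounded_empowerment` computes.

```python
import itertools
import math
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy world: corridor cells 0..CORRIDOR_LEN-1, with a door just
# before cell DOOR_CELL. A state is (position, door_is_open).
State = Tuple[int, bool]
CORRIDOR_LEN = 6
DOOR_CELL = 3


def step(state: State, action: str) -> State:
    """Deterministic dynamics for the toy corridor."""
    pos, door_open = state
    if action == "left":
        return (max(0, pos - 1), door_open)
    if action == "right":
        target = min(CORRIDOR_LEN - 1, pos + 1)
        if target == DOOR_CELL and not door_open:
            return state  # a closed door blocks the move
        return (target, door_open)
    if action == "open":
        # Opening only works when standing directly in front of the door.
        return (pos, door_open or pos == DOOR_CELL - 1)
    raise ValueError(f"unknown action: {action}")


def bounded_empowerment(state: State, known_actions: FrozenSet[str], horizon: int) -> float:
    """Assumed 'bounded' empowerment: n-step empowerment using only known actions.

    With deterministic dynamics this is log2 of the number of distinct states
    reachable in exactly `horizon` steps (enumerated by brute force).
    """
    reachable: Set[State] = set()
    for plan in itertools.product(sorted(known_actions), repeat=horizon):
        s = state
        for a in plan:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))


def apply_intervention(name: str, state: State, known: FrozenSet[str]) -> Tuple[State, FrozenSet[str]]:
    """Hypothetical interventions available to R."""
    if name == "do_nothing":
        return state, known
    if name == "open_door":  # route 1: change the state of the world
        return (state[0], True), known
    if name == "tell_about_open":  # route 2: inform H about an available option
        return state, known | frozenset({"open"})
    raise ValueError(f"unknown intervention: {name}")


def best_intervention(state: State, known: FrozenSet[str], horizon: int) -> Tuple[str, Dict[str, float]]:
    """R scores every intervention with its (perfect) model and picks the best."""
    scores: Dict[str, float] = {}
    for name in ("do_nothing", "open_door", "tell_about_open"):
        s, k = apply_intervention(name, state, known)
        scores[name] = bounded_empowerment(s, k, horizon)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    choice, scores = best_intervention((0, False), frozenset({"left", "right"}), horizon=4)
    for name, bits in scores.items():
        print(f"{name:16s} {bits:.3f} bits")
    print("R picks:", choice)
```

Under these assumptions, with H starting at cell 0 and a horizon of 4, doing nothing gives log2 3 ≈ 1.585 bits, opening the door gives log2 5 ≈ 2.322 bits, and telling H about `open` gives log2 6 ≈ 2.585 bits, so R informs rather than acts. In this toy, that is because informing leaves H the choice of whether the door ends up open; whether that pattern holds more generally is not something this sketch shows.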
Random thought OTMH: one thing that might make it less “patronizing” is if H had well-defined “meta-preferences” about how such interactions should work, which R could aim to respect.