Edouard Harris comments on Re-Define Intent Alignment?

Edouard Harris 4 Aug 2021 19:00 UTC
LW: 1 AF: 1
I’m not sure what would constitute a clearly-worked counterexample. To me, a high reliance on an agent/world boundary constitutes a “non-naturalistic” assumption, which simply makes me think a framework is more artificial/fragile.
Oh for sure. I wouldn’t recommend having a Cartesian boundary assumption as the fulcrum of your alignment strategy, for example. But what could be interesting would be to look at an isolated dynamical system, draw one boundary, investigate possible objective functions in the context of that boundary; then erase that first boundary, draw a second boundary, investigate that; etc. And then see whether any patterns emerge that might fit an intuitive notion of agency. But the only fundamentally real object here is always going to be the whole system, absolutely.
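As a very rough sketch of what that loop could look like (purely illustrative: the toy linear dynamics, the single candidate objective, and every function name below are assumptions made up for this example, not anything from an existing framework):

```python
from itertools import combinations

# Hypothetical toy dynamics on three variables; the matrix is an arbitrary choice.
A = [
    [0.9, 0.1, 0.0],
    [-0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]


def step(state):
    """Advance the whole (isolated) system by one time step."""
    return tuple(sum(a * x for a, x in zip(row, state)) for row in A)


def trajectory(state, n_steps):
    """Return the list of states visited, including the initial one."""
    states = [state]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


def boundaries(n_vars):
    """Yield every split of variable indices into (agent, environment), both non-empty."""
    indices = range(n_vars)
    for size in range(1, n_vars):
        for agent in combinations(indices, size):
            env = tuple(i for i in indices if i not in agent)
            yield agent, env


def candidate_objective(state, env):
    """One made-up candidate objective: keep the environment variables near zero."""
    return -sum(state[i] ** 2 for i in env)


def improvement_rate(states, env):
    """Fraction of steps on which the candidate objective strictly increases."""
    values = [candidate_objective(s, env) for s in states]
    ups = sum(1 for a, b in zip(values, values[1:]) if b > a)
    return ups / (len(values) - 1)


def survey(initial_state, n_steps=50):
    """Draw each boundary in turn and score the same trajectory under it."""
    states = trajectory(initial_state, n_steps)
    return {
        (agent, env): improvement_rate(states, env)
        for agent, env in boundaries(len(initial_state))
    }


if __name__ == "__main__":
    for (agent, env), rate in survey((1.0, 0.0, 1.0)).items():
        print(f"agent={agent} env={env} rate={rate:.2f}")
```

The score here only looks at the environment side of each split, so it's a placeholder for whatever measure of agency you'd actually want to compare across boundaries. The point is just the shape of the loop: one fixed whole-system trajectory, many boundaries drawn and erased over it.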
As I understand it, something like AIXI forces you to draw one particular boundary because of the way the setting is constructed (infinite on one side, finite on the other). So I’d agree that sort of thing is more fragile.
The multiagent setting is interesting though, because it gets you into the game of carving up your universe into more than two pieces. Again, it would be neat to investigate a setting like this with different choices of boundaries and see whether some choices have more interesting properties than others.