I’d like to push back on the assumption that AIs will have explicit utility functions. Even if you think that sufficiently advanced AIs will behave in a utility-maximising way, their utility functions may be encoded in a way that’s difficult to formalise (e.g. somewhere within a neural network).
It may also be the case that coordination is much harder for AIs than for humans. For example, humans are constrained by having bodies, which makes it easier to punish defection—hiding from the government is tricky! Our bodies also make anonymity much harder. Whereas if you’re a piece of code that can copy itself anywhere in the world, reneging on agreements may become relatively easy. Another reason why AI cooperation might be harder is simply that AIs will be capable of a much wider range of goals and cognitive processes than humans, and so they may be less predictable to each other and/or have less common ground with each other.
I’d like to push back on the assumption that AIs will have explicit utility functions.
Yeah I was expecting this, and don’t want to rely too heavily on such an assumption, which is why I used “for example” everywhere. :)
Even if you think that sufficiently advanced AIs will behave in a utility-maximising way, their utility functions may be encoded in a way that’s difficult to formalise (e.g. somewhere within a neural network).
I think they don’t necessarily need to formalize their utility functions, just isolate the parts of their neural networks that encode those functions. And then you could probably take two such neural networks and optimize for a weighted average of the outputs of the two utility functions. (Although for bargaining purposes, to determine the weights, they probably do need to look inside the neural networks somehow and can’t just treat them as black boxes.)
Or are you thinking that the parts encoding the utility function are so intertwined with the rest of the AI that they can’t be separated out, and the difficulty of doing that increases with the intelligence of the AI, so that the AI remains unable to isolate its own utility function as it gets smarter? If so, it’s not clear to me why there wouldn’t be AIs with cleanly separable utility functions that are nearly as intelligent, which would outcompete AIs with non-separable utility functions because they can merge with each other and obtain the benefits of better coordination.
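To make the weighted-average idea above a bit more concrete, here’s a rough sketch (in PyTorch; the networks, dimensions, and weights are all made up for illustration, not a claim about how real AIs would be built) of two frozen “utility” networks being merged by optimizing a shared outcome against a weighted average of their outputs:

```python
# Rough sketch: two frozen "utility" networks are merged by optimizing a
# shared outcome vector against a weighted average of their outputs.
# All architectures, dimensions, and weights here are invented for
# illustration; they aren't meant to describe any particular AI design.
import torch
import torch.nn as nn

STATE_DIM = 16

def make_utility_net(seed: int) -> nn.Module:
    """Stand-in for the part of an AI that encodes its utility function:
    maps a candidate outcome (a vector) to a scalar utility."""
    torch.manual_seed(seed)
    return nn.Sequential(
        nn.Linear(STATE_DIM, 32),
        nn.Tanh(),
        nn.Linear(32, 1),
    )

utility_a = make_utility_net(seed=0)
utility_b = make_utility_net(seed=1)

# Treat both utility functions as fixed: the merged agent doesn't retrain
# them, it just optimizes outcomes against their combined judgment.
for net in (utility_a, utility_b):
    for p in net.parameters():
        p.requires_grad_(False)

# Bargaining weights -- picking these is exactly the step that probably
# requires looking inside the networks rather than treating them as black
# boxes. Here they're chosen arbitrarily.
w_a, w_b = 0.6, 0.4

# The "outcome" the merged agent is choosing; in reality this would be a
# policy or plan, but a single vector keeps the sketch simple.
outcome = torch.zeros(STATE_DIM, requires_grad=True)
optimizer = torch.optim.Adam([outcome], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    merged_utility = w_a * utility_a(outcome) + w_b * utility_b(outcome)
    loss = -merged_utility.sum()  # gradient *ascent* on the weighted sum
    loss.backward()
    optimizer.step()

final = (w_a * utility_a(outcome) + w_b * utility_b(outcome)).item()
print(f"merged utility after optimization: {final:.3f}")
```

The hard part, of course, is the step this sketch skips: settling on the weights, which is where the bargaining (and looking inside the networks rather than treating them as black boxes) comes in.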
It may also be the case that coordination is much harder for AIs than for humans.
My reply to Robin Hanson seems to apply here. Did you see that?