I like that these ideas can be turned into new learning paradigms relatively easily.
I think there’s obviously something like your proposal going on, but I feel like it’s the wrong place to start.
It’s important that the system realize it has to model human communication as an attempt to convey something, which is what you’re doing here. That’s something utterly missing from my model as written.
However, I feel like starting from this point forces us to hard-code a particular model of communication, which means the system can never get beyond this. As you said:
First, obviously, humans are not perfectly rational and logically omniscient, so we have to replace “X maximizes P[Y|M]” with “<rough model of human> thinks X will produce high P[Y|M]”. The better the human-model, the broader the basin of attraction for the whole thing to work.
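(For concreteness, here is a toy sketch of the kind of replacement that quote describes: swapping an exact maximizer of P[Y|M] for a noisy “rough human” that only tends to pick messages it expects to work. The Boltzmann/softmax noise model and all names below are my own illustration, not anything from either proposal.)

```python
import numpy as np

# Purely illustrative: the softmax noise model below stands in for
# "<rough model of human>"; it is not from either proposal.

def exact_speaker(p_y_given_m_x):
    """Idealized human: picks the message X that exactly maximizes P[Y | M, X]."""
    return int(np.argmax(p_y_given_m_x))

def noisy_speaker(p_y_given_m_x, beta=3.0, rng=None):
    """Rough human model: merely tends to pick an X it thinks gives high P[Y | M, X].

    Lower beta means a noisier, less idealized human; as beta grows large,
    this recovers the exact maximizer above.
    """
    rng = rng or np.random.default_rng()
    logits = beta * np.asarray(p_y_given_m_x, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Example: three candidate messages, each with the listener's chance of
# recovering the intended Y under interpretation model M.
p = [0.2, 0.7, 0.65]
print(exact_speaker(p))            # always 1
print(noisy_speaker(p, beta=3.0))  # usually 1, sometimes 2: the imperfection
                                   # the AI's human-model has to absorb
```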
I would rather attack the problem of specifying what it could mean for a system to learn at all the meta levels in the first place, and then teach such a system about this kind of communication model as part of its broader education about how to avoid things like wireheading, human manipulation, treacherous turns, and so on.
Granted, you could overcome the hard-coded nature of the communication model if your “treat the source code as a communication, too” idea ended up allowing a reinterpretation of the basic communication model. That just seems very difficult.
All this being said, I’m glad to hear you were working on something similar. Your idea obviously starts to get at the “interpretable feedback” problem, which I basically failed to make progress on in my proposal.
Yeah, I largely agree with this critique. The strategy relies heavily on the AI being able to move beyond the initial communication model, and we have essentially no theory to back that up.
Still interested in your write-up, though!
It’s up.