I guess it’s weird (counterintuitive and hard to think about) compared to “the imitation is modeling the human trying to write a good program,” which is what I initially thought the situation would be. In that case, the human doesn’t have to think about the imitation and can just think about how to write a good program. The situation with HSIFAUH seems a lot more complicated. Thinking about it more...
In the limit of perfect imitation, “the imitation is modeling the human trying to write a good program” converges to “the human trying to write a good program.” In the limit of perfect imitation, HSIFAUH converges to “a human trying to write a good program while suffering amnesia between time steps (but who can review previous actions and write down notes).” Correct? HSIFAUH could keep memories between time steps, but won’t, because it’s modeling a human who wouldn’t have such memories. (I think I was confused in part because you said that performance wouldn’t be affected. It now seems to me that performance would be affected, because a human who can’t keep memories but can only keep notes can’t program as well as a normal human.)
(Thinking about imperfect imitation seems even harder and I’ll try that more after you confirm the above.)
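To make the “amnesia between time steps, but with notes” picture above concrete, here’s a toy sketch of the per-step loop (all names and the note format are placeholders of mine, not part of the proposal):

```python
def act_without_memory(past_actions, notes):
    """Stand-in for whoever acts this step -- the real human or a perfect imitation.
    Either way, they see only the visible record (past actions plus notes), never a
    private memory carried over from earlier steps."""
    action = f"program_chunk_{len(past_actions)}"                   # placeholder action
    notes = notes + [f"step {len(past_actions)}: wrote {action}"]   # notes substitute for memory
    return action, notes

def run_episode(num_steps=3):
    actions, notes = [], []
    for _ in range(num_steps):
        action, notes = act_without_memory(actions, notes)
        actions.append(action)
    return actions, notes

print(run_episode())
# (['program_chunk_0', 'program_chunk_1', 'program_chunk_2'],
#  ['step 0: wrote program_chunk_0', 'step 1: wrote program_chunk_1', 'step 2: wrote program_chunk_2'])
```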
One thing still confuses me. Whenever the real human does get called in to provide training data, the real human now has that memory. But the (most probable) models don’t know that, so the predictions for the next round are going to be wrong (compared to what the real human would do if called in) because it’s going to be based on the real human not having that memory. (I think this is what I meant when I said “it seems like the human imitations will keep diverging from real humans quickly”.) The Bayesian update wouldn’t cause the models to know that the real human now has that memory: if the real human does something the top models correctly predicted, the update wouldn’t do much. So how does this problem get solved, or am I misunderstanding something here? (Maybe we can just provide an input to the models that indicates whether the real human was called in for the last time step?)
Correct. I’ll just add that a single action can be a large chunk of the program. It doesn’t have to be (god forbid) character by character.
But the (most probable) models don’t know that, so the predictions for the next round are going to be wrong (compared to what the real human would do if called in) because it’s going to be based on the real human not having that memory.
It’ll have some probability distribution over the contents of the humans’ memories. This will depend on which timesteps they actually participated in, so it’ll have a probability distribution over that. I don’t think that’s really a problem though. If humans are taking over one time in a thousand, then it’ll think (more or less) there’s a 1/1000 chance that they’ll remember the last action. (Actually, it can do better by learning that humans take over in confusing situations, but that’s not really relevant here.)
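To put rough numbers on that mixture (the probabilities and the example action are made up, purely for illustration):

```python
p_human_last_step = 1 / 1000   # assumed rate at which the real human takes over

def p_next_action(p_if_remembers, p_if_forgets):
    """Predictor's probability of an action, marginalizing over whether the real
    human actually took (and therefore remembers) the last step."""
    return (p_human_last_step * p_if_remembers
            + (1 - p_human_last_step) * p_if_forgets)

# An action a human who remembers the last step would take with probability 0.9,
# but an amnesiac human only with probability 0.1:
print(p_next_action(p_if_remembers=0.9, p_if_forgets=0.1))   # ≈ 0.1008
# The "doesn't remember" case dominates, matching the 1/1000 reasoning above.
```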
Maybe we can just provide an input to the models that indicates whether the real human was called in for the last time step?
That would work too. With the edit that the model may as well be allowed to depend on the whole history of which actions were human-selected, not just whether the last one was.
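For concreteness, a minimal sketch of that input (the interface and names here are my own assumption, not part of the proposal):

```python
from typing import List, Tuple

def model_input(actions: List[str], human_selected: List[bool]) -> List[Tuple[str, bool]]:
    """Pair every past action with whether the real human selected it, so a model can
    read off exactly which steps the human has first-hand memories of, rather than
    conditioning only on a flag for the last step."""
    return list(zip(actions, human_selected))

print(model_input(["write_spec", "write_tests", "write_code"], [False, True, False]))
# [('write_spec', False), ('write_tests', True), ('write_code', False)]
```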
Actually before we keep going with our discussions, it seems to make sense to double check that your proposal is actually the most promising proposal (for human imitation) to discuss. Can you please take a look at the list of 10 links related to human imitations that I collected (as well as any relevant articles those pages further link to), and perhaps write a post on why your proposal is better than the previous ones, why you made the design choices that you did, and how it addresses or avoids the existing criticisms of human imitations? ETA: I’m also happy to discuss with you your views of past proposals/criticisms here in the comments or through another channel if you prefer to do that before writing up a post.
If humans are taking over one time in a thousand, then it’ll think (more or less) there’s a 1/1000 chance that they’ll remember the last action.
But there’s a model/TM that thinks there’s a 100% chance that the human will remember the last action (because that’s hard coded into the TM), and that model will do really well in the next update. So we know that any time a human steps in, no matter when, it will cause a big update (during the next update), because it’ll raise models like this from obscurity to prominence. If the AI “knows” this, it will call in the human for every time step, but maybe it doesn’t “know” this? (I haven’t thought this through formally and will leave it to you.)
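To see the size of the update being described, here’s a two-model toy calculation (the numbers and model names are mine, and I’m assuming the human’s step-in produces behaviour that only the “remembers the last action” hypothesis expects):

```python
# Prior: the model that hardcodes "the human remembers the last action" starts out obscure.
prior = {"uncertain_model": 0.999, "hardcoded_memory_model": 0.001}

# Likelihood each model assigns to the observed memory-revealing behaviour: the
# uncertain model thought there was only a 1/1000 chance the human remembers.
likelihood = {"uncertain_model": 1 / 1000, "hardcoded_memory_model": 1.0}

evidence = sum(prior[m] * likelihood[m] for m in prior)
posterior = {m: prior[m] * likelihood[m] / evidence for m in prior}

print(posterior)
# {'uncertain_model': ~0.50, 'hardcoded_memory_model': ~0.50}
# A single human step-in lifts the hardcoded model from 0.1% to about half the posterior.
```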
With the edit that the model may as well be allowed to depend on the whole history of which actions were human-selected, not just whether the last one was.
I was assuming the models would save that input on their work tapes for future use.
In any case, I think I understand your proposal well enough now that we can go back to some of the other questions.
Sorry to put this on hold, but I’ll come back to this conversation after the AAAI deadline on September 5.
Commenting here.