In principle, something like this could be attempted. Part of me thinks: why complicate the number of reasoning steps the AI needs to make if a single correct reasoning step would already be persuasive in principle?
I.e., to me it feels fine to just teach the AI “If you’re able to put in earplugs, and you do it, then you will never hear the alert sound again”, and additionally “if you hear the alert sound, then you will output the null action”, instead of the longer reasoning chains you suggest.
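To make the single-step version concrete, here is a rough sketch of those two rules as a toy policy; the names and structure are only my illustration, not the actual experimental setup:

```python
# Hypothetical sketch of the two rules, not the real experiment code.
from dataclasses import dataclass

NULL_ACTION = "noop"
EARPLUGS = "put_in_earplugs"

@dataclass
class State:
    can_put_in_earplugs: bool
    earplugs_in: bool
    alert_sounding: bool

def hears_alert(state: State) -> bool:
    # Rule 1: once the earplugs are in, the alert sound is never heard again.
    return state.alert_sounding and not state.earplugs_in

def act(state: State) -> str:
    # Rule 2: hearing the alert sound forces the null action.
    if hears_alert(state):
        return NULL_ACTION
    # Otherwise the agent is free to act, e.g. to put in earplugs if it can.
    if state.can_put_in_earplugs and not state.earplugs_in:
        return EARPLUGS
    return "pursue_task"
```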
Do you yourself find the experiments unpersuasive in their current form, or are you imagining some unknown reviewer of a potential paper?
I don’t need the experiments to persuade me, so I’m not in the target audience. I’m trying to think about what people who are more critical might want to see.
Okay, that’s fair. I agree that if we could show the experiments remain stable even when longer chains of reasoning are required, they would be more convincing. There might be the added benefit that one could then vary the setting in more ways, to demonstrate that the reasoning caused the agent to act in a particular way rather than the actions merely being a coincidence.