The issue is that it’s very difficult to reason correctly in the absence of an “Official Experiment”[1]. I think the alignment community is too quick to dismiss potentially useful ideas, and that the reasons for those dismissals are often wrong. E.g., I still don’t think anyone’s given a clear, mechanistic reason for why rewarding an RL agent for making you smile is bound to fail (as opposed to being a terrible idea that probably fails).
More precisely, it’s very difficult to reason correctly even with many “Official Experiments”, and nearly impossible to do so without any such experiments.
It’s a preparadigmatic field. Nobody is going to prove beyond a shadow of a doubt that X fails, for exactly the same reasons that nobody is going to prove beyond a shadow of a doubt that X works. And that just doesn’t matter very much, for decision-making purposes. If something looks unlikely to work, then the EV-maximizing move is to dismiss it and move on. Maybe one or two people work on the thing-which-is-unlikely-to-work in order to decorrelate their bets with everyone else, but mostly people should ignore things which are unlikely to work, especially if there are already one or two people looking closely at it.