I think it might be better to have an explicit bar that a potential solution must clear before it can be considered serious. Consider this quote from Sean Carroll about a potential new physical theory that would replace or subsume an existing one:
> There are many phenomena that fit together: the expansion of the universe, the early universe and nucleosynthesis, the growth of structure in the universe, the fact that there are gravitational time delays as well as gravitational redshifts. There are many phenomena, all of which fit into the usual kind of picture, and if you say, well, the usual kind of picture is not right, I have a new picture, you gotta re-explain all of those phenomena. Okay. So if someone says, "I have a theory about the universe that is new and exciting," just say, "Alright, in your theory, what is the ratio of hydrogen to helium produced in the early universe?" And when they tell you that, ask them what the amount of deuterium and lithium is. Okay? Until you get to that level of detail, which the standard model explains very, very well, it's not really worth taking alternative theories very seriously.
I suspect there could be a similar list of questions that each new alignment approach would have to answer: how it deals with treacherous turns, out-of-distribution self-correction, and so on.
AI safety research has been groping in the dark, and half-baked suggestions for new research directions are valuable. It isn’t as though we’ve made half of a safe AI. We haven’t started, and all we have are ideas.