I think I mostly agree with this for current model organisms, but it seems plausible to me that well-chosen studies conducted on future systems that are smarter in an agenty way, but not superintelligent, could yield useful insights that do generalise to superintelligent systems.
Not directly generalise, mind you, but maybe you could get something like "Repeated intervention studies show that the formation of coherent self-protecting values in these AIs works roughly like y with properties b, c, d, e, f. Combined with other things we know, this maybe suggests that the general math for how training signals relate to values is a bit like z, and that suggests that what we thought of as 'values' is a thing with type signature t."
And then maybe type signature t is actually a useful building block for a framework which does generalise to superintelligence.
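(To gesture at what I mean by "type signature" here, and this is purely an illustrative sketch of my own, not a claim about what the real theory would look like, and the symbols WorldModel and Outcome are made up for the example: a mature framework might end up typing "values" as something like

$$\text{values} : \text{WorldModel} \to (\text{Outcome} \times \text{Outcome} \to \{\prec, \sim, \succ\})$$

i.e. a map from the agent's model of the world to a preference relation over outcomes, rather than, say, a scalar reward function over states. The point is just that pinning down the type of the object is itself a substantive output.)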
I am not particularly hopeful here. Even if we do get enough time to study agenty AIs that aren't superintelligent, I have an intuition that this sort of science could turn out to be pretty intractable, for reasons similar to the ones that made psychology so intractable. I do think it might be worth a try, though.