I think the basic idea of instrumental convergence is just really blindingly obvious, and I think it is very annoying that there are people who will cluck their tongues and stroke their beards and say “Hmm, instrumental convergence you say? I won’t believe it unless it is in a very prestigious journal with academic affiliations at the top and Computer Modern font and an impressive-looking methods section.”
I am happy that your papers exist to throw at such people.
Anyway, if optimal policies tend to seek power, then I desire to believe that optimal policies tend to seek power :) :) And if optimal policies aren’t too relevant to the alignment problem, well, neither are 99.99999% of papers, but it would be pretty silly to retract all of those :)