What do you mean by “when deceptive alignment fails to work”? I’m confused.
I think Noosphere89 meant to say “when deceptive alignment doesn’t happen” in that sentence. (They can correct me if I’m wrong.)
Anyway, I think I’m in agreement with Noosphere89 that (1) it’s eminently reasonable to try to figure out whether or not deceptive alignment will happen (in such-and-such AI architecture and training approach), and (2) it’s eminently reasonable to have significantly different levels of overall optimism or pessimism about AI takeover depending on the answer to question (1). I hope this post does not give anyone an impression contrary to that.
Yep, that’s what I was talking about, Seth Herd.