ETA: I am still more concerned about "not enough samples to learn human preferences" than about ELK or inner-optimization-type failures. This seems to be a fairly unpopular view, and I haven't scrutinized it too closely (but I would be interested in discussing it cooperatively).
This is a crux for me; it's why I don't think slow takeoff is good by default. I think deceptive alignment is the default outcome barring interpretability efforts strong enough to actually detect mesa-optimizers or verify myopia. Yes, foom is probably not going to happen, but in my view that doesn't change much about the total risk.
TBC, "more concerned" doesn't mean I'm not concerned about the other ones… and I just noticed that I make this mistake all the time when reading people say they are more concerned about present-day issues than x-risk… hmmm…