Thanks for pointing that out! My goal is to highlight that there are at least 3 different sequencing factors necessary for deceptive alignment to emerge:
1. Goal directedness coming before an understanding of the base goal
2. Long-term goals coming before or around the same time as an understanding of the base goal
3. Situational awareness coming before or around the same time as an understanding of the base goal
The post you linked discusses the importance of sequencing for #3, but it seems to assume that goal directedness will come first (#1) without discussing where it falls in the sequence. Long-term goals (#2) are described as arising from an inductive bias toward deceptive alignment, and sequencing is not highlighted for that property. Please let me know if I missed anything in your post, and apologies in advance if that’s the case.
Do you agree that all three of these developmental orderings are necessary for deception?