What is your definition of "aligned" for an LLM with no attached memory, then?
Wouldn't it have to be something like
"The LLM outputs text that is compliant with its creator's ethical standards and intentions"?
I think it would need to be closer to "interacting with the LLM cannot result in exceptionally bad outcomes in expectation", rather than focusing on the compliance of the text output.
I think a mental model of alignment fairly common here requires context awareness, and by that definition an LLM with no attached memory couldn't be aligned.