I think the possibilities you mention are some of the many final alignments that an LLM agent could arrive at if it were allowed to reason and remember its conclusions.
I'll address this more in an upcoming post, but in short, I think it's really hard to predict, and it would be good to get a lot more brainpower focused on working out the dynamics of belief/goal/value evolution.