In the limit (what might be considered the ‘best imaginable case’), we might imagine researchers discovering an alignment technique that (A) was guaranteed to eliminate x-risk and (B) improved capabilities so clearly that it became competitively necessary for anyone attempting to build AGI.
I feel like throughout this post, you are ignoring that agents, “in the limit”, are (likely) provably taxed by having to be aligned to goals other than their own. An agent with utility function “A” is definitely going to be less capable at achieving “A” if it is also aligned to utility function “B”. I grant that current LLMs are not best described as having a singular consistent goal function; however, “in the limit”, that is what they will be best described as.
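To make the claimed tax concrete (a minimal formalization of this comment's claim, assuming the agent is well described by a fixed utility function over policies $\pi$ in some class $\Pi$, with alignment weight $\lambda > 0$):

$$\max_{\pi \in \Pi} U_A(\pi) \;\ge\; U_A(\pi^*), \qquad \pi^* = \arg\max_{\pi \in \Pi}\big[\,U_A(\pi) + \lambda\, U_B(\pi)\,\big],$$

with equality only when some maximizer of the combined objective also maximizes $U_A$ alone; any genuine trade-off between the two utilities therefore shows up as a strictly positive tax on $U_A$.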
I think this is precisely the reason that you’d want to make sure the agent is engineered such that its utility function includes the utility of other agents, i.e., so that the ‘alignment goals’ are its goals rather than ‘goals other than [its] own.’ We suspect that this exact sort of architecture could actually exhibit a negative alignment tax insofar as many other critical social competencies may require this as a foundation.
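One minimal way to write down the construction being proposed here (a sketch; $U_{\text{self}}$ and the weights $w_j$ are illustrative notation, not from the original comment): rather than optimizing one utility under an external constraint from another, define the agent’s utility from the start as

$$U(\pi) \;=\; U_{\text{self}}(\pi) + \sum_j w_j\, U_j(\pi),$$

where the $U_j$ are the utilities of other agents and the $w_j$ are fixed by the designer. Under this definition the inequality above no longer measures a tax: the agent maximizes exactly the objective it has, and any ‘cost’ computed against $U_{\text{self}}$ alone is a cost relative to a goal the agent was never given.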