Doom doubts: is inner alignment a likely problem?
After reading Eliezer's list of lethalities, I have doubts (hopes?) that some of the challenges he mentions will occur.
Let's start with inner alignment. Let's think step by step. 🙂
Inner alignment is a new name for a long-known challenge in many systems. Whether it's called the agency problem or a delegation challenge, handing a task to another entity and then making sure that entity not only does what you want but does it in a way you approve of is something people and systems have been dealing with since the first tribes. It is not an emergent property of AGI that will need to be navigated from a blank slate.
Humans and AGI are aligned on the need to manage inner alignment. While deception by the mesa-optimizer (the "agent") must be addressed, both humans and the AGI agree that agents must be prevented from going rogue and taking actions that fulfill their sub-goals while thwarting the overall mission.
The AGI will be much more powerful than its agents. An agent will logically have fewer resources at its disposal than the overall system, and for delegation to provide real leverage, the number of agents should be significant. If there are only a few agents, their work can simply be absorbed back into the overall system rather than spun off into agents that incur alignment challenges. Since there will therefore be a large number of agents, each agent will have only a fraction of the overall system's power, which implies the system retains considerable resources to monitor and correct deviations from its mission.
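To make the resource argument concrete, here is a toy sketch in Python with made-up numbers (nothing here comes from Eliezer's list or any real AGI design; the total budget, agent count, oversight share, and deviation rate are all arbitrary assumptions). The only point is that once work is spread across many agents, each agent's share of resources is small compared with what the system holds back for monitoring and correction.

```python
import random

# Toy illustration only: all quantities below are assumed, illustrative numbers.
TOTAL_RESOURCES = 1000      # overall system's resource budget (arbitrary units)
NUM_AGENTS = 100            # a "significant" number of delegated agents
MONITORING_RESERVE = 0.5    # fraction of resources the system keeps for oversight

# Resources available to each agent vs. resources kept for supervision.
per_agent_budget = TOTAL_RESOURCES * (1 - MONITORING_RESERVE) / NUM_AGENTS
monitor_budget = TOTAL_RESOURCES * MONITORING_RESERVE

def agent_act(agent_id: int) -> str:
    """Each agent pursues its sub-goal; assume a small fraction drift off-mission."""
    return "off_mission" if random.random() < 0.05 else "on_mission"

def supervise() -> None:
    """The overall system audits every agent and corrects any deviation it finds."""
    corrected = 0
    for agent_id in range(NUM_AGENTS):
        if agent_act(agent_id) == "off_mission":
            # The system vastly out-resources any single agent
            # (monitor_budget >> per_agent_budget), so detection and
            # correction are assumed to succeed in this toy model.
            corrected += 1
    print(f"per-agent budget: {per_agent_budget:.1f} units, "
          f"monitoring reserve: {monitor_budget:.1f} units")
    print(f"detected and corrected {corrected} deviating agent(s) out of {NUM_AGENTS}")

if __name__ == "__main__":
    supervise()
```

With these assumed numbers, each agent gets 5 units of resources while the overseer keeps 500, a 100:1 advantage; the more finely the work is divided, the more lopsided that ratio becomes.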
An AGI that doesn't solve inner alignment, with or without human help, isn't going to make it to superintelligence (SI). An SI will be able to get things done as planned and intended (at least according to the SI's own understanding; outer alignment is not addressed here). If it can't stop its own agents from doing things it agrees are not the mission, it's not an SI.
Does that make sense? Agree? Disagree?