I am no AI expert. Still, I have some views about AI alignment, and this is an excellent place to share them.
[I’m stating the following as background for the rest of my comment.] AI alignment splits nicely into:
Inner alignment: aligning an agent with a goal.
Outer alignment: aligning a goal with a value.
The terms agent and value are exceptionally poorly defined. What even is an agent? Can we point to some physical system and call it an agent? What even are values?
Our understanding of agents is limited, and it is an “I know it when I see it” sort of understanding. We know that humans are agents. Humans are agents we can observe right before our eyes today, unlike the theoretical agents with which AI alignment is concerned. Are groups of agents also agents? E.g., is a market, a nation, or a government an agent made up of subagents?
If we agree that humans are agents, then do we understand how to align human beings towards desirable goals that align with some values? If we don’t know how to align human beings effectively, what chances do we have of aligning theoretical agents that don’t yet exist?
Suppose that your goal is to develop vaccines for viral pandemics, but you have no idea how to make vaccines for existing viruses. Instead of focusing on acquiring the knowledge needed to create vaccines for existing viruses, you build models of what viruses might theoretically look like 100 years from now, based on axioms and deduced theorems. Once you have these theoretical models, you simulate theoretical viruses and vaccines and observe how they perform in simulated environments. This could indeed be useful and might lead to significant breakthroughs, but we would have a tighter learning loop by working with real viruses and real agents interacting in real environments.
In my eyes, the problem of AI alignment is more broadly a problem of aligning the technology we humans create towards fulfilling human values (whatever human values are). That, in turn, is a problem of figuring out what those values are and then designing incentive schemes that get humanity to cooperate towards achieving them. Given the abysmal state of international cooperation, we are doing very badly at this.
Once I finished writing the above, I had some second thoughts. I was reminded of this essay written by Max Tegmark:
It was OK for wisdom to sometimes lag in the race because it would catch up when needed. With more powerful technologies such as nuclear weapons, synthetic biology, and future strong artificial intelligence, however, learning from mistakes is not a desirable strategy: we want to develop our wisdom in advance so that we can get things right the first time because that might be the only time we’ll have.
The above quote highlights what is unique about AI safety. The best course of action might be to work with theoretical agents because we may have no time to solve the problem once superintelligent agents arrive. The probability of my house being destroyed is small, but I still pay for insurance because the potential loss is catastrophic and the premium is modest; that is the rational thing to do. Similarly, even if the probability of catastrophic risk from superintelligence is small, it is still prudent to invest in safety research.
That said, I still stand by my earlier stances. Aligning existing agents and aligning theoretical agents are both crucial pursuits.