I largely concur, but I think the argument is simpler and more intuitive. I want to boil this down a little and try to state it in plainer language:
Arguments for doom as a default apply to any AI that has unbounded goals and pursues those goals more competently than humans. Maximization, coherence, etc. are not central pieces.
Current AI doesn’t really have goals, so it’s not what we’re worried about. But we’ll give AI goals, because we want agents to get stuff done for us, and giving them goals seems necessary for that. All of the concerns for the doom argument will apply to real AI soon enough.
However, current AI systems may suggest a route to AGI that dodges some of the more detailed doom arguments. Their relative lack of inherent goal-directedness, and their relative skill at following instructions in line with their intent (and the human values behind them), may be cause for guarded optimism. One of my attempts to explain this is The (partial) fallacy of dumb superintelligence.
In a different form, the doom-as-default argument is:
IF an agent is smarter/more competent than you, and
has goals that conflict with yours,
THEN it will outsmart you somehow, eventually (probably soon);
it will achieve its goals, and you will correspondingly not achieve yours.
IF its goals are unbounded and it “cares” about your goals near zero,
THEN you will lose everything.
Arguments that “we’re training it on human data so it will care about our values above zero” are extremely speculative. They could be true, but betting the future of humanity on it without thinking it through seems very, very foolish.
That’s my attempt at the simplest form of the doom-by-default argument.
Just to point out the one distinction: I make no reference to game-theoretic agents or coherence theorems. I think these are unnecessary distractions from the core argument. An agent that has weird and conflicting goals (and so isn’t coherent or a perfect game-theoretic agent) will still take all of your stuff if its set of goals and values doesn’t weigh human property rights or human wellbeing very highly. That’s why we take the alignment problem to be the central problem in surviving AGI.
The other question implicit in this post was: why would we make AI less safe than current systems, which would remain pretty safe even if they were a lot smarter?
Asking why in the world humans would make AI with its own goals is like asking why in the world we’d create dynamite, much less nukes: because it will help humans accomplish their goals, until it doesn’t; and it’s as easy as calling your safe oracle AI (e.g., really good LLM) repeatedly with “what would an agent trying to accomplish X do with access to tools Y?” and passing the output to those tools. Agency is a one-line extension, and we’re not going to just not bother.
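To make “agency is a one-line extension” concrete, here is a minimal Python sketch of that wrapper loop. The names ask_oracle() and run_tool() are hypothetical stand-ins for whatever LLM API and tool dispatcher you actually have, not real libraries; the only point is how little scaffolding the step from oracle to agent requires.

```python
# Minimal sketch of wrapping a non-agentic "oracle" LLM in an agent loop.
# ask_oracle() and run_tool() are hypothetical placeholders, not real APIs.

def ask_oracle(prompt: str) -> str:
    """Query the oracle LLM; swap in a real model API call here."""
    raise NotImplementedError

def run_tool(action: str) -> str:
    """Execute the proposed action with whatever tools are available."""
    raise NotImplementedError

def agent_loop(goal: str, tools: str, max_steps: int = 10) -> None:
    observation = "nothing yet"
    for _ in range(max_steps):
        # The "one-line extension": ask what an agent pursuing the goal would do next...
        action = ask_oracle(
            f"What would an agent trying to accomplish {goal} do next, "
            f"with access to tools {tools}? Last observation: {observation}"
        )
        # ...and pass the answer straight to the tools.
        observation = run_tool(action)
```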
I like your comment, but I do want to comment on this:
Arguments that “we’re training it on human data so it will care about our values above zero” are extremely speculative. They could be true, but betting the future of humanity on it without thinking it through seems very, very foolish.
has evidence against it, fortunately for us.
I summarize the evidence for the pretty large similarities between the human brain and current DL systems, which lets us transport insights from AI into neuroscience and vice versa, here: https://x.com/SharmakeFarah14/status/1837528997556568523
But the point here is that one of the lessons from AI that is likely to transfer over to human values is that the data matters way more than the algorithm, optimizer, architecture, or hyperparameter choices.
I don’t go as far as this link does in claiming that the “it” in AI models is the dataset, but I think a weaker version of this is basically right, and thus the bitter lesson holds for human values too: https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/
Other than that quote, I basically agree with the rest of your helpful comment here.