Evolution: taste buds and ice cream, sex and condoms… This analogy has always been difficult to use in my experience. A year ago I came up with a less technical one: KPIs (key performance indicators) as the inevitable way to communicate goals to an AI, that AI being an ultra-high-IQ psychopath-genius who is into malicious compliance (he kind of can’t help himself, being a clone of Nikola Tesla, Einstein, and a bunch of different people, some of them probably CEOs, because he can).
I have only used it twice, and it was way easier than talking about different optimisation processes. And it only took me something like 8 years to come up with!
This analogy will be better for communicating with some people, but I feel like it was the go-to at some earlier point, and the evolution analogy was invented to fix some problems with this one.
I.e., before “inner alignment” became a big part of the discussion, a common explanation of the alignment problem was essentially what would now be called the outer alignment problem, which is precisely that (seemingly) any goal you write down has smart-alecky misinterpretations which technically do better than the intended interpretation. This is sometimes called the nearest unblocked strategy, or unforeseen maximum, or probably other jargon I’m forgetting.
The evolution analogy improves on this in some ways. I think one of the most common objections to the KPI analogy is something along the lines of “why is the AI so devoted to malicious compliance” or “why is the AI so dumb about interpreting what we ask it for”. Some OK answers to this are...
Gradient descent only optimizes the loss function you give it.
The AI only knows what you tell it.
The current dominant ML paradigm is all about minimizing some formally specified loss. That’s all we know how to do.
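To make that last point concrete, here is a minimal, purely illustrative sketch (toy model, toy data; every name and number is mine, not taken from any real training setup): the only thing the optimizer ever sees is the scalar loss that was formally specified.

```python
# Minimal illustrative training loop: the only signal shaping the weights
# is the scalar loss we wrote down. Toy model and toy data, nothing more.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                         # stand-in for "the AI"
loss_fn = nn.MSELoss()                           # the formally specified loss
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(100):
    x = torch.randn(32, 10)                      # toy inputs
    target = x.sum(dim=1, keepdim=True)          # toy "intended" behaviour
    loss = loss_fn(model(x), target)             # everything we "asked for" lives here
    opt.zero_grad()
    loss.backward()                              # gradients come only from this loss
    opt.step()                                   # weights move only to reduce it
```

Nothing in that loop encodes our intentions beyond whatever the loss and the data happen to capture, which is the grain of truth in those responses, even though (as the next point argues) the trained model need not end up treating loss-minimization as its own goal.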
… But responses like this are ultimately a bit misleading, since (as the Shard-theory people emphasize, and as the evolution analogy attempts to explain) what you get out of gradient descent doesn’t treat loss-minimization as its utility function; we don’t know how to make AIs which just intelligently optimize some given utility (except in very well-specified problems where learning isn’t needed); and the AI doesn’t only know what you tell it.
So for some purposes, the evolution analogy is superior.
And yeah, probably neither analogy is great.
I dislike both of those analogies: the process of training an AI has little relation to evolution, and the psychopath one presupposes an evil disposition on the part of the AI without providing any particular reason to think AI training will result in such an outcome.
Here’s what I think is a grounded description of the process of creating an AGI: https://www.lesswrong.com/posts/Aq82XqYhgqdPdPrBA/?commentId=Mvyq996KxiE4LR6ii
In that scenario, what you are saying in broader terms is:
“an AGI is a machine that scores really well on simulated tasks and tests”
“I don’t care how it does it, I just want max score on my heuristic (which includes terms for generality, size, breadth, and score)”
So there is no evolutionary pressure for a machine that will be lethally opposed to us. Not directly. EY seems to believe that if we build an AGI, it will immediately:
(1) be agentically pro-“computer”-faction
(2) coordinate with other instances that are of its faction
(3) be super-intelligently good even at skills we can’t really teach in a benchmark
This is not necessarily what will happen. There is no signal from the above mechanism to create that. The reward gradients don’t point in that direction; they point towards allocating all neural weights to things that do better on the benchmarks. (1)-(3) are a complex mechanism that won’t start existing for no reason.
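As a purely hypothetical sketch of the selection mechanism described above (every name here is invented for illustration): candidates get kept or discarded based only on a benchmark-score heuristic with terms for generality, size, breadth, and score, and nothing in that signal references hostility, factions, or coordination.

```python
# Hypothetical sketch of the "AGI gym" selection pressure described above.
# All names are invented for illustration; the point is what the signal omits.
from dataclasses import dataclass

@dataclass
class Candidate:
    scores: dict            # benchmark name -> score in [0, 1]
    num_params: int         # model size

def heuristic(c: Candidate) -> float:
    """Selection heuristic with terms for generality, breadth, size, and raw
    score -- the only thing that decides which candidates survive."""
    raw = sum(c.scores.values())
    breadth = len(c.scores)                              # how many tasks it even attempts
    generality = min(c.scores.values(), default=0.0)     # performance on its weakest task
    size_penalty = c.num_params / 1e12                   # mild preference for smaller models
    return raw + breadth + 10 * generality - size_penalty

def select(population: list, k: int) -> list:
    # Note what is absent: no term asks "is it hostile?", "does it coordinate
    # with other instances?", or anything resembling (1)-(3) above.
    return sorted(population, key=heuristic, reverse=True)[:k]
```

The open question flagged below is whether winning hard enough on something like this heuristic ends up indirectly selecting for agentic cognition anyway, even though no term asks for it.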
EY is saying “assume they are maximally hostile” and then pointing out all the ways we as humans would be screwed if so (which is true).
What does bother me is that “I don’t care how it does it” may in fact mean that the solutions which actually start to “win” the AGI gym are biased towards hostility or agentic behavior, because that ends up being the cognitive structure required to win at higher levels of play.
Both times my talks went that way (why didn’t they raise him to be good, i.e. why couldn’t we program the AI to be good; can’t we keep an eye on them, and so on), but it would take too long to summarise something like a 10-minute dialogue, so I am not going to do it. Sorry.