First off, if we have specific evidence (an answer to Objection 2) then the historical analogy in Objection 1 looks a lot weaker, as any real evidence of WFLL1 arising now would suggest that the historical cases of other algorithms that gave pathological results just aren’t representative. I think they aren’t representative.
(I think the discontinuity-based arguments largely do make the “this time is different” case, roughly because general intelligence seems clearly game-changing. WFLL2 seems somewhere in between these, and I’m unsure where my beliefs fall on that.)
The key difference ‘this time’ (before we get anywhere near WFLL2 or AGI), as I see it, is that those early algorithms gave recommendations to people, who could implement or ignore them, so the ‘exploratory phase’ where we poked around to find out what they were capable of was pretty much risk-free, while WFLL1 implies that the systems have some degree of autonomy and actually get a chance to do unexpected things without humans realizing straight away. Dantzig’s linear optimization leading to a catastrophe would have required more carelessness and stupidity than (current or very near-future) deep RL, because deep RL’s mistakes are subtler and because it has to be let loose on some environment before it can achieve results and give us useful information about its behaviour. As for evidence, Stuart Russell thinks that we are already seeing WFLL1 in social media ad algorithms:
“Consider the so-called ‘filter bubble’ of social media. The reinforcement learning algorithm is trying to maximize click throughs. From the view of the human, the purpose of the machine is to maximize clickthroughs. But from the view of the machine, it is changing the state of the world to maximize clicks. It is changing you to make you more predictable. A raving fascist or communist is more predictable and will lap up raving content. The machines can change our mind about our objective function so we are easier to satisfy. Advertisers have done this for decades.” [I argued with him about this feedback loop, and Yann LeCun says this changed at Facebook a while ago]
“The reinforcement learning algorithm in social media has destroyed the EU, NATO and democracy. And that’s just 50 lines of code.”
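To make Russell’s mechanism a bit more concrete, here’s a toy sketch. It’s entirely my own construction, not anything Russell or Paul actually describe: a click-maximizing epsilon-greedy bandit serving content of varying “extremeness” to a simulated user whose preferences drift toward whatever they’re shown. The arm set, click model, and drift rate are all invented for illustration, but the qualitative point is the one in the quote: a purely myopic “maximize clicks” objective ends up ratcheting the simulated user toward the most predictable (most extreme) state.

```python
# Toy sketch (my own invention, not Russell's or Paul's model): a click-maximizing
# bandit whose actions also shift the simulated user's preferences, so
# "maximize clicks" ends up meaning "make the user more predictable".
import random

random.seed(0)

ARMS = [0.0, 0.25, 0.5, 0.75, 1.0]    # "extremeness" of the content the agent can serve
EPS, ALPHA, DRIFT = 0.1, 0.05, 0.002  # exploration rate, learning rate, preference drift

user_pref = 0.2                       # simulated user's current taste for extreme content
values = [0.0] * len(ARMS)            # agent's running click-rate estimate per arm

def click_prob(extremeness, pref):
    # Assumption baked into the toy model: the most clickable content is a
    # notch *more* extreme than whatever the user currently prefers.
    sweet_spot = min(pref + 0.25, 1.0)
    return max(0.0, 1.0 - 2.0 * abs(extremeness - sweet_spot))

for step in range(50_000):
    # Epsilon-greedy click maximization: the agent only ever "cares" about clicks.
    if random.random() < EPS:
        arm = random.randrange(len(ARMS))
    else:
        arm = max(range(len(ARMS)), key=lambda i: values[i])

    clicked = random.random() < click_prob(ARMS[arm], user_pref)
    values[arm] += ALPHA * (clicked - values[arm])  # constant step size, tracks the drift

    # The side effect the objective never mentions: served content slowly
    # pulls the user's preferences toward itself.
    user_pref += DRIFT * (ARMS[arm] - user_pref)

print(f"user preference for extreme content after optimization: {user_pref:.2f}")
print("agent's click-rate estimates per arm:", [round(v, 2) for v in values])
```

In this toy world the agent never “decides” to radicalize anyone; each step is just a greedy choice of whichever content is currently most clickable, and the cumulative drift does the rest.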
I wonder if this hypothesis was in Paul’s mind when he wrote the essay. If Russell is right about any of this, it suggests that one of the first times we gave deep RL any ability to influence the world, it succumbed to a failure scenario almost immediately. That’s not a good track record.
A raving fascist or communist is more predictable and will lap up raving content. The machines can change our mind about our objective function so we are easier to satisfy.
That’s a good way to put it!
This might be stretching the analogy, but I feel like there’s a similar thing going on with the technological evolution of “gadgets” (digital watch, iPod, cell phone). People’s expectations of what a gadget should do for them seem to grow so fast that something as simple and obviously beneficial as battery life never really gets improved. I get that not everyone is bothered by having to charge things all the time (and losing the charger all the time), but how come it’s borderline impossible to buy things that don’t need to be charged so often? It feels like there’s some optimization pressure at work here, and it’s not making life more convenient. :)