The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...
MIRI & company spent a decade working on decision theory, which seems irrelevant if deep learning is the path (aside: and how would you face Omega if you were the sort of agent that pays out to blackmailers?). Yudkowsky offers to bet Demis Hassabis that Go won’t be solved in the short term. They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively. They predict fast takeoff and FOOM.
Oops.
The answer was actually deep learning and not systems with utility functions. Go gets solved. Deep Learning systems don’t look like they FOOM. Stochastic Gradient Descent doesn’t look like it will treacherous turn. Yudkowsky’s dream of building the singleton Sysop is gone and was probably never achievable in the first place.
People double down with the “mesaoptimizer” frame instead of admitting that it looks like SGD does what it says on the tin. Yudkowsky goes on a doom media spree. They advocate for a regulatory regime that would make it very easy to empower private interests over public ones. Enraging to me, there’s a pattern of engagement where it seems like AI Doomers will only interact with weak arguments instead of strong ones: Yud mostly argues with low-quality e/accs on Twitter, where it’s easy to score Ws; it was mildly surprising when he even responded with “This is kinda long.” to Quintin Pope’s objection thread.
What should MIRI have done, had they taken the good sliver of The Sequences to heart? They should have said oops. They should have halted, melted, and caught fire. They should have acknowledged that the sky was blue. They should have radically changed their minds when the facts changed. But that would have cut off their funding. If the world isn’t going to end from a FOOMing AI, why should MIRI get paid?
So what am I supposed to extract from this pattern of behaviour?
Deep Learning systems don’t look like they FOOM. Stochastic Gradient Descent doesn’t look like it will treacherous turn.
I think you’ve updated incorrectly, by failing to keep track of what the advance predictions were (or would have been) about when a FOOM or a treacherous turn would happen.
If foom happens, it happens no earlier than the point where AI systems can do software development on their own codebases, without relying on close collaboration with a skilled human programmer. This point has not yet been reached; they’re idiot savants with skill gaps that prevent them from working independently, and no AI system has passed the litmus test I use for identifying good (human) programmers. They’re advancing in that direction pretty rapidly, but they’re unambiguously not there yet.
Similarly, if a treacherous turn happens, it happens no earlier than the point where AI systems can do strategic reasoning with long chains of inference; this again has an idiot-savant dynamic going on, which can create the false impression that this landmark has been reached, when in fact it hasn’t.
They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively.
Do you have a link for this prediction? (Or are you just referring to, e.g., Eliezer’s dismissive attitude toward neural networks, as expressed in the Sequences?)
They predict fast takeoff and FOOM. … Deep Learning systems don’t look like they FOOM.
It’s not clear that deep learning systems get us to AGI, either. There doesn’t seem to be any good reason to be sure, at this time, that we won’t get “fast takeoff and FOOM”, does there? (Indeed it’s my understanding that Eliezer still predicts this. Or is that false?)
Stochastic Gradient Descent doesn’t look like it will treacherous turn.
It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!
So what am I supposed to extract from this pattern of behaviour?
I think that at least some of the things you take to be obvious conclusions that Eliezer/MIRI should’ve drawn, are in fact not obvious, and some are even plausibly false.
You also make some good points. But there isn’t nearly so clear a pattern as you suggest.
It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!
As I understand the argument, it goes as follows:
For evolutionary methods, you can’t predict the outcome of changes before they’re made, and so you end up with ‘throw the spaghetti at the wall and see what sticks’. At some point, those changes accumulate into a mind that’s capable of figuring out what environment it’s in and then performing well in it, so you get what looks like an aligned agent while you haven’t actually exerted any influence on its internal goals (i.e. what it’ll do once it’s out in the world).
For gradient-descent-based methods, you can predict the outcome of changes before they’re made; that’s the gradient part. It’s overall less plausible that the system you’re building figures out generic reasoning and then applies that generic reasoning to a specific task than that it figures out the specific reasoning for the task you’d like solved. Jumps in the loss look more like “a new cognitive capacity has emerged in the network” and less like “the system is now reasoning about its training environment”.
Of course, that “overall less plausible” is making a handwavy argument about what simplicity metric we should be using and which design is simpler according to that metric. Related, earlier research: Are minimal circuits deceptive?
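To make the “can’t predict vs. can predict” distinction concrete, here is a minimal toy sketch of my own (not from that paper or from the original argument; the function names, loss, and constants are invented for illustration). The evolutionary step only finds out whether a mutation helped by evaluating it after the fact, while the gradient step uses the local gradient as an advance prediction of how each parameter change will affect the loss:

```python
import random

def loss(w):
    # Toy objective standing in for a training loss: a simple quadratic bowl.
    return sum((wi - 3.0) ** 2 for wi in w)

def evolution_step(w, sigma=0.5):
    # Black-box search: propose a random mutation, evaluate it, and only find
    # out *after the fact* whether it helped. Selection sees the resulting
    # behaviour (the loss), not the internal change that produced it.
    candidate = [wi + random.gauss(0.0, sigma) for wi in w]
    return candidate if loss(candidate) < loss(w) else w

def sgd_step(w, lr=0.1):
    # Gradient descent: the gradient is a local *prediction* of how each
    # parameter change will move the loss, computed before the change is made.
    grad = [2.0 * (wi - 3.0) for wi in w]  # analytic gradient of the toy loss
    return [wi - lr * gi for wi, gi in zip(w, grad)]

if __name__ == "__main__":
    w_evo = [0.0, 0.0]
    w_sgd = [0.0, 0.0]
    for _ in range(50):
        w_evo = evolution_step(w_evo)
        w_sgd = sgd_step(w_sgd)
    print("evolutionary search loss:", round(loss(w_evo), 4))
    print("gradient descent loss:   ", round(loss(w_sgd), 4))
```

The difference being gestured at is that the first loop’s selection pressure only ever touches end-to-end behaviour, while the second loop’s update is computed from an explicit local model of how each weight affects the loss.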
IMO this should be somewhat persuasive but not conclusive. I’m much happier with a transformer shaped by a giant English text corpus than I am with whatever is spit out by a neural-architecture-search program pointed at itself! But for cognitive megaprojects, I think you probably have to have something-like-a-mind in there, even if you got to it by SGD.
It’s pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won’t FOOM, or we otherwise needn’t do anything inconvenient to get good outcomes. It’s proving considerably harder (from my outside-the-field view) to prove alignment, or prove upper bounds on rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.
FWIW I’m considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with straightforward training solutions that look much less alien than expected. There are plausible scenarios where we fail at solving alignment and still get something tolerably human-shaped, and none of those scenarios previously seemed plausible. That optimism just doesn’t take it under the “stop worrying” threshold.
This doesn’t seem consistent to me with MIRI having run a research program with a machine learning focus. IIRC (I don’t have links handy, but I’m pretty sure there were announcements made), they wound up declaring failure on that research program, and it was only after that happened that they started talking about the world being doomed and there not being anything that seemed like it would work for aligning AGI in time.
Incidentally, I don’t think I’m willing to trust a hearsay report on this without confirmation.
Do you happen to have any links to Eliezer making such a claim in public? Or, at least, any confirmation that the cited comment was made as described?
Closest thing I’m aware of is that at the time of the AlphaGo matches he bet people at like 3:2 odds, favourable to him, that Lee Sedol would win. Link here