For my part, I kinda updated towards “Well, actually data efficiency isn’t quite exactly what I care about, and EfficientZero is gaming / Goodharting that metric in a way that dissociates it from the thing that I care about”. See here.
Yeah, I know it totally sounds like special pleading / moving the goalposts. Oh well.
For example, I consider “running through plans one-timestep-at-a-time” to be a kind of brute force way to make plans and understand consequences, and I’m skeptical of that kind of thing scaling to “real world intelligence and common sense”. By contrast, the brain can do flexible planning at multiple levels of an abstraction hierarchy, that it can build and change in real time, like how “I’m gonna go to the store and buy cucumbers” is actually millions of motor actions. EfficientZero still retains that brute-force aspect, seems to me. It just rejiggers things so that the brute-force aspect doesn’t count as “data inefficiency”.
Thanks, this is helpful! I think for timelines though… EfficientZero can play an Atari game for 2 subjective hours and get human-level ability at it. That’s, like, 1000 little 7-second clips of gameplay, maybe 1000 ‘lives,’ or 1000 data points. Make a list of all the “transformative tasks” and “dangerous tasks” etc. and then go down the list and ask: Can we collect 1000 data points for this task? How many subjective hours is each data point? Remember, humans have at most about 10,000 hours on any particular task. So even if it takes 20,000 hours for each data point, that’s only 20M subjective hours total… which is only 7 OOMs more than the 2 hours EfficientZero had in training.
EfficientZero costs 1 day on $10K worth of hardware. Imagine it is 2030, hardware is 2 OOMs cheaper, and people are spending $10B on hardware and running it for 100 days. That’s 10 OOMs more compute to work with. So, we could run EfficientZero for 7 OOMs longer, and thereby get our 1000 data points of experience, each 20,000 hours long. And if EfficientZero could beat humans in data-efficiency for Atari, why wouldn’t it also beat humans for data-efficiency at this transformative task / dangerous task? Especially because we only used 7 of our 10 available OOMs, so we can also make it 1000x larger if we want to.
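The back-of-the-envelope arithmetic above can be checked with a few lines of Python. This is only a sketch of the comment's own numbers; the variable names are made up for illustration, and it assumes (as the argument implicitly does) that training compute scales linearly with subjective hours of experience.

```python
import math

def ooms(ratio: float) -> float:
    """Orders of magnitude (base-10 log) spanned by a ratio."""
    return math.log10(ratio)

# Data side: how much more experience the long-horizon task needs.
baseline_hours = 2          # EfficientZero's subjective hours on one Atari game
data_points = 1000          # roughly 2 hours split into 7-second clips
hours_per_point = 20_000    # pessimistic length of one long-horizon data point
total_hours = data_points * hours_per_point
data_ooms = ooms(total_hours / baseline_hours)

# Compute side: the hypothetical 2030 budget versus today's run.
baseline_dollars, baseline_days = 10_000, 1
future_dollars, future_days = 10_000_000_000, 100
price_drop_ooms = 2         # hardware assumed 2 OOMs cheaper by 2030
compute_ooms = (
    ooms(future_dollars / baseline_dollars)
    + price_drop_ooms
    + ooms(future_days / baseline_days)
)

# Whatever is left over could go into a larger model.
spare_ooms = compute_ooms - data_ooms

print(f"total subjective hours: {total_hours:,}")
print(f"extra experience needed: {data_ooms:.0f} OOMs")
print(f"extra compute available: {compute_ooms:.0f} OOMs")
print(f"left over for model size: {spare_ooms:.0f} OOMs")
```

The leftover figure is where the “1000x larger” comes from: 10 OOMs of extra compute minus 7 OOMs of extra experience leaves 3.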
And this argument only has to work for at least one transformative / dangerous task, not all of them.
This is a crude sketchy argument of course, but you see what I’m getting at? ETA: I’m attacking the view that by 2030ish we’ll have AIs that can do all the short-horizon tasks, but long-horizon tasks will only come around 2040 or 2050 because it takes a lot more compute to train on them because each data point requires a lot more subjective time.
Let’s say we want our EfficientZero-7 to output good alignmentforum blog posts. We have plenty of training data, in terms of the finished product, but we don’t have training data in terms of the “figuring out what to write” part. That part happens in the person’s head.
(Suppose the test data is a post containing Insight X. If we’re training a network to output that post, the network updates can lead to the ability to figure out Insight X, or can lead the network to already know Insight X. Evidence from GPT-3 suggests that the latter is what would actually happen, IMO.)
So then maybe you’ll say: Someone will get the AGI safety researcher to write an alignmentforum blog post while wearing a Kernel Flux brain-scanner helmet, and make EfficientZero-7 build a model from that. But I’m skeptical that the brain-scan data would sufficiently constrain the model so that it would learn how to “figure things out”. Brain scans are too low-resolution, too noisy, and/or too incomplete. I think they would miss pretty much all the important aspects of “figuring things out”.
I think if we had a sufficiently good operationalization of “figuring things out” to train EfficientZero-7, we could just use that to build a “figuring things out” AGI directly instead.
That’s my guess anyway.
Then maybe your response would be: Writing alignmentforum blog posts is a bad example. Instead let’s build silicon-eating nanobots. We can run a slow, expensive molecular-dynamics simulation on a supercomputer, and we can have EfficientZero-7 query it, watch it, build its own “mental model” of what happens in a molecular simulation, and recapitulate that model on cheaper, faster GPUs. And we can put in some kind of score that’s maximized when you query the model with the precursors to a silicon-eating nanobot.
I can get behind that kind of story; indeed, I would not be surprised to see papers along those general lines popping up on arxiv tomorrow, or indeed years ago. But I would describe that kind of thing as “pivotal acts that require only narrow AI”. I’m not an expert on pivotal acts, and I’m open-minded to the possibility that there are “pivotal acts that require only narrow AI”. And I’m also open-minded to the possibility that we can’t do those acts today, because they require too much querying of expensive-to-query stuff like molecular simulation code or humans or real-world actuation, and that future narrow-AI advances like EfficientZero-7 will solve that problem. I guess I’m modestly skeptical, but I suppose there are unknown unknowns (to me), and certainly I haven’t spent enough time thinking about pivotal acts to have any confidence.
I wasn’t imagining this being a good thing that helps save the world; I was imagining it being a world-ending thing that someone does anyway because they don’t realize how dangerous it is.
I totally agree that the two examples you gave probably wouldn’t work. How about this though:
--Our task will be: Be a chatbot. Talk to users over the course of several months to get them to give you high marks in a user satisfaction survey.
--Pre-train the model on logs of human-to-human chat conversations so you have a reasonable starting point for making predictions about how conversations go.
--Then run the EfficientZero algorithm, but with a massively larger parameter count, and talking to hundreds of thousands (millions?) of humans for several years. It would be a very expensive, laggy chatbot (but the user wouldn’t care since they aren’t paying for it, and even with lag the text comes in about as fast as a human would reply).
Seems to me this would “work” in the sense that we’d all die within a few years of this happening, on the default trajectory.
In a similar conversation about non-main-actor paths to dangerous AI I came up with this as an example of a path I can imagine being plausible and dangerous: A plausible-to-me worst-case scenario would be something like:
A phone-scam organization employs someone to build them an online-learning reinforcement learning agent (using an open-source language model as a language-understanding component) that functions as a scam-helper. It takes in the live transcription of the ongoing conversation between a scammer and a victim, and gives the scammer suggestions for what to say next to persuade the victim to send money. So long as it was even a bit helpful sometimes according to the team of scammers using it, more resources would be given to it and it would continue to collect useful data.
I think this scenario contains a number of dangerous aspects:
--being illegal and secret, not subject to ethical or safety guidance or regulation
--deliberately being designed to open-endedly self-improve
--bringing in incremental resources as it trains to continue to prove its worth (thus not needing a huge initial investment of training cost)
--being agentive and directed at the specific goal of manipulating and deceiving humans
I don’t think we need 10 more years of progress in algorithms and compute for this story to be technologically feasible. A crude version of this is possibly already in use, and we wouldn’t know.