Thanks! A lot of my thinking here is that I just really believe that, once people find the right neural architecture, self-supervised learning on the internet is going to rocket-launch all the way to AGI and beyond, leaving little narrow AI services in the dust.
The way I read it, Gwern’s tool-AI article is mostly about self-improvement. I’m proposing that the system will be able to guide human-in-the-loop “self”-improvement. That’s kinda slower, but probably good enough, especially since eventually we can (hopefully) ask the oracle how to build a safe agent.
The way I read it, Gwern’s tool-AI article is mostly about self-improvement.
I’m not sure I understand what you mean here. I linked Gwern’s post because your proposal sounded, to me, very similar to Holden’s Tool AI concept, and Gwern’s post is one of the more comprehensive responses to that concept I can remember coming across.
Is it your impression that what you’re proposing is substantially different from Holden’s Tool AI?
When I say that your idea sounded similar, I’m thinking of passages like this (from Holden):
Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.” An “agent,” by contrast, has an underlying instruction set that conceptually looks like: “(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A.” In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the “tool” version rather than the “agent” version, and this separability is in fact present with most/all modern software. Note that in the “tool” version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter—to describe a program of this kind as “wanting” something is a category error, and there is no reason to expect its step (2) to be deceptive….This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode. In fact, if developing “Friendly AI” is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on “Friendliness theory” moot.
Compared to this (from you):
Finally, we query the system in a way that is compatible with its self-unawareness. For example, if we want to cure cancer, one nice approach would be to program it to search through its generative model and output the least improbable scenario wherein a cure for cancer is discovered somewhere in the world in the next 10 years. Maybe it would output: “A scientist at a university will be testing immune therapy X, and they will combine it with blood therapy Y, and they’ll find that the two together cure all cancers”. Then, we go combine therapies X and Y ourselves.
Your “Then, we go combine therapies X and Y ourselves” sounds to me a lot like Holden’s separation of (1) calculating the best action from (2) either explaining it (in the case of Tool AI) or executing it (in the case of Agent AI). In both cases you seem to be suggesting that we can reap the rewards of superintelligence while retaining control by treating the AI as an advisor rather than as an agent that acts on our behalf.
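To make the parallel concrete, here’s a minimal toy sketch of the separation Holden describes (the names and the scoring are made up for illustration; neither post specifies an implementation). Step (1) is identical in both modes; only step (2) differs:

```python
# Toy illustration of Holden's tool/agent distinction. Every name here is
# hypothetical; this is not from either post.
from typing import Callable, Iterable

def calculate_best_action(actions: Iterable[str],
                          score: Callable[[str], float]) -> str:
    """Step (1): find the action A that maximizes parameter P (here, `score`)."""
    return max(actions, key=score)

def run_as_tool(actions: Iterable[str], score: Callable[[str], float]) -> str:
    """Step (2), tool mode: report the best action; a human decides whether to act."""
    best = calculate_best_action(actions, score)
    return f"Recommended action: {best} (predicted value {score(best):.2f})"

def run_as_agent(actions: Iterable[str], score: Callable[[str], float],
                 execute: Callable[[str], None]) -> None:
    """Step (2), agent mode: the same step (1), but the system acts directly."""
    execute(calculate_best_action(actions, score))

# Same step (1) either way; only step (2) differs.
actions = ["combine therapy X with therapy Y", "run another trial of X alone"]
score = lambda a: 0.9 if "therapy Y" in a else 0.3
print(run_as_tool(actions, score))
```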
Am I right that what you’re proposing is pretty much along the same lines as Holden’s Tool AI—or is there some key difference that I’m missing?
Thanks for that, this is helpful. Yes, same genre for sure. According to Eliezer’s response to Holden, tool AI is a synonym of “non-self-improving oracle”. Anyway, whatever we call it, my understanding of the case against tool AI is that (1) we don’t know how to make a safe tool AI (part of Eliezer’s response), and (2) even if we could, it wouldn’t be competitive (Gwern’s response).
I’m trying to contribute to this conversation by giving an intuitive argument for how both of these objections can be overcome, and by being more specific about how the tool AI might be built and how it might work.
More specifically, most (though not all) of the reasons Gwern gives for thinking tool AI would be uncompetitive fall into the category of “self-improving systems are more powerful”. That’s why I specifically mentioned that a tool AI can be self-improving … albeit indirectly and with a human in the loop.
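To spell out what I mean by human-in-the-loop “self”-improvement, here’s a rough sketch of the loop I have in mind. It’s purely illustrative; none of these function names or the toy “improvement” come from anything I actually proposed:

```python
# Purely illustrative: a toy human-in-the-loop "self"-improvement loop.
# The oracle proposes changes to its own design; humans gate and apply them.

def oracle_suggest_improvement(system: dict) -> dict:
    """Stand-in for querying the tool AI for a proposed change to itself."""
    return {"change": "widen layers", "new_width": system["width"] * 2}

def humans_approve(proposal: dict) -> bool:
    """Stand-in for human review: nothing is applied without this step."""
    return proposal["new_width"] <= 4096  # toy safety criterion

def apply_change(system: dict, proposal: dict) -> dict:
    """Humans, not the system, apply the approved change."""
    return {**system, "width": proposal["new_width"]}

def improve_with_humans_in_loop(system: dict, rounds: int = 3) -> dict:
    for _ in range(rounds):
        proposal = oracle_suggest_improvement(system)  # the system proposes
        if humans_approve(proposal):                   # humans review
            system = apply_change(system, proposal)    # humans apply
    return system

print(improve_with_humans_in_loop({"width": 512}))
```

Slower than direct self-modification, sure, but the human gate is the whole point.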