Thanks for sharing this. I’ve been looking forward to your thoughts on OpenAI’s plan, and I think you presented them succinctly/clearly. I found the “evaluation vs generation” section particularly interesting/novel.
One thought: I’m currently not convinced that we would need general intelligence in order to generate new alignment ideas.
The claim I’m pushing back on: “The alignment problem needs high general intelligence, because it needs new ideas for solving alignment. It won’t be enough to input all the math around the alignment problem and have the AI solve that. It’s a great improvement over what we have, but it will only gain us speed, not insight.”
It seems plausible to me that we could get original ideas out of systems that were subhuman in general intelligence but superhuman in particular domains.
Example 1: Superhuman in processing speed. Imagine an AI that never comes up with an original thought on its own, but that has superhuman processing speed, so it generates ideas 10,000X faster than we do. It never produces an idea that humans couldn’t eventually have come up with themselves, but it certainly unlocks a bunch of ideas that we wouldn’t have discovered by 2030, or 2050, or whenever the point-of-no-return is.
Example 2: Superhuman in “creative idea generation”. Anecdotally, some people are really good at generating lots of possible ideas, but they’re not intelligent enough to be good at filtering them. I could imagine a safe AI system that is subhuman at idea filtering (or “pruning”) but superhuman at idea generation (or “babbling”).
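To make the babble/prune split a bit more concrete, here’s a toy sketch of the shape I have in mind (purely illustrative Python; babble and prune are hypothetical stand-ins, not real systems): a prolific but uncurated generator, with all of the selection pushed into a separate, deliberately weak filtering step that humans could audit or even do themselves.

```python
import random

# Toy illustration of decoupling "babble" (idea generation) from "prune"
# (idea filtering). Both functions are hypothetical stand-ins, not real APIs:
# the babbler is imagined as superhuman at producing candidates, while the
# pruner is deliberately weak/simple -- in the story above, humans could do
# the pruning themselves.

def babble(prompt: str, n_candidates: int = 10_000) -> list[str]:
    """Stand-in for a superhuman idea generator: propose many candidates."""
    return [f"{prompt} -- candidate #{i}" for i in range(n_candidates)]

def prune(candidates: list[str], keep: int = 10) -> list[str]:
    """Stand-in for a subhuman filter: crude scoring, keep only the top few."""
    scored = [(random.random(), c) for c in candidates]  # placeholder scoring
    scored.sort(reverse=True)
    return [c for _, c in scored[:keep]]

if __name__ == "__main__":
    shortlist = prune(babble("ways to elicit latent knowledge from a model"))
    for idea in shortlist:
        print(idea)
```

The point isn’t the (obviously silly) scoring; it’s just the shape of the pipeline: generation and filtering don’t have to live in the same system, and the filtering step can stay with humans.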
Whether or not we will actually be able to produce such systems is a much harder question. I lack strong models of “how cognition works”, and I wouldn’t find it crazy if people with stronger models of cognition were like: “wow, this is just not realistic; you’re not going to find an agent with superhuman creativity unless it’s also superhuman at a bunch of other things, and then it cuts through you like butter.”
But at least conceptually, it seems plausible to me that we could get new ideas out of systems that are narrowly intelligent in specific domains, without requiring them to be generally intelligent. (And in fact, this is currently my greatest hope for AI-assisted alignment schemes.)
Thanks!
And to both examples, how are you conceptualizing a “new idea”? Cause I suspect we don’t have the same model of what an idea is.
Good question. I’m using the term “idea” pretty loosely and vaguely.
Things that would meet this vague definition of “idea”:
The ELK problem (like going from nothing to “ah, we’ll need a way of eliciting latent knowledge from AIs”)
Identifying the ELK program as a priority/non-priority (generating the arguments/ideas that go from “this ELK thing exists” to “ah, I think ELK is one of the most important alignment directions” or “nope, this particular problem/approach doesn’t matter much”)
An ELK proposal
A specific modification to an ELK proposal that makes it 5% better.
So new ideas could include new problems/subproblems we haven’t discovered, solutions/proposals, code to help us implement proposals, ideas that help us prioritize between approaches, etc.
How are you defining “idea” (or do you have a totally different way of looking at things)?