A very short version of this post, which seemed worth rattling off quickly for now.
A few months ago, I was talking to John about paradigmaticity in AI alignment. John said “we don’t currently have a good paradigm.” I asked “Is ‘Natural Abstraction’ a good paradigm?” He said “No, but I think it’s something that’s likely to output a paradigm that’s closer to the right paradigm for AI Alignment.”
“How many paradigms are we away from the right paradigm?”
“Like, I dunno, maybe 3?” said he.
A while later I saw John arguing on LessWrong with (I think?) Ryan Greenblatt about whether Ryan’s current pseudo-paradigm was good. (Sorry if I got the names or substance here wrong; I couldn’t find the original thread, and it seemed slightly better to be specific so we could dig into a concrete example.)
One distinction in the discussion seemed to be something like:
On one hand, Ryan thought his current paradigm (this might have been “AI Control”, as contrasted with “AI Alignment”) had a bunch of traction on producing a plan that would at least reasonably help if we had to align superintelligent AIs in the near future.
On the other hand, John argued that the paradigm didn’t feel like the sort of thing that was likely to bear the fruit of new, better paradigms. It focused on an area of the superintelligence problem that, while locally tractable, John thought was insufficient to actually solve the problem, and also wasn’t the sort of thing likely to pave the way to new paradigms.
Now a) again, I’m not sure I’m remembering this conversation right, and b) whether either of those points is true in this particular case would be up for debate, and I’m not arguing they’re true. (Also, regardless, I am interested in the idea of AI Control and think that getting AI companies to actually do the steps necessary to control at least near-term AIs is something worth putting effort into.)
But it seemed good to promote to attention the idea that when you’re looking at clusters of AI Safety research and thinking about whether they are congealing into a useful, promising paradigm, one of the questions to ask is not just “does this paradigm seem locally tractable?” but “do I have a sense that this paradigm will open up new lines of research that can lead to better paradigms?”
(Whether one can be accurate in answering that question is yet another uncertainty. But, I think if you ask yourself “is this approach/paradigm useful”, your brain will respond with different intuitions than “does this approach/paradigm seem likely to result in new/better paradigms?”)
“Does your paradigm beget new, good, paradigms?”
Some prior reading:
Look For Principles Which Will Carry Over To The Next Paradigm
Open Problems Create Paradigms