I like the distinction between parallelizable and serial research time, and agree that there should be a very high bar for shortening AI timelines and eating up precious serial time.
One caveat to the claim that we should prioritize serial alignment work over parallelizable work is that it assumes an omniscient and optimal allocator of researcher-hours to problems. Insofar as this assumption doesn’t hold (because our institutions fail, or because the knowledge about how to allocate researcher-hours itself depends on the outcomes of parallelizable research), the distinction between parallelizable and serial work breaks down and other considerations dominate.
Why do you think it assumes that?
Sorry, I didn’t mean to imply that these are logical assumptions necessary for us to prioritize serial work. Rather, insofar as these assumptions don’t hold, prioritizing work that looks serial to us is less important at the margin.
Spelling out the assumptions more:
Omniscient meaning “perfect advance knowledge of what work will turn out to be serial vs parallelizable.” In practice I think this is very hard to know beforehand: a lot of work that turned out to be part of the “serial bottleneck” looked parallelizable ex ante.
Optimal meaning “institutions will actually allocate enough researchers to the problem in time for the parallelizable work to get done”. Insofar as we don’t expect this to hold, we will lose even if all the serial work gets done in time.
Also, on a re-read I notice that all the examples given in the post relate to mathematics or theoretical work, which is almost uniquely serial among human activities. By contrast, engineering disciplines are typically much more parallelizable, as evidenced by the speedup in technological progress during wartime.
This isn’t a coincidence; the state of alignment knowledge is currently “we have no idea what would be involved in doing it even in principle, given realistic research paths and constraints”, very far from being a well-specified engineering problem. Cf. https://intelligence.org/2013/11/04/from-philosophy-to-math-to-engineering/.
If you succeed at the framework-inventing “how does one even do this?” stage, then you can probably deploy an enormous amount of engineering talent in parallel to help with tasks like implementation, small iterative improvements, building on foundations, and targeting established metrics.
I agree that if the criterion is to get us to utopia, it’s a problem (maybe not even that, but whatever), but if we instead say that it has to avoid x-risk, then we do have some options. My favorite research directions are IDA and HCH, with ELK as a second option for alignment. We aren’t fully finished with those ideas, but we do have at least some idea of what we can do.
Also, it’s very unlikely that theoretical or mathematical work like provable alignment will do much here beyond toy problems.
This seems to misunderstand my view? My goal is to avoid x-risk, not to get us to utopia. (Or rather, my proximate goal is to end the acute risk period; my ultimate goal is utopia, but I want to punt nearly all of the utopia-work to the period after we’ve ended the acute risk period.)
Cf., from Eliezer:
“When I say that alignment is lethally difficult, I am not talking about ideal or perfect goals of ‘provable’ alignment, nor total alignment of superintelligences on exact human values, nor getting AIs to produce satisfactory arguments about moral dilemmas which sorta-reasonable humans disagree about, nor attaining an absolute certainty of an AI not killing everyone. When I say that alignment is difficult, I mean that in practice, using the techniques we actually have, “please don’t disassemble literally everyone with probability roughly 1” is an overly large ask that we are not on course to get. So far as I’m concerned, if you can get a powerful AGI that carries out some pivotal superhuman engineering task, with a less than fifty percent chance of killing more than one billion people, I’ll take it. Even smaller chances of killing even fewer people would be a nice luxury, but if you can get as incredibly far as “less than roughly certain to kill everybody”, then you can probably get down to under a 5% chance with only slightly more effort. Practically all of the difficulty is in getting to “less than certainty of killing literally everyone”. Trolley problems are not an interesting subproblem in all of this; if there are any survivors, you solved alignment. At this point, I no longer care how it works, I don’t care how you got there, I am cause-agnostic about whatever methodology you used, all I am looking at is prospective results, all I want is that we have justifiable cause to believe of a pivotally useful AGI ‘this will not kill literally everyone’. Anybody telling you I’m asking for stricter ‘alignment’ than this has failed at reading comprehension. The big ask from AGI alignment, the basic challenge I am saying is too difficult, is to obtain by any strategy whatsoever a significant chance of there being any survivors.”
I don’t know what you mean by “do much”, but if you think theory/math work is about “provable alignment” then you’re misunderstanding all (or at least the vast majority?) of the theory/math work on alignment. “Is this system aligned?” is not the sort of property that admits of deductive proof, even if the path to understanding “how does alignment work?” better today routes through some amount of theorem-proving on more abstract and fully-formalizable questions.