The spread of Tegmark Level IV, UDT, and related ideas may be an example of “lock-in” that has already happened (to varying degrees) within the rationalist, EA, and AI safety research communities, and could possibly happen to the wider AI research community. (It seems easy to imagine an alternate timeline in which these ideas never spread beyond a few obscure papers and blog posts, or do spread somewhat but are considered outlandish by most people.)
Huh, that’s a good point. Whereas it seems probably inevitable that AI research would’ve eventually converged on something similar to the current D(R)L paradigm, we can imagine a lot of different ways AI safety could look right now instead. Which makes sense, since the latter is still young and in a kind of pre-paradigmatic philosophical stage, with little unambiguous feedback to dictate how things should unfold (and it’s far from clear when substantially more of that feedback will show up).
I can imagine an alternate timeline where the initial core ideas/impetus for AI safety didn’t come from Yudkowsky/LW, but from e.g. (a) Bostrom/FHI, (b) Stuart Russell, or (c) some near-term ML safety researchers whose thinking gradually evolved as they thought about longer and longer timescales. And it’s interesting to ask what the current field would consequently look like:
Agent Foundations/Embedded Agency probably (?) wouldn’t be a thing, or at least it might take some time for the underlying questions which motivate it to be asked in writing, let alone the actual questions within those agendas (or something close to them)
For (c) primarily, it’s unclear whether the alignment problem would’ve been zeroed in on as the “central challenge”, or how long that would take (note: I don’t actually know that much about near-term concerns, but I can imagine things like verification, adversarial examples, and algorithmic fairness lingering on center stage for a while).
A lot of the focus on utility functions probably wouldn’t be there
And none of that is to say anything about whether those alternate timelines would be better; it’s to say that a lot of the things I often associate with AI safety are only contingently related to it. This is probably obvious to a lot of people on here, and of course we have already seen some of the Yudkowskian foundational framings of the problem get de-emphasized as non-LW people have joined the field.
On the other hand, as far as “lock-in” itself is concerned, it does seem like there’s a certain amount of deference that EA has given MIRI/LW on some of the more abstruse matters where would-be critics don’t want to sound stupid for lack of technical sophistication—UDT, Solomonoff, and similar stuff internal to agent foundations—and the longer any idea lingers around, and the farther it spreads, the harder it is to root out if we ever do find good reasons to overturn it. Although I’m not that worried about this, since those ideas are by definition only fully understood/debated by a small part of the community.
Also, it’s my impression that most EAs believe in one-boxing, but not necessarily UDT. For instance, some apparently prefer EDT-like theories, which makes me think the relatively simple arguments for one-boxing have percolated pretty widely (and are probably locked in), but the more advanced details are still largely up for debate. I think something similar can be said for a lot of other ideas: e.g. “thinking probabilistically” is locked in, but maybe not a lot of the more complicated aspects of Bayesian epistemology that have come out of LW.