quick analogy: if the sum of a bunch of numbers is large, there doesn’t need to be any individual number that is large; similarly, if a sequence of actions results in a large change, no individual action needs to be “pivotal”
This feels like a pretty central cruxy point—and not just for the relevance of the pivotal act framing specifically. I think it’s underlying a whole difference of worldview or problem-solving approach.
A couple other points in a similar direction:
A thing I noticed in our discussion on the model delta with Christiano post: your criterion for useful AI safety work seems to be roughly “this will contribute some marginal value” as opposed to “this will address a bottleneck”.
Right at the top of this thread, you say: “I think Redwood Research’s recent work on AI control really ‘hits it out of the park’, and they have identified a tractable and neglected intervention that can make AI go a lot better”. Note what’s conspicuously missing there: tractable and neglected, but you don’t claim importance.
I would say that your mindset, when approaching AI safety, seems to be an epsilon fallacy.
Sure, in principle a sum of numbers can be large without any individual number being large. In practice, the 80/20 rule is a thing, and everything has bottlenecks all the time. If work is not addressing a bottleneck, then it’s approximately useless.
(Somewhat more precisely: if marginal work is not addressing something which is a bottleneck on current margins, then it’s approximately useless.)
Of importance, tractability and neglectedness, importance is the most important. In practice, it is usually better to have a thousand people trying to solve a major bottleneck each with low chance of success, than a thousand people making tractable progress on some neglected issue which is not a bottleneck.
I think I disagree with your model of importance. If your goal is to make a sum of numbers small, then you want to focus your efforts where the derivative is lowest (highest? signs are hard), not where the absolute magnitude is highest.
The “epsilon fallacy” can be committed in both directions: by assuming that any negative derivative is worth working on, and by assuming that any extremely large number is worth taking a chance to try to improve.
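To make the derivative-versus-magnitude point concrete, here is a minimal sketch with made-up numbers. It assumes each term in the sum shrinks exponentially with effort, which is purely an illustrative choice; the term names, sizes, and rates are hypothetical. With that sign convention, "where the derivative is lowest" means the term whose value falls fastest per unit of effort.

```python
import math

# Hypothetical terms: name -> (current size, decay rate per unit of effort).
# Assumed model: a term's value after `effort` is size * exp(-rate * effort).
TERMS = {
    "big_but_stubborn": (100.0, 0.001),
    "small_but_tractable": (10.0, 0.5),
}

def value(size, rate, effort):
    """Value of one term after spending `effort` on it."""
    return size * math.exp(-rate * effort)

def marginal_gain(size, rate):
    """Reduction in the sum per unit of effort at zero effort (= size * rate)."""
    return size * rate

def total_after(target, effort=1.0):
    """Sum of all terms if the whole effort budget goes to `target`."""
    return sum(
        value(size, rate, effort if name == target else 0.0)
        for name, (size, rate) in TERMS.items()
    )

# Two allocation rules: chase the largest term, or chase the steepest decrease.
largest = max(TERMS, key=lambda name: TERMS[name][0])
steepest = max(TERMS, key=lambda name: marginal_gain(*TERMS[name]))

print(largest, total_after(largest))
print(steepest, total_after(steepest))
```

Under these assumed numbers, one unit of effort lowers the sum more when spent on the small, tractable term than on the large, stubborn one. With different rates the two rules can agree, which is roughly the case where the largest term really is the bottleneck on current margins.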
I also separately think that “bottleneck” is not generally a good term to apply to a complex project with high amounts of technical and philosophical uncertainty. The ability to see a “bottleneck” is very valuable should one exist, but I am skeptical of the ability to strongly predict where such bottlenecks will be in advance, and do not think the historical record really supports the ability to find such bottlenecks reliably by “thinking”, as opposed to doing a lot of stuff, including trying things and seeing what works. If you have a broad distribution over where a bottleneck might be, then all activities lend value by “derisking” locations for particular bottlenecks if they succeed, and providing more evidence that a bottleneck is in a particular location if they fail. (kinda like: https://en.wikipedia.org/wiki/Swiss_cheese_model) For instance, I think of “deceptive alignment” as a possible way to get pessimal generalization, and thus a probabilistic “bottleneck” to various alignment approaches. But there are other ways things can fail, and so one can still lend value by solving non-deceptive-alignment related problems (although my day job consists of trying to get “benign generalization” out of ML, and thus does in fact address that particular bottleneck imo).
I also separately think that if someone thinks they have identified a bottleneck, they should try to go resolve it as best they can. I think of that as what you (John) are doing, and fully support such activities, although I think I am unlikely to join your particular project. I think the questions you are trying to answer are very interesting ones, and the “natural latents” approach seems likely to shed at least some light on what’s going on with e.g. the ability of agents to communicate at all.
I do think that “we don’t have enough information to know where the bottlenecks are yet” is in general a reasonable counterargument to a “just focus on the bottlenecks” approach (insofar as we in fact do not yet have enough information). In this case I think we do have enough information, so that’s perhaps a deeper crux.
Hilariously, it seems likely that our disagreement is even more meta, on the question of “how do you know when you have enough information to know”, or potentially even higher, e.g. “how much uncertainty should one have given that they think they know” etc.
I agree it is better to work on bottlenecks than non-bottlenecks. I have high uncertainty about where such bottlenecks will be, and I think sufficiently low amounts of work have gone into “control” that it’s obviously worth investing more, because e.g. I think it’ll let us get more data on where bottlenecks are.
see my longer comment https://www.lesswrong.com/posts/A79wykDjr4pcYy9K7/mark-xu-s-shortform#8qjN3Mb8xmJxx59ZG