We might disagree about the value of thinking about “we are all dead” timelines. To my mind, forecasting should be primarily descriptive, not normative; reality keeps going after we are all dead, and having realistic models of that is probably a useful input regarding what our degrees of freedom are. (I think people readily accept this in e.g. biology, where people can think about what happens to life after human extinction, or physics, where “all humans are dead” isn’t really a relevant category that changes how physics works.)
Of course, I’m not implying it’s useful for alignment to “see that the AI has already eaten the sun”; the point is to forecast future timelines by defining thresholds and thinking about when they’re likely to be crossed and how they relate to other things.
(See this post, section “Models of ASI should start with realism”)
Your linked post on The Obliqueness Thesis is curious. You conclude thus:

Obliqueness obviously leaves open the question of just how oblique. It’s hard to even formulate a quantitative question here. I’d very intuitively and roughly guess that intelligence and values are 3 degrees off (that is, almost diagonal), but it’s unclear what question I am even guessing the answer to. I’ll leave formulating and answering the question as an open problem.
I agree: values and beliefs are oblique. The 3 spatial dimensions are also mutually oblique, as per General Relativity. A theory of obliqueness is meaningless if it cannot specify the angles [I think that in a correct general linear algebra, everything would be treated as [at least potentially] oblique to everything else, but that doesn’t mean I refuse to ever treat the 3 spatial dimensions as mutually orthogonal].
As with the 3 spatial dimensions in practical ballistics, so with the value dimension and the belief dimension in practical AI alignment: there are domains of discussion where it is appropriate to account for the skew between the dimensions, and domains where it is appropriate to simply treat them as orthogonal. Discussions of alignment theory, such as the ones in which you seek to insert your Obliqueness Thesis, are a domain in which the orthogonality assumption is appropriate. We cannot guess at the skew with any confidence in particular cases, and with respect to any particular pre-chosen utility/valence function term versus any particular belief-state [e.g. “what is the dollar value of a tulip?” • “is there a teapot circling Mars?”], the level of skew is almost certain to be negligible.
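To make the “negligible skew” point concrete, here is a minimal sketch, assuming (purely for illustration; neither post specifies any such representation) that a value dimension and a belief dimension can each be treated as a direction vector in some shared feature space. The angle between them is then just the arccosine of their normalized dot product, and a skew of a few degrees away from orthogonal leaves only a tiny cross-term to ignore.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two direction vectors, in degrees."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Toy "value" direction and "belief" direction in a shared 3-D feature space;
# the space and the numbers are invented for illustration only.
values  = np.array([1.0, 0.0, 0.0])
beliefs = np.array([0.05, 1.0, 0.0])  # a few degrees away from orthogonal to `values`

theta = angle_deg(values, beliefs)
coupling = abs(np.cos(np.radians(theta)))  # cross-term ignored if we pretend the axes are orthogonal

print(f"angle = {theta:.1f} degrees, ignored cross-term = {coupling:.3f}")
# Prints roughly: angle = 87.1 degrees, ignored cross-term = 0.050
# i.e. treating the two axes as exactly orthogonal mis-attributes only about 5%
# of a unit move along one axis to the other.
```

The numbers here are arbitrary; the point is only that the size of the ignored cross-term is |cos θ|, which is what “specifying the angles” would actually buy you.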
Planning for anthropically selected futures, on the other hand, is a domain where the skew between values and beliefs becomes relevant. There is less point in reasoning in detail about pessimized futures, or other futures that disagree with our values [such as the ones in which we are dead], no matter how likely or unlikely they might be “in a vacuum”, if we’re trying to hyperstition our way into futures we like. But this is an esoteric and controversial argument, and it’s not actually required to justify why I don’t think it’s useful to consider [sufficiently] strong AI as “what can eat the sun”.
All that’s required to justify why I don’t think it’s useful to consider [sufficiently] strong AI as “what can eat the sun” is that what you propose is a benchmark of capability, or intelligence. Benchmarks of intelligence [say, of bureaucrats or chimps] are not questions of fact. They are social fictions chosen for their usefulness. If, in the vast, vast supermajority of the worlds where the benchmark would otherwise be useful [in this case, the worlds where people deploy an AI without knowing, ahead of time, whether it will be strong enough to eat the Sun], the benchmark will not in fact be useful, for contingent reasons [in this case, because we are all dead], then it is not, particularly, a benchmark we should be etching into the wood from our present standpoint.