I was thinking the question here was also about more rigorous, less qualitative papers supporting the thesis, rather than just explanations for laypeople. One of the most common arguments against AI safety is that it’s unscientific because it lacks rigorous theoretical support. I’m not super satisfied with that criticism (I feel like the general outlines are clear enough, and I don’t think you can really make up some quantitative framework to predict, e.g., which fraction of goals in the total possible goal-space benefit from power-seeking and self-preservation, so in the end you still have to go with the qualitative argument and your feel for how much it applies to reality), but if that criticism has to be allayed, it should be by something that targets specific links in the causal chain of Doom. An important side bonus: formalizing and investigating these problems might actually reveal interesting potential alignment ideas.
I’ll have to read those papers you linked, but in general it feels to me like the topic most amenable to this sort of treatment is indeed Instrumental Convergence. The Orthogonality Thesis feels more like a philosophical statement; indeed, we had someone arguing for moral realism here just days ago. I don’t think you can really prove it one way or the other from where we are. But I think if you phrased it as “being smart does not make you automatically good” you’d find that most people agree with you, especially people of the persuasion that currently regards AI safety and “TESCREAL” people (as they’ve dubbed us) with the most suspicion. Orthogonality is essentially moral relativism!
Now, if we’re talking about a more outreach-oriented discussion, then I think all the concepts can be explained pretty clearly. I’d also recommend using analogies to, e.g., invasive species in new habitats, or the evils of colonialism, to stress why and how it’s both dangerous and unethical to unleash on the world things that are more capable than us and driven by too simple and greedy a goal; insist on the fact that what makes us special is the richness and complexity of our values, and that our highest values are the ones that most prevent us from simply going on a power-seeking rampage. That makes the notion of the first AGI being dangerous pretty clear: if you focus only on making them smart but slack off on making them good, the latter part will be pretty rudimentary, and so you’re creating something like a colony of intelligent bacteria.