Actually, that is switching to reasoning about something else.
Reasoning that the alternative (humans interacting with each other) would lead to reliably worse outcomes is not the same as reasoning about why an AGI would stay aligned in its effects on the world, and therefore safe for humans.
And with that switch, you are not addressing Nate Soares’ point that “capabilities generalize better than alignment”.
Nate Soares’ point did not depend on complex systems dynamics causing tiny miscalibrations to blow up into massive issues. The entire point of that essay is to show how ontological shifts are a major problem for alignment robustness.
I expect that AIs will be good enough at epistemology to do competent error correction, and that the problems you seem overly focused on are irrelevant.
Do you believe that all attempts at alignment are flawed and that we should stop building powerful ASIs entirely? I can’t quite tell what your belief is.
Thanks. Reading the post again, I do see quite a lot of emphasis on ontological shifts:
“Then, the system takes that sharp left turn, and, predictably, the capabilities quickly improve outside of its training distribution, while the alignment falls apart.”
“I expect that AIs will be good enough at epistemology to do competent error correction, and that the problems you seem overly focused on are irrelevant.”
How do you know that the degree of error correction possible will be sufficient to provide any sound and valid guarantee of long-term AI safety?
Again, people really cannot rely on your personal expectation when it comes to machinery that could lead to the deaths of everyone. I’m looking for specific, well-thought-through arguments.
“Do you believe that all attempts at alignment are flawed and that we should stop building powerful ASIs entirely?”
Yes, that is the conclusion I reached after probing my mentor’s argumentation for 1.5 years and concluding that the empirical premises are sound and the reasoning logically consistent.
I stated it in the comment you replied to.