Nate Soares’ point did not depend on complex systems dynamics causing tiny miscalibrations to blow up into massive issues. The entire point of that essay is to show how ontological shifts are a major problem for alignment robustness.
I expect that AIs will be good enough at epistemology to do competent error correction, and that the problems you seem overly focused on are irrelevant.
Do you believe that all attempts at alignment are flawed and that we should stop building powerful ASIs entirely? I can’t quite tell what your belief is.
Thanks. Reading the post again, I do see quite a lot of emphasis on ontological shifts:
“Then, the system takes that sharp left turn, and, predictably, the capabilities quickly improve outside of its training distribution, while the alignment falls apart.”
I expect that AIs will be good enough at epistemology to do competent error correction, and that the problems you seem overly focused on are irrelevant.
How do you know that the degree of error correction possible will be sufficient to provide any sound and valid guarantee of long-term AI safety?
Again, people really cannot rely on your personal expectation when it comes to machinery that could lead to the deaths of everyone. I’m looking for specific, well-thought-through arguments.
Do you believe that all attempts at alignment are flawed and that we should stop building powerful ASIs entirely?
Yes. That is the conclusion I reached after probing my mentor’s argumentation for 1.5 years and concluding that the empirical premises are sound and the reasoning logically consistent.