As a result, I think conceptual alignment will be a necessary line of work if the advent of AGI is to result in a desirable future for humanity and other sapient life. In particular, my perspective as a mathematician leads me to believe that just as the lack of a provable guarantee about a mathematical object means that object can behave arbitrarily and unexpectedly badly along features you didn't think to specify, so too could an underspecified or imprecisely specified AGI behave in arbitrarily undesirable (or even merely pathological or self-defeating) ways along axes we didn't think to check.
I don't get how this is supposed to be related to whether scaling will work. Surely if scaling were enough, your arguments here would still go through, right?
I realized I wasn't super clear about which part was which. I agree that "is scaling enough" is a major crux for me, and I'd be far more afraid if it looked like scaling were sufficient on its own; this part, however, is about "do we actually need to get alignment basically exactly right." Does that change your understanding?