You may also want to check out John Wentworth’s natural abstraction hypothesis work:
https://www.lesswrong.com/posts/cy3BhHrGinZCp3LXE/testing-the-natural-abstraction-hypothesis-project-intro
I have, LOL, thanks tho.
TBH my naive thought is that if John’s project succeeds, it’ll solve most of what I think of as the hard part of alignment, so it seems like one of the more promising approaches to me. But in my model of the world, it seems quite unlikely that natural abstractions exist in the way John seems to think they do.