To get aligned AI, train it on a corpus generated by aligned humans.
Except that we don’t have that, and probably can’t get it.
I’m not sure why all the people who think harder about this field than I do aren’t testing their “how to get alignment” theories on humans first. There seems to be a prevailing assumption that throwing more intellect at an agent will somehow render it fully rational, for reasons similar to why higher-IQ humans lead universally happier lives (/s).
At this point I’ve basically given up on prodding the busy intellectuals to explain that part, and resorted to taking it on faith that it makes sense to them because they’re smarter than me.
Some of us are!
I mean, I don’t know you, so I don’t know if I’ve thought harder about the field than you have.
But FWIW, there’s a lot of us chewing on exactly this, and running experiments of various sizes, and we have some tentative conclusions.
It just tends to drift away from LW in social flavor. You’ll find a lot of this stuff in places LW-type folks tend to label “post-rationalist”.