Very nice! Strong vote for a sequence. Understanding deltas between experts is a good way both to understand their thinking and to identify areas of uncertainty that need more work/thought.
On natural abstractions, I think the hypothesis is more true for some abstractions than others. I’d think there’s a pretty clear natural abstraction for a set of carbon atoms arranged as diamond, but a much less clear one for the concept of a human. Different people mean different things by “human”, and will diverge even more once we can make variations on humans. And almost nobody is sure precisely what they themselves mean by “human”. I would think there’s no truly natural abstraction capturing humans.
This seems pretty relevant, since alignment is probably looking for a natural abstraction for something like “human flourishing”. I’d think there’s a natural abstraction for “thinking beings”, on a spectrum of how much they are thinking beings, but not for humans specifically.
This just complicates the question of whether natural abstractions exist and are adequate to align AGIs, but I’m afraid this complication is probably real.