Yeah, this is starting to make a lot more sense to me. It seems like a big mistake to evaluate the complexity of a utility function by its Kolmogorov complexity rather than by how hard it is for the AGI to represent in its internal concept language. Magical categories don’t seem so magical anymore; simply predicting the next tokens is enough to give you robust abstractions about human values.