I expect you’ll say I’m missing something, but to me, this sounds like a language dispute. My understanding of your recent thinking holds that the important goal is to understand how human learning reliably results in human values. The Bayesian perspective on this is “figuring out the human prior”, because a prior is just a way-to-learn. You might object to the overly Bayesian framing, and I’m fine with that: I am not dogmatic about orthodox Bayesianism, and I do not even like utility functions.
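To make the sense in which a prior is “a way-to-learn” concrete, here is a toy sketch of my own (the conjugate Beta-Bernoulli setup, the function, and the numbers are purely illustrative, not anything specific we discussed): two learners see identical data, and the prior alone determines how far each one’s beliefs move.

```python
# Toy illustration (my own, hypothetical): in the conjugate Beta-Bernoulli model,
# the prior fully determines what gets learned from a fixed dataset --
# in that sense the prior *is* the learner's inductive bias.

def posterior_mean(heads: int, tails: int, alpha: float, beta: float) -> float:
    """Posterior mean of a coin's bias under a Beta(alpha, beta) prior."""
    return (alpha + heads) / (alpha + beta + heads + tails)

data = (3, 1)  # 3 heads, 1 tail -- the same observations for both learners

# Two "ways-to-learn": a flat prior, and a prior strongly biased toward fairness.
print(posterior_mean(*data, alpha=1, beta=1))    # ~0.67: tracks the data closely
print(posterior_mean(*data, alpha=50, beta=50))  # ~0.51: barely moves off 0.5
```

The non-Bayesian analogue, as I read it, is that architecture, training procedure, and data play the role that the Beta parameters play here.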
Insofar as the question makes sense, its answer probably takes the form of inductive biases: I might learn to predict the world via self-supervised learning, and form concepts around other people having values and emotional states because that is a simple, convergent abstraction, relatively pinned down by my training process, architecture, and the data I encounter over my life, and because I can reuse my self-modelling abstractions.
I am totally fine with saying “inductive biases” instead of “prior”; I think it indeed pins down what I meant more accurately (precisely because it is, in itself, a vaguer and less precise concept than “prior”).
I agree; this does seem like it was a language dispute, and I no longer perceive us as disagreeing on this point.