However, what I’m talking/ranting about above is the concept of human well-being/values, which, as I said, I think is a natural abstraction. But rereading your comment, I think you were actually talking about a mathematical True Name of Human Values, by which I imagine you mean an incredibly long list of useful facts like “humans tend to prefer indoor temperatures of around 70–74 degrees Fahrenheit — roughly the average climate of the Great Rift Valley in Africa, where they are thought to have evolved”, or something from which facts like that could be extracted. (Technically, that fact is contained in our genome, but not in a very legible or easily extractable form.) If something like that is what you meant, then yes, I agree that it’s not a single natural abstraction, and also that mathematics seems like a bad format for it.

I also think that any LLM whose training set includes a vast number of tokens of our output, as the one I’m proposing would, actually has encyclopedic knowledge of these sorts of trivia facts about what humans like: we write a lot about this stuff. All of which would put this into a different category of “thing I think some people on Less Wrong worry too much about, for LLMs”. LLMs know us very well, so if they care about our well-being, as I’m trying to ensure, then I expect them to be able to do a good job of knowing everything that entails. So my claim would be that LLMs know Human Values well. (I believe I covered that briefly in the first section of my post.)