I would understand this claim more if you claimed to value something very simple, like diamonds or paperclips (though I wouldn’t believe you that it was what you valued).
Okay, I think I’ve found the crux here:
I don’t value getting maximum diamonds or paperclips, but I think you’ve correctly identified my crux: I think values and value formation are simpler than many LWers believe, in the sense that they require much less of a prior and much more can be learned from data, and that they are also less fragile. This doesn’t just apply to my own values, which could broadly be described as socially liberal and economically centrist.
I think this for several reasons:
I think a lot of people make an error when they estimate how complicated their values are in the sense relevant for AI alignment: they add together the complexity of the generative process/algorithms/priors for values and the complexity of the data for value learning. I think most of the complexity of my own values, as well as other people’s, lives in the data (something like 90-99%+), not in priors encoded by my genetics.
This is because I think a lot of what evopsych says about how humans got their capabilities and values is basically wrong. One of the more interesting pieces of evidence is that in AI training there’s a general dictum that data matters more than the architecture/prior in determining how AIs behave, especially for OOD generalization, along with the bitter lesson in DL capabilities.
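To make the data-vs-architecture dictum concrete, here is a minimal sketch of my own (a toy illustration, not from the discussion): two very different "architectures," one with a strong linearity prior and one with essentially no prior at all, trained on the same data, end up making nearly identical in-distribution predictions. The data dominates what is learned.

```python
# Toy sketch: same data, two very different "architectures",
# nearly the same learned behavior in-distribution.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.01, 200)  # underlying rule: y = 2x + 1

# "Architecture" A: ordinary least squares (strong global linearity prior)
A = np.vstack([x, np.ones_like(x)]).T
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

def predict_linear(q):
    return w * q + b

# "Architecture" B: k-nearest-neighbour averaging (purely local, no linearity prior)
def predict_knn(q, k=10):
    idx = np.argsort(np.abs(x - q))[:k]
    return y[idx].mean()

for q in (-0.5, 0.0, 0.5):
    print(q, predict_linear(q), predict_knn(q))
# In-distribution, both models recover roughly y = 2q + 1 from the data alone.
```

Of course, the two models would diverge far outside the training range, which is exactly where the architecture/prior starts to matter; the dictum is about typical behavior, not a guarantee.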
While this alone matters for why I don’t think we need to program in a very complicated value/utility function, I also think there is enough of an analogy between DL and the brain that you can transport a lot of insights between the two fields. There are some very interesting papers on the similarity between the human brain and what LLMs are doing (spoiler alert: they’re not the same thing, but they are doing pretty similar things). Links below:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003963
https://www.nature.com/articles/s41593-022-01026-4
https://www.biorxiv.org/content/10.1101/2022.03.01.482586v1.full
https://www.nature.com/articles/s42003-022-03036-1
https://arxiv.org/abs/2306.01930
To answer some side questions:

(How close am I to having a utility function?)
The answer is a bit tricky, but my general answer is that the model-based RL parts of my brain probably are maximizing utility, while the model-free RL part isn’t, for reasons related to the idea that reward is not the optimization target.
So my answer is roughly 10-50% close: there are significant differences, but I do see real similarities between utility maximization and what humans do.
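The model-based/model-free distinction above can be sketched in code (my own hypothetical toy, not a claim about the brain): on a two-armed bandit, a model-based agent explicitly represents expected utility and argmaxes over it, while a model-free Q-learner merely has its action tendencies nudged by reward. Reward acts like a gradient on behavior rather than a represented goal.

```python
# Toy contrast: model-based planning vs model-free reward-shaping.
import random

random.seed(0)
TRUE_REWARD = {"a": 1.0, "b": 0.2}  # the environment (unknown to the agents)

# Model-based: plans with an (assumed accurate) world model and
# explicitly picks the action maximizing expected utility.
model = dict(TRUE_REWARD)
model_based_choice = max(model, key=model.get)

# Model-free: tabular Q-learning with epsilon-greedy exploration.
# Reward updates action values; no "goal of getting reward" is represented.
q = {"a": 0.0, "b": 0.0}
alpha, eps = 0.1, 0.1
for _ in range(1000):
    act = random.choice(list(q)) if random.random() < eps else max(q, key=q.get)
    r = TRUE_REWARD[act]
    q[act] += alpha * (r - q[act])  # reward shapes tendencies incrementally

print(model_based_choice, max(q, key=q.get))
```

Both agents end up preferring the same action here, but only the model-based one is doing anything like utility maximization; the model-free one just had its dispositions sculpted by past reward, which is the sense in which reward is not its optimization target.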
This one is extremely easy to answer:
(If you were to freeze me and maximize my preferences at different points in a single day, how much would the resultant universes look like each other vs. look extremely different?)
The answer is that they would look like each other, though there could be real differences. Critically, the data and the brain do not usually update this fast except in some constrained circumstances; just because data matters more than architecture doesn’t mean the brain updates its values this fast.