I think the main problem undermining AI alignment is that humans don’t “have” values. “Values” are a useful shorthand for describing human behaviour, invented by psychologists, but that is all. I am going to write a longer text explaining the nature of values, but they are not actually existing things like atoms, or even human emotions. What actually exists is a person’s a) choices, b) emotions, and c) claims about preferences. Even these three things may not be aligned within one person, who may claim that he likes sport, choose to lie on the sofa, and feel bad about it. :)
I agree with you, which is why I’m interested in generalizing to axias and making room in the theory for internal inconsistency. In my view, current decision-theoretic approaches ignore this, which, while fine for now, will eventually be a problem that will need to be addressed either by humans or by AI itself.