This is great. One question it raises for me is: Why is there a common assumption in AI safety that values are a sort of existent (i.e., they exist) and latent (i.e., they are not directly observable) phenomenon? I don’t think those are unreasonable partial definitions of “values,” but they’re far from the only ones, and it’s not at all obvious that they pick out the values with which we want to align AI. Philosophers Iason Gabriel (2020) and Patrick Butlin (2021) have pointed out some of the many definitions of “values” that we could use for AI safety.
I understand that just picking an operationalization and sticking to it may be necessary for some technical research, but I worry that the gloss reifies these particular criteria and may even incorrectly recast semantic issues (e.g., Which latent phenomena do we want to describe as “values”? — a sort of verbal dispute à la Chalmers) as substantive issues (e.g., How do we align an AI with the true values?).