The point of “the genie knows but doesn’t care” wasn’t that the AI would take your instructions, know what you want, and yet disobey the instructions because it doesn’t care about what you asked for. If you read Rob Bensinger’s essay carefully, you’ll find that he’s actually warning that the AI will care too much about the utility function you gave it, maximizing it exactly, against your intentions.
If so, the title was pretty misleading.
And if that is the case, it still isn’t making much of a point: it assumes a hand-coded UF, so it isn’t applicable to LLMs, or to many other architectures. So it doesn’t support conclusions like “the first true AI will kill us all with high probability”. (The “doesn’t” in the title should be a “might not” as well.)
We’re still arguing about the meaning of Genie Knows because it was always unclear. It was always unclear, I think, because it was a Motte and Bailey exercise: trying to reach the conclusion that an AI is highly likely to literalistically misunderstand human values, using an argument that only suggested this was possible.