Thanks for responding.

I agree with what you’re saying; I think you’d want to maintain your reward stream, at least partially. The main point I’m trying to make, though, is that in this hypothetical you could no longer think of your reward stream as grounding out your values. Instead it’s the other way around: you’re using your values to dictate the reward stream. This happens in real life sometimes, when we try to make the things we value more rewarding.
You’d end up keeping your values, I think: your beliefs about what you value don’t go away, nor (immediately) do the behaviors that put them into practice, and through both your values are maintained, at least somewhat.
If you can still have values without reward signals that tell you about them, then doesn’t that mean your values are defined by more than just what the “screen” shows? That even if you could see and understand every part of someone’s reward system, you still wouldn’t know everything about their values?
> If you can still have values without reward signals that tell you about them, then doesn’t that mean your values are defined by more than just what the “screen” shows? That even if you could see and understand every part of someone’s reward system, you still wouldn’t know everything about their values?
No.
An analogy: suppose I run a small messaging app, and all the users’ messages are stored in a database. The messages are also cached in a faster-but-less-stable system. One day the database gets wiped for some reason, so I use the cache to repopulate the database.
Even though I use the cache to repopulate the database in this one weird case, it is still correct to say that the database is generally the source of ground truth for user messages in the system; the weird case is, in fact, weird. (Indeed, that’s exactly how software engineers would normally talk about it.)
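To make the direction-of-flow point concrete, here’s a minimal Python sketch of the analogy. All the names (Database, Cache, repopulate_from_cache) are invented for illustration, not taken from any real system: in ordinary operation data flows from the database into the cache, and only in the wipe scenario does the flow briefly reverse.

```python
from typing import Dict

class Database:
    """The system's ground truth for user messages."""
    def __init__(self) -> None:
        self.messages: Dict[str, str] = {}  # message_id -> message text

    def write(self, message_id: str, text: str) -> None:
        self.messages[message_id] = text

    def wipe(self) -> None:
        self.messages.clear()

class Cache:
    """Faster but less stable; ordinarily filled *from* the database."""
    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def fill_from(self, db: Database) -> None:
        # Ordinary direction of flow: database -> cache.
        self.entries = dict(db.messages)

def repopulate_from_cache(db: Database, cache: Cache) -> None:
    # The weird case: the database was wiped, so the flow briefly
    # reverses and the cache rebuilds it. Afterwards the database is
    # once again the source of ground truth.
    for message_id, text in cache.entries.items():
        db.write(message_id, text)

# Usage: normal operation, then the one weird recovery case.
db, cache = Database(), Cache()
db.write("m1", "hello")
cache.fill_from(db)               # ordinary operation
db.wipe()                         # the database gets wiped
repopulate_from_cache(db, cache)  # the weird case
assert db.messages == {"m1": "hello"}
```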
Spelling out the analogy: in a human brain in ordinary operation, our values (I claim) ground out in the reward stream, analogous to the database. There’s still a bunch of “caching” of values, and in weird cases like the one you suggest, one might “repopulate” the reward stream from the “cached” values elsewhere in the system. But it’s still correct to say that the reward stream is generally the source of ground truth for values in the system; the weird case is in fact weird.