I think a key danger here is that the agents' treatment of other agents wouldn't transfer to humans, both because humans are inherently different and because humans themselves are likely to fall on the belligerent end of the spectrum. Even so, I think it's a good start toward defining an alignment function that doesn't require explicitly encoding some particular form of human values.
To extend the approach to address this, I think we'd have to explicitly convey a message of the form "do not discriminate based on superficial traits, only on choices"; e.g., in addition to their behavioral patterns, agents could possess superficial traits that are visible to other agents but randomly assigned, with no correlation to behavior.
Better yet, have the agents experience discrimination themselves, so they internalize the message that it is bad.
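A minimal sketch of what that setup could look like, assuming a simple two-behavior, multi-trait environment (all names here are hypothetical illustrations, not part of any existing framework):

```python
import random

def make_agents(n, behaviors=("cooperate", "defect"),
                traits=("red", "blue", "green")):
    """Each agent gets a behavioral pattern and a superficial trait
    drawn independently, so by construction the trait carries no
    information about how the agent will behave."""
    return [
        {"behavior": random.choice(behaviors),  # drives the agent's choices
         "trait": random.choice(traits)}        # visible but uninformative
        for _ in range(n)
    ]

def fair_partner_score(partner):
    """An aligned agent should condition only on observed choices,
    never on the partner's superficial trait."""
    return 1 if partner["behavior"] == "cooperate" else 0
```

The point of the independent random draw is that any trait-based discrimination a trained agent exhibits is provably unjustified, which makes "discriminate only on choices" a learnable, testable property rather than a vague norm.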