Do you feel your agenda will allow us to formalise the idea of “don’t hack the agent who provides your reward signal” in some way? Every attempt I’ve seen has either failed or been too restrictive.
The AI treats only the user’s timeline prior to the AI’s creation as specifying the loss function. Since it cannot change the past[1], manipulating the user after creation cannot alter which loss it is evaluated against (see the sketch below).
[1] Unless time travel is possible. I haven’t thought through the implications of time travel, but time travel seems sufficiently unlikely that handling that scenario is a “luxury”.
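A minimal sketch of one way this might be formalised, in my own notation (the symbols $t_0$, $h$, $f$, $\tau$, and $\pi$ are assumptions for illustration, not the author’s):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sketch: the loss *function* is fixed by pre-creation data only.
Let $t_0$ be the AI's creation time, $h_{<t_0}$ the user's history
up to $t_0$, and $\tau(\pi)$ the post-creation trajectory induced
by policy $\pi$. Suppose the loss factors as
\[
  L(\pi) = f_{h_{<t_0}}\bigl(\tau(\pi)\bigr),
\]
where the loss function $f_{h_{<t_0}}$ is determined by
pre-creation data alone. No action taken at $t \ge t_0$ causally
affects $h_{<t_0}$ (absent time travel), so $\pi$ can change what
the loss evaluates, $\tau(\pi)$, but not which loss is applied,
$f_{h_{<t_0}}$: manipulating the user after $t_0$ cannot improve
$L$.
\end{document}
```

On this reading, “don’t hack the agent who provides your reward signal” falls out of the causal structure rather than being imposed as an extra constraint on the policy.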