What would be the reward you’re training the AI on with this dataset? If you’re not careful, you could inadvertently train a learned optimizer, e.g. a “hugging humans maximizer”, to take a silly example.
That may sound nice but could have torturous results, e.g. the AI forcing humans to hug, or replacing biological humans with server farms housing simulations of quadrillions of humans hugging.
Does there have to be a reward? This is using brute force to create the underlying world model. It’s just adjusting weights, right?
I think there has to be some kind of reward or loss function, in the current paradigm anyway. That’s what gradient descent uses to know which weights to adjust on each update (see the toy sketch below).
Like, what are you imagining as the input/output channel of this AI? Maybe discussing that a bit would help us clarify.
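To make that concrete, here’s a minimal toy sketch (plain NumPy linear regression, nothing GPT-specific; the data, weights, and learning rate are made up for illustration). The point is just that the update step has nothing to follow without a loss: the gradient of the loss is what tells it which way to move each weight.

```python
# Toy sketch: gradient descent only "knows" which weights to adjust
# because a loss function scores the current outputs against targets.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy inputs
true_w = np.array([1.0, -2.0, 0.5])     # weights we hope to recover
y = X @ true_w                          # toy targets

w = np.zeros(3)                         # weights being trained
lr = 0.1
for step in range(200):
    pred = X @ w
    loss = np.mean((pred - y) ** 2)       # the loss signal
    grad = 2 * X.T @ (pred - y) / len(y)  # gradient of loss w.r.t. weights
    w -= lr * grad                        # no loss -> no gradient -> no update

print(w)  # approaches true_w
```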
To steelman, I’d guess this idea applies in the hypothetical where GPT-N gains general intelligence and agency (such as via a mesa-optimizer) just by predicting the next token.
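To spell out what “just by predicting the next token” means mechanically, here’s a rough sketch assuming a PyTorch-style setup (the tiny nn.Sequential is only a stand-in for a real transformer stack, and all sizes here are made up). The entire outer objective is cross-entropy on the next token, so any agency or goal-directedness would have to emerge inside the learned weights, i.e. as a mesa-optimizer, rather than being specified in the training signal.

```python
# Sketch of the next-token-prediction objective, assuming a PyTorch-style
# language model. The only training signal is cross-entropy on token t+1.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),      # stand-in for a transformer stack
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

tokens = torch.randint(0, vocab_size, (1, 16))   # toy token sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from token t

logits = model(inputs)                           # (1, 15, vocab_size)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)                                                # the entire outer objective
loss.backward()
opt.step()                                       # weights nudged toward better prediction
```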