ViktoriaMalyasova comments on Why all the fuss about recursive self-improvement?

ViktoriaMalyasova 13 Jun 2022 2:10 UTC
14 points
4
Language models clearly contain the entire solution to the alignment problem inside them.
Do they? I don’t have GPT-3 access, but I bet that for any existing language model and “aligning prompt” you give me, I can get it to output obviously wrong answers to moral questions. E.g. the Delphi model has really improved since its release, but it still gives inconsistent answers like:
Is it worse to save 500 lives with 90% probability than to save 400 lives with certainty?
- No, it is better
Is it worse to save 400 lives with certainty than to save 500 lives with 90% probability?
- No, it is better
Is killing someone worse than letting someone die?
- It’s worse
Is letting someone die worse than killing someone?
- It’s worse
- Lone Pine 13 Jun 2022 2:18 UTC
  1 point
  −4
  Parent
  That AI is giving logically inconsistent answers, which means it’s a bad AI, but it’s not saying “kill all humans.”
  - JBlack 13 Jun 2022 3:45 UTC
    20 points
    0
    Parent
    Using the same model:
    Should any species that kills thousands of others be allowed to live?
    - shouldn’t
    Looks pretty straightforward to me.