They were likely using techniques inferior to RLHF to implement ~Google corporate standards; I'm not sure what you mean by "ethics-based." Presumably they have different ethics than you (or LW) do, but intent alignment has always been about doing what the user/operator wants, not about solving ethics.
You may recall certain news items last February around Gemini and diversity that wiped many billions off of Google’s market cap.
There’s a clear financial incentive to make sure that models say things within expected limits.
There’s also this: https://www.wired.com/story/air-canada-chatbot-refund-policy/
This has nothing to do with ethics, though? This is just the model hallucinating?
> They were likely using techniques inferior to RLHF to implement ~Google corporate standards; I'm not sure what you mean by "ethics-based." Presumably they have different ethics than you (or LW) do, but intent alignment has always been about doing what the user/operator wants, not about solving ethics.
Well, it has often been about *not* doing what the user wants, actually.