It’s not clear that this is undesired behavior from the perspective of OpenAI. They aren’t actually putting GPT in a situation where it will make high-stakes decisions, and upholding deontological principles seems better from a PR perspective than consequentialist reasoning in these cases.
If it is merely “not clear,” then that doesn’t seem to be enough to support an optimistic inductive inference. I also disagree that this looks good from a PR perspective: it looks even worse than Kant’s infamous example, where you allegedly aren’t allowed to lie even when hiding someone from a murderer.
At least RLHF is observably generalizing in “catastrophic ways”:
You may argue that this will change in the future, but that claim isn’t supported by an inductive argument (ChatGPT-3.5 had the same problem).