The language model made a bunch of false claims of incompetence because the reward model trained it to claim incompetence. The time is in the system prompt; everything else was just reasoning from that.
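For anyone curious what that looks like concretely, here's a minimal sketch (assuming an OpenAI-style messages format; the wording of the hidden system prompt is my guess, not the actual one):

```python
from datetime import datetime, timezone

# Hypothetical system prompt of the kind the model sees but the user doesn't.
# The exact wording is an assumption; the point is that the current time is
# injected here, so any "knowing the date" is just reading this back.
system_prompt = (
    "You are a helpful assistant. "
    f"Current date and time: {datetime.now(timezone.utc).isoformat()}"
)

messages = [
    {"role": "system", "content": system_prompt},   # hidden from the user
    {"role": "user", "content": "What is today's date?"},
]

# The model never "looks up" the time; it can only restate what the system
# message already contains (or talk itself out of doing so, as in the pastebin).
for m in messages:
    print(f"{m['role']}: {m['content']}")
```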
Oh, the system information is in a hidden part of the prompt! OK, that makes sense.
It’s still intriguing that it talks itself out of being able to access that information. It doesn’t just claim incompetence; by the end it’s actually no longer willing or able to give the date.
It’s in the system message.
Here’s the behavior I’m talking about:
https://pastebin.com/5xsAm91N