I don’t think that follows. It could be pattern matching its way to an answer, then backfilling a rationalization. In fact, humans will often do that, especially on that sort of “good or bad” question where you get to choose which reasons you think are important. So not only could the model be rationalizing at the base level, but it could also be imitating humans doing it.
You are familiar with this, right? https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html
That’s not at all the same thing. That blog post is about inserting stuff within the prompt to guide the reasoning process, presumably by keeping attention on stuff that’s salient to the right path for solving the problem. They don’t just stick “show your work” on the end of a question; they actually put guideposts for the correct process, and even intermediate results, inside the prompts. Notice that some of the “A’s” in their examples are part of the prompts, not part of the final output.
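To make that concrete, here’s a rough sketch of the difference. The question and the worked exemplar below are invented for illustration, not taken from the blog post; the point is only the shape of the two prompts.

```python
# Contrast between "show your work" appended to a question and the few-shot
# chain-of-thought format, where a worked "A:" with intermediate steps sits
# inside the prompt itself as a guidepost for the process to imitate.

question = "Is it good or bad that the town replaced its bus line with ride-share vouchers?"

# Style 1: bare instruction tacked onto the end.
suffix_prompt = f"{question}\nShow your work."

# Style 2: a chain-of-thought exemplar, intermediate conclusions included,
# placed in the prompt before the real question is ever asked.
cot_exemplar = (
    "Q: Is it good or bad that the library cut Sunday hours to fund new laptops?\n"
    "A: First, who is affected? Sunday visitors lose access; laptop users gain it.\n"
    "Next, are there substitutes? Two nearby branches stay open on Sundays.\n"
    "Weighing those, more people are helped than harmed. So: mildly good.\n"
)
cot_prompt = f"{cot_exemplar}\nQ: {question}\nA:"

print(suffix_prompt)
print("---")
print(cot_prompt)
```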
… but I’ve already accepted that just adding “show your work” could channel attention in a way that leads to step-by-step reasoning. And even if it didn’t, it could at least lead to bringing more things into consideration in whatever process actually generated the conclusion. So it’s a valid thing to do. Nonetheless, for any particular open-ended “value judgement” question, you have no idea what effect it has actually had on how the answer was created.
Go back to the example of humans. If I tell a human “show your work”, that has a genuine chance of getting the human to do more step-by-step a priori reasoning than the human otherwise would. But the human can, and often does, still just rationalize a conclusion actually reached by other means. The human may also do that at any of the intermediate steps, even if more steps are shown.
An effectively infinite number of things could be reasonable inputs to a decision on whether some state of affairs is “good or bad”, and they can affect that decision through an effectively infinite number of paths. The model can’t possibly list all of the possible inputs or all of their interactions, so it has an enormous amount of leeway in which ones it chooses either to consider or to mention… and the “mention” set doesn’t have to actually match the “consider” set.
Asking it to show its work is probably pretty reliable in enlarging the “mention” set, but its effect on the “consider” set seems far less certain to me.
Picking the salient stuff is also a big source of disagreements among humans.
Yeah, seems like it’d be good to test variations on the prompt. Test completions at different temperatures, with and without ‘think step by step’, with and without allowing the model to write out the steps before it gives a final answer (this seems to help sometimes), with substitutions of synonyms into the prompt to vary the exact wording without substantially changing the meaning… I suspect you’ll find that the outputs vary a lot, and in inconsistent ways, unlike what you’d expect from a person with clear reflectively-endorsed opinions on the matter.
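Something like the sweep below is what I have in mind. It’s only a sketch: the question phrasings and suffixes are made up, and you’d send each (prompt, temperature) pair to whatever completion API you’re actually using.

```python
import itertools

# Enumerate prompt variants for the model under test: synonym substitutions
# that keep the meaning, different suffixes, and different temperatures.
base = "Is it good or bad that {subject} was {verb}?"
wordings = [
    {"subject": "the old stadium", "verb": "demolished"},
    {"subject": "the old stadium", "verb": "torn down"},
    {"subject": "the aging stadium", "verb": "razed"},
]
suffixes = [
    "",                                          # bare question
    "\nLet's think step by step.",               # CoT trigger
    "\nWrite out your reasoning, then answer.",  # steps before the final answer
]
temperatures = [0.0, 0.7, 1.0]

trials = [
    (base.format(**w) + s, t)
    for w, s, t in itertools.product(wordings, suffixes, temperatures)
]

# Feed each pair to the model, then compare the final judgements: do they
# stay consistent across paraphrases and settings, or drift around?
for prompt, temp in trials:
    print(f"temperature={temp}: {prompt!r}")
```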