It may still be possible to harness the capabilities of larger models without invoking character simulation and the problems that come with it, by prompting or fine-tuning the models in particular, careful ways.
There were some ideas proposed in the paper “Conditioning Predictive Models: Risks and Strategies” by Hubinger et al. (2023). But since it was published over a year ago, I’m not sure whether anyone has gotten far in investigating those strategies to see which ones could actually work. (I’m not seeing anything like that among the papers citing it.)
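To make that concrete, here is a minimal sketch of the general kind of prompting strategy the paper gestures at: framing a request as pure text prediction (document continuation) rather than as a dialogue with an assistant character. Everything below, including the `complete` helper and the prompt wording, is a hypothetical illustration rather than anything taken from the paper itself.

```python
# Sketch of "conditioning as prediction" prompting, assuming a generic
# text-completion model behind a hypothetical complete() function.

def complete(prompt: str, max_tokens: int = 256) -> str:
    """Hypothetical stand-in for any raw text-completion API."""
    raise NotImplementedError("wire this up to your model of choice")

# Persona-style prompting: invokes an assistant *character*, with the
# simulation baggage that entails (the model predicts what that character
# would say, including its quirks and failure modes).
persona_prompt = (
    "You are Sydney, a helpful assistant.\n"
    "User: How do I parse a date in Python?\n"
    "Sydney:"
)

# Predictor-style conditioning: frame the request as the opening of a
# plausible *document* whose natural continuation contains the answer,
# so the model is predicting text rather than role-playing a character.
predictor_prompt = (
    "Excerpt from a Python reference manual, section on parsing dates.\n"
    "To parse a date string into a datetime object, use\n"
)

if __name__ == "__main__":
    for name, prompt in [("persona", persona_prompt),
                         ("predictor", predictor_prompt)]:
        print(f"--- {name} ---")
        print(prompt)
        # print(complete(prompt))  # uncomment once complete() is wired up
```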
If you’re willing to share more on what those ways would be, I could forward that to the team that writes Sydney’s prompts when I visit Montreal.
Thanks, I think you’re referring to:
> There were some ideas proposed in the paper “Conditioning Predictive Models: Risks and Strategies” by Hubinger et al. (2023). But since it was published over a year ago, I’m not sure whether anyone has gotten far in investigating those strategies to see which ones could actually work. (I’m not seeing anything like that among the papers citing it.)
Appreciate you getting back to me. I was aware of this paper already and have previously worked with one of the authors.