When making safety cases for alignment, it’s important to remember that defense against single-turn attacks doesn’t always imply defense against multi-turn attacks.
Our recent paper shows a case where breaking up a single-turn attack into multiple prompts (spreading it out over the conversation) changes which models/guardrails are vulnerable to the jailbreak.
Robustness against the single-turn version didn’t imply robustness against the multi-turn version of the attack, and robustness against the multi-turn version didn’t imply robustness against the single-turn version of the attack.
Should it be more tabooed to put the bottom line in the title?
Titles like “in defense of <bottom line>” or just “<bottom line>” seem to:
Unnecessarily make it really easy for people to select content to read based on the conclusion it comes to
Frame the post as having the goal of convincing you of <bottom line>, and set up the reader’s expectations accordingly. This seems like it would put you either in “pause critical thinking to defend My Team” mode (if you agree with the title) or in “continuously search for counter-arguments” mode (if you disagree with the title).
I think putting the conclusion in the title is good insofar as it’s a form of anti-clickbait: it’s the most informative title possible. Yes, people may be motivated to read it in order to confirm their pre-existing opinion, or to search for counterarguments, but the alternative is often that they don’t read the article at all, for lack of motivation.
People who are motivated to write a comment from a disagreement with the title are, more or less, forced to read the actual post in order to compose their rebuttal. Which is better than not receiving any engagement from this person at all. And perhaps this post even changes their mind, or they agree with the title but find the arguments in the post too weak.
Overall, having the conclusion in the title seems good for much the same reasons that a summary at the beginning is good.
Though a reason to avoid the bottom line in the title is if it is some generally unpopular opinion. Many people will reflexively downvote the post without reading, causing it to be seen by fewer readers.
The scene in planecrash where Keltham gives his first lecture, as an attempt to teach some formal logic (and a whole bunch of important concepts that usually don’t get properly taught in school), is something I’d highly recommend reading! As far as I can remember, you should be able to just pick it up right here, and follow the important parts of the lecture without understanding the story.