Thanks for this comment; I take very seriously the point that things like this can inspire people and burn timeline.
I think this is a good counterargument, though:
There is also something counterintuitive about this dynamic: as models become stronger, the barriers to entry will actually go down, i.e. you will be able to prompt the AI to build its own advanced scaffolding. Similarly, a user could just point the model at a paper on refusal-vector ablation or some other future technique and ask the model to essentially remove its own safeguards.
I don’t want to give people ideas or come across as cynical here; sorry if that is the impression.
No particular disagreement that your marginal contribution is low and that this work has the potential to be useful for durable alignment. Like I said, I’m thinking in terms of the timeline days one can avoid burning through what one doesn’t say.