My view has shifted a lot since the GPT-2 days. Back then, I thought that frontier models presented the largest danger to humanity. But as the large labs' profit motives have increasingly aligned with safety, and as they've improved the security of their model weights, their model refinement (e.g. RLHF), and their general model control (e.g. API call filtering), I've felt less and less worried.
I now feel more confident that the world will reach AGI before too long through incremental progress, and I would MUCH rather have it be one of the big labs that gets there than the Open Source community.
Open model weights are dangerous, and nothing I've seen so far gives me hope that we'll find a way to change that. I've seen plenty of dangerous outputs from fine-tuned open-weight models with my own eyes.
Are they as scary as an uncensored frontier model would theoretically be? No, of course not.
Are they more dangerous than a carefully controlled frontier model? Yes, I think so.