If your model has the extraordinary power to say what internal motivational structures SGD will entrain into scaled-up networks, then you ought to be able to say much weaker things that are impossible in two years, and you should have those predictions queued up and ready to go rather than falling into nervous silence after being asked.
Sorry, I might be misunderstanding you (and I hope I am), but… I think doomers literally say “Nobody knows what internal motivational structures SGD will entrain into scaled-up networks, and thus we are all doomed”. The problem is that we lack the science to confidently say how the AIs will turn out, not that doomers have a secret method for knowing that next-token prediction is evil.
If you meant that doomers are too confident in answering the question “will SGD even make motivational structures?”, their (and my) answer still stems from ignorance: nobody knows, but it is plausible that SGD will build motivational structures into neural networks because they can be useful for many tasks (getting low loss or whatever), and if you think you know better, you should show it experimentally and theoretically in excruciating detail.
I also don’t see how it logically follows that “If your model has the extraordinary power to say what internal motivational structures SGD will entrain into scaled-up networks” ⇒ “then you ought to be able to say much weaker things that are impossible in two years”, yet it seems to be the core of the post. Even if someone had an extraordinary model that predicted exactly what SGD does (which we, as a species, should really strive for!!), it would still be a different question to predict what will or won’t happen in the next two years.
If I reason about my own field (physics), the same structure would give: “If your model has the extraordinary power to say how an array of neutral atoms cooled to a few nK will behave when a laser is shone upon them” (which is true) ⇒ “then you ought to be able to say much weaker things that are impossible in two years in the field of cold atom physics” (which is… not true). It’s a non sequitur.
The relevant commonality is “ability to predict the future alignment properties and internal mechanisms of neural networks.” (Also, I don’t exactly endorse everything in this fake quotation, so indeed the analogized tasks aren’t as close as I’d like. I had to trade off between “what I actually believe” and “making minimal edits to the source material.”)
It would be “useful” (i.e. fitness-increasing) for wolves to have evolved biological sniper rifles, but they did not. By what evidence are we locating these motivational hypotheses, and what kinds of structures are dangerous, and why are they plausible under the NN prior?