If you meant that doomers are too confident answering the question “will SGD even make motivational structures?”, then their answer (and mine) still stems from ignorance: nobody knows, but it is plausible that SGD will entrain motivational structures into neural networks because such structures are useful for many tasks (for achieving low loss, or whatever the objective is), and if you think you know better, you should demonstrate it experimentally and theoretically in excruciating detail.
I also don’t see how it logically follows that “If your model has the extraordinary power to say what internal motivational structures SGD will entrain into scaled-up networks” ⇒ “then you ought to be able to say much weaker things that are impossible in two years”, yet this inference seems to be the core of the post.
The relevant commonality is “ability to predict the future alignment properties and internal mechanisms of neural networks.” (Also, I don’t exactly endorse everything in this fake quotation, so indeed the analogized tasks aren’t as close as I’d like. I had to trade off between “what I actually believe” and “making minimal edits to the source material.”)
It would be “useful” (i.e. fitness-increasing) for wolves to have evolved biological sniper rifles, but they did not. By what evidence are we locating these motivational hypotheses? What kinds of structures would be dangerous, and why are they plausible under the NN prior?