Maybe GPT-5 will be extremely good at interpretability, such that it can recursively self improve by rewriting its own weights.
I am by no means an expert on machine learning, but this sentence reads weird to me.
I mean, it seems possible that a part of a NN develops some self-reinforcing feature which uses gradient descent (or whatever is used in training) to push the network in a particular direction and take it over, the way a human adrift on a raft in the ocean might build a sail to steer the raft in a particular direction.
Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?
Personally, I think that if GPT-5 is the point of no return, it is more likely because it would be smart enough to actually help advance AI after it is trained. While improving semiconductors seems hard and would require a lot of real-world work done with human cooperation, finding better NN architectures and training algorithms seems well within the realm of the possible, if not exactly plausible.
So if I had to guess how GPT-5 might doom humanity, I would say that within a few million instance-hours it figures out how to train LLMs as capable as itself for 1/100th of the cost, and this information becomes public.
The budgets of institutions which might train NNs probably follow some power law, so if training cutting-edge LLMs becomes a hundred times cheaper, the number of institutions which could build cutting-edge LLMs becomes many orders of magnitude higher (unless the big players go full steam ahead towards a paperclip maximizer, of course). This likely means that voluntary coordination (if that was ever on the table) becomes impossible. And setting up a worldwide authoritarian system to impose limits would be both distasteful and difficult.
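To make the scaling intuition concrete, here is a minimal sketch, assuming budgets follow a classical Pareto distribution; the shape parameter (alpha = 1.5) and the $1M budget floor are illustrative assumptions, not figures from the comment. Under a Pareto tail, the count of institutions whose budget exceeds a training cost C scales as C^(-alpha), so a 100x cost drop multiplies that count by roughly 100^alpha.

```python
import numpy as np

# Toy illustration: assume institutional AI budgets follow a classical Pareto
# distribution with shape alpha and a minimum budget of ~$1M. The number of
# institutions able to afford a training run of cost C then scales as C**(-alpha).

rng = np.random.default_rng(0)
alpha = 1.5  # assumed Pareto shape; steeper tails make the effect larger
budgets = 1e6 * (rng.pareto(alpha, size=1_000_000) + 1)  # budgets in dollars

for cost in (1e9, 1e7):  # cutting-edge training cost before and after a 100x drop
    n = int((budgets >= cost).sum())
    print(f"cost ${cost:.0e}: {n} of {budgets.size} institutions can afford it")

# With alpha = 1.5, cutting the cost 100x raises the count by ~100**1.5 = 1000x,
# i.e. roughly three orders of magnitude more potential trainers.
```

With alpha = 1 the increase would be "only" about 100x; heavier concentration of budgets (alpha > 1) is what pushes it to several orders of magnitude.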
Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?
I was thinking of a scenario where OpenAI deliberately gives it access to its own weights to see if it can self improve.
I agree that it would be more likely to just speed up normal ML research.