Your argument is plausible, but what evidence do you have that makes it the likely outcome?
Hopenope
The difficulty of alignment is still unknown. It may be totally intractable, or some changes to current methods (deliberative alignment or constitutional AI) plus some R&D automation may get us there.
The recurrence paper is actually scary, but some of its results are questionable. Are 8 layers enough for a 3.5B model? Qwen 0.5B has 24 layers. There is also almost no difference between the 180B and 800B models when r=1 (Table 4). Is the recurrence just compensating for an insufficient number of layers here?
Would you update your timelines if he is telling the truth?
Is CoT faithfulness already obsolete? How does it survive concepts like latent-space reasoning or RL-based manipulation (R1-Zero)? Is it realistic to think that these highly competitive companies will simply not use such techniques and ignore the compute-efficiency gains?
I am not sure that longer timelines are always safer. For example, comparing a two-year timeline to a five-year one, the shorter timeline has several advantages: in both cases you need to outsource a lot of alignment research to AI anyway, and with the shorter one both the amount of compute and the number of players with significant compute are lower, which reduces racing pressure and takeoff speed alike.
What happened to the Waluigi effect? It used to be a big issue, some people argued against it, and then it was pretty much forgotten. Is there any related research, or are there recent demos, that examine it in more detail?
If you have a very short timeline, and you don't think alignment is solvable in such a short time, what can you still do to reduce the chance of x-risk?
Many expert-level benchmarks greatly overestimate the range and diversity of their experts' knowledge. A person with a PhD in physics is probably at undergraduate level in many areas of physics unrelated to their research, and sometimes we even see this within an expert's own domain (neurologists usually forget about nerves that are not clinically relevant).
Overrefusal issues were far more common one to two years ago; models like Gemini 1 and Claude 1-2 had severe overrefusal problems.