swe, speculative investor
O O
In a long AGI world, isn’t it very plausible that it gets developed in China, and thus basically all efforts to shape its creation are pointless, since LessWrong and associated efforts don’t have much influence in China?
I’ve always wondered, why didn’t superpowers apply MAIM to nuclear capabilities in the past?
> Speculative but increasingly plausible, confidentiality-preserving AI verifiers
Such as?
I mean, I don’t want to give Big Labs any ideas, but I suspect the reasoning above implies that o1/DeepSeek-style RL procedures might work a lot better if the models can think internally for a long time.
I expect GPT-5 to implement this, based on recent research and how they phrase it.
I thought OpenAI’s deep research uses the full o3?
Why are current LLMs, reasoning models, and whatever else still horribly unreliable? I can ask the current best models (o3, Claude, deep research, etc.) to generate lots of code for me using a specific pattern, or to make a chart of company valuations, and they’ll get them mostly wrong.
Is this just a result of labs hill-climbing a bunch of impressive-sounding benchmarks? I think this should delay timelines a bit, unless there’s progress on reliability that I just can’t perceive.
SWEs won’t necessarily be fired even after becoming useless
I’m actually surprised at how eager/willing big tech is to fire SWEs once they’re sure they won’t be economically valuable. I think a lot of priors for them being stable come from the ZIRP era. Now, these companies have quite frequent layoffs, silent layoffs, and performance firings. Companies becoming leaner will be a good litmus test for a lot of these claims.
https://x.com/arankomatsuzaki/status/1889522974467957033?s=46&t=9y15MIfip4QAOskUiIhvgA
o3 gets IOI Gold. Either we are in a fast takeoff, or the “gold” standard benchmarks are a lot less useful than imagined.
I feel like a lot of Manifold is virtue signaling.
Just curious: how do you square the rise in AI stocks taking so long? Many people here thought it was obvious since 2022 and made a ton of money.
Keep in mind the current administration is replacing incompetent bureaucracies with self-assembling corporations. The organization is still there, just more competent and under a different name. A government project could just look like telling the labs to create one data center, throwing money at them, and cutting red tape for building gas plants.
Seems increasingly likely to me that there will be some kind of national AI project before AGI is achieved, as the government is waking up to its potential pretty fast. Unsure what the odds are, but last year I would have said <10%. Now I think it’s between 30% and 60%.
Has anyone done a write up on what the government-led AI project(s) scenario would look like?
It might just be a perception problem. LLMs don’t really seem to have a good understanding yet of one letter being next to another, or of what a diagonal is. If you look at ARC-AGI with o3, you see it doing worse as the grid gets larger, whereas humans don’t have the same drawback.
EDIT: Tried on o1 pro just now. Doesn’t seem like a perception problem, but it still could be. I wonder if it’s related to being a successful agent: it might not model the effect of a sequence of actions on the state of a world properly yet. It’s strange that this isn’t unlocked with reasoning.
Ah so there could actually be a large compute overhang as it stands?
Does DeepSeek-V3 imply current models are not trained as efficiently as they could be? It seems like they used a very small fraction of previous models’ resources, and it’s only slightly worse than the best LLMs.
They did this on the far easier training set though?
An alternative story is that they trained until a model was found that could beat not just the training set but many other benchmarks too, implying that there may be some general-intelligence factor there. Maybe this is still Goodharting on benchmarks, but there’s probably truly something there.
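A toy sketch of that selection story, purely for illustration (the thresholds, score model, and function names here are all made up): keep sampling training runs, stop at the first one that clears the benchmark used for selection, then check whether it also clears benchmarks it was never selected on.

```python
import random

def train_candidate(seed):
    """Hypothetical stand-in for one training run / architecture sample."""
    random.seed(seed)
    # Pretend each run yields a model with some latent "general ability" score.
    return random.gauss(0.5, 0.15)

TRAIN_THRESHOLD = 0.70    # easier benchmark used for selection
HELDOUT_THRESHOLD = 0.75  # harder benchmarks never used for selection

# Selection loop: iterate until a run beats the selection benchmark.
for seed in range(10_000):
    ability = train_candidate(seed)
    if ability >= TRAIN_THRESHOLD:
        # The interesting question: does the selected run also clear benchmarks
        # it was never selected on? If yes, that hints at a shared underlying
        # factor rather than pure Goodharting on the selection benchmark.
        print(f"run {seed}: selected; clears held-out benchmarks = "
              f"{ability >= HELDOUT_THRESHOLD}")
        break
```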
No, I believe there is a human in the loop for the above if that’s not clear.
You’ve said it in another comment. But this is probably an “architecture search”.
I guess the training loop for o3 is similar but it would be on the easier training set instead of the far harder test set.
I think there is a third explanation here. The Kaggle model (probably) does well because you can brute-force it with a bag of heuristics and gradually iterate by discarding the ones that don’t work and keeping the ones that do.
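For concreteness, a minimal sketch of that bag-of-heuristics brute force, assuming ARC-style tasks given as (input grid, output grid) pairs. The handful of transformations listed are just placeholders for the hundreds a real Kaggle solver would enumerate and compose; none of this is the actual competition code.

```python
import numpy as np

# A few toy candidate transformations; a real solver would enumerate many
# more and compose them.
HEURISTICS = {
    "identity":  lambda g: g,
    "flip_lr":   lambda g: np.fliplr(g),
    "flip_ud":   lambda g: np.flipud(g),
    "rotate_90": lambda g: np.rot90(g),
    "transpose": lambda g: g.T,
}

def solves_all(fn, train_pairs):
    """Keep a heuristic only if it maps every training input to its output."""
    try:
        return all(np.array_equal(fn(np.array(x)), np.array(y))
                   for x, y in train_pairs)
    except Exception:
        return False

def solve(task):
    """task = {"train": [(input_grid, output_grid), ...], "test": [input_grid, ...]}"""
    survivors = {name: fn for name, fn in HEURISTICS.items()
                 if solves_all(fn, task["train"])}
    if not survivors:
        return None  # a real solver would widen the search / compose heuristics
    _, fn = next(iter(survivors.items()))
    return [fn(np.array(x)).tolist() for x in task["test"]]

# Tiny usage example: the hidden rule is "mirror the grid left-right".
task = {
    "train": [([[1, 0], [2, 0]], [[0, 1], [0, 2]])],
    "test":  [[[3, 0], [0, 4]]],
}
print(solve(task))  # -> [[[0, 3], [4, 0]]]
```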
I have ~15% probability humanity will invent artificial superintelligence (ASI) by 2030.
The recent announcement of the o3 model has updated me to 95%, with most of the 5% being regulatory slowdowns involving unprecedented global cooperation.
Do you have a link to these?
Isn’t it a distribution problem? World hunger has almost disappeared, however. (The issue is that hungrier nations have more kids, so progress is a bit hidden.)