p.b.

Karma: 1,232

p.b.Mar 6, 2025, 8:15 PM
3 points
0
in reply to: Seth Herd’s comment on: A Bear Case: My Predictions Regarding AI Progress
I kinda agree with this as well. Except that it seems completely unclear to me whether recreating the missing human capabilities/brain systems takes two years or two decades or even longer.
It doesn’t seem to me to be a single missing thing, and the same holds for each separate step: that it hasn’t been done yet is evidence that it’s not that easy.

p.b.Mar 6, 2025, 7:31 PM
5 points
2
in reply to: Steven Byrnes’s comment on: A Bear Case: My Predictions Regarding AI Progress
I think that is exactly right.
I also wouldn’t be too surprised if in some domains RL leads to useful agents if all the individual actions are known to and doable by the model and RL teaches it how to sensibly string these actions together. This doesn’t seem too different from mathematical derivations.

p.b.Mar 6, 2025, 7:19 PM
7 points
0
in reply to: Thomas Kwa’s comment on: A Bear Case: My Predictions Regarding AI Progress
If you think generalization is limited in the current regime, try to create AGI benchmarks that the AIs won’t saturate until we reach some crucial innovation. People keep trying this and they keep saturating every year.
Because these benchmarks are all in the LLM paradigm: Single input, single output from a single distribution. Or they are multi-step problems on rails. Easy verification makes for benchmarks that can quickly be cracked by LLMs. Hard verification makes for benchmarks that aren’t used.
One could let models play new board/computer games against average humans: Video/image input, action output.
One could let models offer and complete tasks autonomously on freelancer platforms.
One could enrol models in remote universities and see whether they autonomously reach graduation.
It’s not difficult to come up with hard benchmarks for current models (these are not close to AGI complete). I think people don’t do this because they know that current models would be hopeless at benchmarks that actually aim for their shortcomings (agency, knowledge integration + integration of sensory information, continuous learning, reliability, …)

p.b.Mar 5, 2025, 8:30 PM
7 points
2
in reply to: johnswentworth’s comment on: A Bear Case: My Predictions Regarding AI Progress
Same here.

p.b.Feb 23, 2025, 8:32 PM
10 points
1
in reply to: lsusr’s comment on: The case for the death penalty
If you only execute repeat offenders the fraction of “completely” innocent people executed goes way down.
The idea of being in the wrong place at the wrong time and then being executed gives me pause.
The idea of being framed for shoplifting, framed for shoplifting again, wrongfully convicted of a violent crime and then being in the wrong place at the wrong time is ridiculous.

p.b.Feb 21, 2025, 7:00 AM
2 points
0
in reply to: GeneSmith’s comment on: How to Make Superbabies
Do you have a reference for the personality trait gene-gene interaction thing? Or maybe an explanation how that was determined?

p.b.Feb 15, 2025, 10:25 PM
3 points
0
in reply to: tangerine’s comment on: ≤10-year Timelines Remain Unlikely Despite DeepSeek and o3
I think this inability of “learning while thinking” might be the key missing thing of LLMs and I am not sure “thought assessment” or “sequential reasoning” are not red herrings compared to this. What good is assessment of thoughts if you are fundamentally limited in changing them? Also, reasoning models seem to do sequential reasoning just fine as long as they already have learned all the necessary concepts.

p.b.Feb 13, 2025, 9:10 PM
2 points
0
in reply to: Cole Wyeth’s comment on: My model of what is going on with LLMs
But the historical difficulty of RL is based on models starting from scratch. It is unclear whether moulding a model that already knows how to do all the steps into doing all the steps is anywhere near as difficult as using RL to also learn how to do all the steps.

p.b.Feb 13, 2025, 9:07 PM
2 points
0
in reply to: samusasuke’s comment on: Why you maybe should lift weights, and How to.
10% seems like a lot.
Also, I worry a bit about being too variable in the number of reps and in how to add weight. I found I fall easily into doing the minimal version—“just getting it done for today”. Then improvement stalls and motivation drops.
I think part of the appeal of “Starting Strength” (which I started recently) is that it’s very strict. Unfortunately, if adding 15 kilos a week for three weeks to squats is not going to kill me, drinking a gallon of milk a day will.
Which is to say, I appreciate your post for giving more building pieces for a workout that works out for me.

p.b.Feb 13, 2025, 8:48 PM
2 points
0
in reply to: Cole Wyeth’s comment on: My model of what is going on with LLMs
I think AlexNet wasn’t even the first to win computer vision competitions based on GPU-acceleration but that was definitely the step that jump-started Deep Learning around 2011/2012.
To me it rather seems like agency and intelligence are not very intertwined. Intelligence is the ability to create precise models; this does not imply that you use these models well or in a goal-directed fashion at all.
That we have now started down the path of RLing the models to make them pursue the goal of solving math and coding problems in a more directed and effective manner implies to me that we should see inroads to other areas of agentic behavior as well.
Whether that will be slow going or done next year cannot really be decided based on the long history of slowly increasing the intelligence of models because it is not about increasing the intelligence of models.

p.b.Feb 13, 2025, 7:41 PM
4 points
2
on: My model of what is going on with LLMs
Apparently^[1] enthusiasm didn’t really ramp up again until 2012, when AlexNet proved shockingly effective at image classification.
I think after the backpropagation paper was published in the eighties enthusiasm did ramp up a lot, which led to a lot of important work in the nineties like (mature) CNNs, LSTMs, etc.

p.b.Feb 13, 2025, 5:39 PM
2 points
0
on: Why you maybe should lift weights, and How to.
Could you say a bit about progression?

p.b.Feb 10, 2025, 9:13 PM
4 points
2
on: Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
ELO is the Electric Light Orchestra. The Elo rating is named after Prof. Arpad Elo.
I considered the idea of representing players via vectors in different contexts (chess, soccer, MMA) and also worked a bit on splitting the evaluation of moves into “quality” and “risk taking”, with the idea of quantifying aggression in chess.
My impression is that the single scalar rating works really well in chess, so I’m not sure how much there is beyond that. However, some simple experiments in that direction wouldn’t be too difficult to set up.
Also, I think there were competitions on creating better rating systems that outperform Elo’s predictiveness (which apparently isn’t too difficult). But I don’t know whether any of those were multi-dimensional.
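For reference, the single scalar rating discussed above can be sketched in a few lines. This is a minimal illustration of the standard Elo expected-score formula and update rule, not any particular federation's implementation; the function names and the K-factor of 20 are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 20.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor; 20 is an assumed value, rating bodies use various ones.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated players: the winner gains k/2 points.
    print(update_ratings(1500.0, 1500.0, 1.0))
```

A multi-dimensional variant would replace the single rating difference with some function of two player vectors; whether that predicts results better than the scalar is exactly the open question here.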

p.b.Feb 4, 2025, 11:06 AM
19 points
8
on: p.b.’s Shortform
My bear case for Nvidia goes like this:
I see three non-exclusive scenarios in which Nvidia stops playing the important role in AI training and inference that it has played over the past 10 years:
1. China invades or blockades Taiwan. Metaculus gives around 25% for an invasion in the next 5 years.
2. All major players switch to their own chips, as Google has already done, Amazon is in the process of doing, Microsoft and Meta have started doing, and even OpenAI seems to be planning.
3. Nvidia’s moats fail. CUDA is replicated for cheaper hardware, ASICs or stuff like Cerebras start dominating inference, etc.
All these become much more likely than the current baseline (whatever that is) in the case of AI scaling quickly and generating significant value.

p.b.Feb 4, 2025, 10:29 AM
2 points
0
on: DeepSeek: Don’t Panic
A very detailed and technical analysis of the bear case for Nvidia by Jeffrey Emanuel, that Matt Levine claims may have been responsible for the Nvidia price decline.
I read that last week. It was an interesting case of experiencing Gell-Mann-Amnesia several times within the same article.
All the parts where I have some expertise were vague, used terminology incorrectly and were often just wrong. All the rest was very interesting!
If this article crashed the market: EMH RIP.

p.b.Feb 4, 2025, 9:54 AM
2 points
0
on: DeepSeek: Don’t Panic
I would hesitate to buy a build based on R1. R1 is special in the sense that the MoE-architecture trades off compute requirements vs RAM requirements. Which is why now these CPU-builds start to make some sense—you get a lot less compute, but much more RAM.
As soon as the next dense model drops, which could have 5-times fewer parameters for the same performance, the build will stop making any sense. And of course until then you are also handicapped when it comes to running smaller models fast.
The sweet spot is integrated RAM/VRAM like in a Mac and in the upcoming NVIDIA DIGITS. But buying a handful of used 3090s probably also makes more sense to me than the CPU-only builds.
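A rough back-of-the-envelope sketch of the compute-vs-RAM trade-off described above. The R1 figures (671B total parameters, 37B activated per token) are the publicly reported ones; everything else is an assumption for illustration: the dense model is hypothetical, weights are assumed to take 1 byte per parameter, and compute is estimated with the common rule of thumb of about 2 FLOPs per active parameter per generated token.

```python
def footprint(total_params_b: float, active_params_b: float,
              bytes_per_param: float = 1.0) -> tuple[float, float]:
    """Return (weight memory in GB, approximate GFLOPs per generated token).

    Parameter counts are given in billions. Memory ignores the KV cache and
    activations; compute uses the ~2 FLOPs per active parameter rule of thumb.
    """
    weights_gb = total_params_b * bytes_per_param
    gflops_per_token = 2.0 * active_params_b
    return weights_gb, gflops_per_token


if __name__ == "__main__":
    # MoE: all experts must sit in memory, but only a few run per token.
    moe = footprint(total_params_b=671.0, active_params_b=37.0)
    # Hypothetical dense model with 5x fewer parameters, all of them active.
    dense = footprint(total_params_b=671.0 / 5, active_params_b=671.0 / 5)
    print("MoE   (GB, GFLOPs/token):", moe)
    print("dense (GB, GFLOPs/token):", dense)
```

Under these assumptions the MoE model needs about five times the memory but well under a third of the per-token compute, which is the profile a RAM-heavy CPU build happens to suit; the hypothetical dense model flips both numbers.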

p.b.Jan 26, 2025, 5:39 PM
4 points
0
in reply to: gwern’s comment on: Implications of the inference scaling paradigm for AI safety
“So how could I have thought that faster?” might actually be a sensible training trick for reasoning models.

p.b.Dec 11, 2024, 8:17 AM
6 points
4
in reply to: Jesse Hoogland’s comment on: Jesse Hoogland’s Shortform
You are skipping over a very important component: Evaluation.
Which is exactly what we don’t know how to do well enough outside of formally verifiable domains like math and code, which is exactly where o1 shows big performance jumps.

p.b.Nov 18, 2024, 4:29 PM
2 points
0
in reply to: ZY’s comment on: Rauno’s Shortform
There was one comment on twitter that the RLHF-finetuned models also still have the ability to play chess pretty well, just their input/output-formatting made it impossible for them to access this ability (or something along these lines). But apparently it can be recovered with a little finetuning.

p.b.Nov 18, 2024, 4:18 PM
4 points
0
in reply to: Leon Lang’s comment on: Leon Lang’s Shortform
The paper seems to be about scaling laws for a static dataset as well?
Similar to the initial study of scale in LLMs, we focus on the effect of scaling on a generative pre-training loss (rather than on downstream agent performance, or reward- or representation-centric objectives), in the infinite data regime, on a fixed offline dataset.
To learn to act you’d need to do reinforcement learning, which is massively less data-efficient than the current self-supervised training.
More generally: I think almost everyone thinks that you’d need to scale the right thing for further progress. The question is just what the right thing is if text is not the right thing. Because text encodes highly powerful abstractions (produced by humans and human culture over many centuries) in a very information dense way.