I kinda agree with this as well, except that it seems completely unclear to me whether recreating the missing human capabilities/brain systems will take two years, two decades, or even longer.
It doesn’t seem to me to be a single missing thing, and for each separate step the same holds: the fact that it hasn’t been done yet is evidence that it’s not that easy.
A few years ago I had a similar idea, which I called Rawlsian Reinforcement Learning: provide scenarios like those in this post and evaluate the model’s actions according to how much each person in the scenario benefits from them. Then reinforce based on the mean benefit across all characters in the scenario (or some variation thereof), i.e. the reinforcement signal does not use the information about which character in the scenario is the model.
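To make the reward concrete, here is a minimal sketch of the computation I had in mind. The per-character benefit scores and the evaluator that would produce them are hypothetical; a more Rawlsian variant might take the minimum instead of the mean.

```python
import numpy as np

def rawlsian_reward(benefits):
    """Reward = mean benefit over all characters in the scenario.

    `benefits` is a hypothetical list of per-character benefit scores
    (e.g. produced by a separate evaluator). Crucially, it does not
    mark which character is the acting model, so the policy cannot be
    reinforced specifically for favoring itself.
    """
    return float(np.mean(benefits))
    # Maximin variant: return float(np.min(benefits))

# An action that helps the other characters but slightly costs the
# actor still receives a positive reward, since only the mean matters.
print(rawlsian_reward([0.8, 0.9, -0.1]))  # ~0.53
```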
Maybe I misunderstand your method, but it seems to me that you untrain the self-other distinction, which is ultimately a capability. So the model might not become more moral; instead, it might just lose the capacity to benefit itself because it can no longer distinguish between itself and others.