My strong upvotes are now giving +1 and my regular upvotes give +2.
test
Just edited the post because I think the way it was phrased kind of exaggerated the difficulties we’ve been having applying the newer models. 3.7 was better, as I mentioned to Daniel; it was just underwhelming, and not as big a leap as 3.6 was, let alone 3.5.
If you plot a line, does it plateau or does it get to professional human level (i.e. reliably doing all the things you are trying to get it to do as well as a professional human would)?
It plateaus before professional human level, both in a macro sense (comparing what ZeroPath can do vs. human pentesters) and in a micro sense (comparing the individual tasks ZeroPath does when it’s analyzing code). At least, the errors the models make are not ones I would expect a professional to make; I haven’t actually hired a bunch of pentesters, asked them to do the same tasks we expect of the language models, and compared the results. One thing our tool has over people is breadth, but that’s because we can parallelize inspection of different pieces, not because the models are doing tasks better than humans.
What about 4.5? Is it as good as 3.7 Sonnet but you don’t use it for cost reasons? Or is it actually worse?
We have not yet tried 4.5 as it’s so expensive that we would not be able to deploy it, even for limited sections.
We use different models for different tasks for cost reasons. The primary workhorse model today is 3.7 Sonnet, whose improvement over 3.6 Sonnet was smaller than 3.6’s improvement over 3.5 Sonnet. When tried in this workhorse role, o3-mini and the rest of the recent o-series models were strictly worse than 3.6.
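To make the cost-routing point concrete, here’s a minimal sketch of what “different models for different tasks” can look like. The task names, the routing table, and the per-token prices are illustrative assumptions on my part, not ZeroPath’s actual configuration:

```python
# Illustrative sketch only: task names, prices, and the routing table are
# assumptions for the sake of the example, not ZeroPath's actual setup.
APPROX_INPUT_USD_PER_MTOK = {
    "claude-3-7-sonnet": 3.00,  # the workhorse model
    "o3-mini": 1.10,            # cheaper, but worse when tried in the workhorse role
}

TASK_TO_MODEL = {
    "bulk_triage": "o3-mini",                   # high-volume, cheap first pass
    "deep_code_analysis": "claude-3-7-sonnet",  # the expensive, careful pass
    "report_writing": "claude-3-7-sonnet",
}

def pick_model(task: str) -> str:
    """Route a task to its assigned model, defaulting to the workhorse."""
    return TASK_TO_MODEL.get(task, "claude-3-7-sonnet")

def estimated_input_cost_usd(task: str, input_tokens: int) -> float:
    """Rough input-token cost for running a task once."""
    return APPROX_INPUT_USD_PER_MTOK[pick_model(task)] * input_tokens / 1_000_000
```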
Recent AI model progress feels mostly like bullshit
I haven’t read the METR paper in full, but from the examples given I’m worried the tests might be biased in favor of an agent with no capacity for long-term memory, or at least might not hit the thresholds where context limitations become a problem:
For instance, task #3 here is at the limit of current AI capabilities (takes an hour). But it’s also something that could plausibly be done with very little context; if the AI just puts all of the example files in its context window it might be able to write the rest of the decoder from scratch. It might not even need to have the example files in memory while it’s debugging its project against the test cases.
Whereas a task to fix a bug in a large software project, while it might take an engineer associated with that project “an hour” to finish, requires either stretching the limits of how much information can fit inside a context window, or recall abilities beyond what current models seem capable of.
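As a rough illustration of the contrast (every number below is an assumption made up for the comparison, not a measurement): a handful of example files fits comfortably inside a 200k-token window, while even a small fraction of a large codebase does not.

```python
# Back-of-the-envelope only: file counts, file sizes, the chars-per-token
# ratio, and the context limit are all assumed numbers for illustration.
CHARS_PER_TOKEN = 4            # rough average for English text and code
CONTEXT_LIMIT_TOKENS = 200_000

def fits_in_context(total_chars: int) -> bool:
    """True if the text would fit in the assumed context window."""
    return total_chars / CHARS_PER_TOKEN <= CONTEXT_LIMIT_TOKENS

# A decoder-style task: ~10 example files of ~30 KB each.
print(fits_in_context(10 * 30_000))      # True  (~75k tokens)

# A bug fix in a big project: ~5,000 files of ~20 KB each.
print(fits_in_context(5_000 * 20_000))   # False (~25M tokens)
```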
There was a type of guy circa 2021 that basically said that gpt-3 etc. was cool, but we should be cautious about assuming everything was going to change, because the context limitation was a key bottleneck that might never be overcome. That guy’s take was briefly “discredited” in subsequent years when LLM companies increased context lengths to 100k, 200k tokens.
I think that was premature. The context limitations (in particular the lack of an equivalent to human long term memory) are the key deficit of current LLMs and we haven’t really seen much improvement at all.
If AI executives really are as bullish as they say they are on progress, then why are they willing to raise money anywhere in the ballpark of current valuations?
The story is that they need the capital to build the models that they think will do that.
Moral intuitions are odd. The current government’s gutting of the AI safety summit is upsetting, but somehow less upsetting to my hindbrain than its order to drop the corruption charges against a mayor. I guess the AI safety thing is worse in practice but less shocking in terms of abstract conduct violations.
It helps, but this could be solved with increased affection for your children specifically, so I don’t think it’s the actual motivation for the trait.
The core is probably several things, but note that this bias is also part of a larger package of traits that makes someone less disagreeable. I’m guessing that the same selection effects that made men more disagreeable than women are also probably partly responsible for this gender difference.
I suspect that the psychopath’s theory of mind is not “other people are generally nicer than me”, but “other people are generally stupid, or too weak to risk fighting with me”.
That is true, and it is indeed a bias, but it doesn’t change the fact that their assessment of whether others are going to hurt them seems basically well calibrated. The anecdata that needs to be explained is why nice people do not seem to be able to tell when others are going to take advantage of them, but mean people do. The post’s offered reason is that generous impressions of others are advantageous for trust-building.
Mr. Portman probably believed that some children forgot to pay for the chocolate bars, because he was aware that different people have different memory skills.
This was the explanation he offered, yeah.
This post is about a suspected cognitive bias and why I think it came to be. It’s not trying to justify any behavior, as far as I can tell, unless you think the sentiment “people are pretty awful” justifies bad behavior in and of itself.
The game theory is mostly an extended metaphor rather than a serious model. Humans are complicated.
Virtue signaling, and the “humans-are-wonderful” bias, as a trust exercise
Elon already has all of the money in the world. I think he and his employees are ideologically driven, and as far as I can tell they’re making sensible decisions given their stated goals of reducing unnecessary spend/sprawl. I seriously doubt they’re going to use this access to either raid the treasury or turn it into a personal fiefdom. It’s possible that in their haste they’re introducing security risks, but I also think the tendency of media outlets and their sources will be to exaggerate those security risks. I’d be happy to start a prediction market about this if a regular feels very differently.
If Trump himself was spearheading this effort I would be more worried.
Anthropic has a bug bounty for jailbreaks: https://hackerone.com/constitutional-classifiers?type=team
If you can figure out how to get the model to give detailed answers to a certain set of questions, you get a $10k prize. If you can find a universal jailbreak for all the questions, you get $20k.
Yeah, one possible answer is “don’t do anything weird, ever”. That is the safe way, on average. No one will bother writing a story about you, because no one would bother reading it.
You laugh, but I really think a group norm of “think for yourself, question the outside world, don’t be afraid to be weird” is part of the reason why all of these groups exist. Doing those things is ultimately a luxury for the well-adjusted and intelligent. If you tell people over and over to question social norms, some of those people will turn out to be crazy and conclude that crime and violence are acceptable.
I don’t know if there’s anything to do about that, but it is a thing.
So, to be clear, everyone you can think of has been mentioned in previous articles or alerts about Zizians so far? I ask because I have only been on the periphery of rationalist events for the last several years, but in 2023 I can remember sending this[1] post about rationalist crazies into the San Antonio LW groupchat. A trans woman named Chase Carter, who doesn’t generally attend our meetups, began to argue with me that Ziz (who gets mentioned in the article as an example) was the subject of a “disinformation campaign” by rationalists, that her goals were actually extremely admirable, and that her worst failure was a strategic one in not realizing how few people were like her in the world. At the next meetup we agreed to talk about it further, and she attended (I think for the first time) to explain a very sympathetic account of Ziz’s history and ideas. This was after the alert post but years before any of the recent events.
I have no idea if Chase actually self-identifies as a “Zizian” or is at all dangerous and haven’t spoken to her in a year and a half. I just mention her as an example; I haven’t heard her name brought up anywhere and I really wouldn’t expect to know any of these people to begin with on priors.
- ^ Misremembered that I sent the alert post into the chat; it was actually the Habryka post about rationalist crazies.
I know you’re not endorsing the quoted claim, but just to make this extra explicit: running terrorist organizations is illegal, so this is the type of thing you would also say if Ziz were leading a terrorist organization and you didn’t want to see her arrested.
I have Become Stronger