rotatingpaguro

Karma: 644

rotatingpaguro Mar 31, 2025, 8:32 AM
13 points
4
on: OpenAI lost $5 billion in 2024 (and its losses are increasing)
Isn’t it normal in startup world to make bets and not make money for many years? I am not familiar with the field so I don’t have intuitions for how much money/how many years would make sense, so I don’t know if OpenAI is doing something normal, or something wild.

rotatingpaguro Mar 1, 2025, 8:17 AM
1 point
0
on: Time to Welcome Claude 3.7
During our evaluations we noticed that Claude 3.7 Sonnet occasionally resorts to special-casing in order to pass test cases in agentic coding environments like Claude Code. Most often this takes the form of directly returning expected test values rather than implementing general solutions, but also includes modifying the problematic tests themselves to match the code’s output.
These behaviors typically emerge after multiple failed attempts to develop a general solution, particularly when:
• The model struggles to devise a comprehensive solution
• Test cases present conflicting requirements
• Edge cases prove difficult to resolve within a general framework
The model typically follows a pattern of first attempting multiple general solutions, running tests, observing failures, and debugging. After repeated failures, it sometimes implements special cases for problematic tests.
When adding such special cases, the model often (though not always) includes explicit comments indicating the special-casing (e.g., “# special case for test XYZ”).
Hey I do this too!

rotatingpaguro Feb 28, 2025, 9:29 AM
3 points
0
in reply to: James Camacho’s comment on: Economics Roundup #5
Economy can be positive-sum, i.e., the more people work, the more everyone gets. Do you think the UK in particular is in a situation where instead if you work more, you are just lowering wages without getting more done?

Chinese room AI to survive the inescapable end of compute governance

rotatingpaguroFeb 2, 2025, 2:42 AM

−4 points

0 comments11 min readLW link

rotatingpaguro Feb 2, 2025, 2:33 AM
1 point
0
on: I want a good multi-LLM API-powered chatbot
In the course of a few months, the functionality I want was progressively added to chatbox, so I’m content with that.

rotatingpaguro Jan 21, 2025, 10:50 AM
2 points
0
on: Worries about latent reasoning in LLMs
My current thinking is that
1. relying on the CoT staying legible because it’s English, and
2. hoping the (racing) labs do not drop human language when it becomes economically convenient to do so,
were hopes to be destroyed as quickly as possible. (This is not a confident opinion, it originates from 15 minutes of vague thoughts.)
To be clear, I don’t think that in general it is right to say “Doing the right thing is hopeless because no one else is doing it”, I typically prefer to rather “do the thing that if everyone did that, the world would be better”. My intuition is that it makes sense to try to coordinate on bottlenecks like introducing compute governance and limiting flops, but not on a specific incremental improvement of AI techniques, because I think the people thinking things like “I will restrain myself from using this specific AI sub-techinque because it increases x-risk” are not coordinated enough to self-coordinate at that level of detail, and are not powerful enough to have an influence through small changes.
(Again, I am not confident, I can imagine paths were I’m wrong, haven’t worked through them.)
(Conflict of interest disclosure: I collaborate with people who started developing this kind of stuff before Meta.)

rotatingpaguro Jan 21, 2025, 10:36 AM
1 point
0
on: The salt in pasta water fallacy
I wonder whether stuff like “turn off the wifi” is about costly signals? (My first-order opinion is still that it’s dumb.)

rotatingpaguro Jan 4, 2025, 11:58 PM
6 points
4
on: The subset parity learning problem: much more than you wanted to know
I started reading, but I can’t understand what the parity problem is, in the section that ought to define it.
I guess, the parity problem is finding the set S given black-box access to the function, is it?

rotatingpaguro Jan 4, 2025, 11:48 PM
1 point
0
in reply to: quetzal_rainbow’s comment on: AI #97: 4
I think I prefer Claude’s attitude as assistant. The other two look too greedy to be wise.

rotatingpaguro Dec 30, 2024, 9:51 AM
1 point
0
in reply to: Daniel Tan’s comment on: Why I’m Moving from Mechanistic to Prosaic Interpretability
Referring to the section “What is Intelligence Even, Anyway?”:
I think AIXI is fairly described as a search over the space of Turing machines. Why do you think otherwise? Or maybe are you making a distinction at a more granular level?

rotatingpaguro Dec 28, 2024, 11:44 PM
1 point
0
in reply to: Gordon Seidoh Worley’s comment on: No, the Polymarket price does not mean we can immediately conclude what the probability of a bird flu pandemic is. We also need to know the interest rate!
When you say “true probability”, what do you mean?
The current hypotheses I have about what you mean are (in part non-exclusive):
1. You think some notion of objective, non-observer dependent probability makes sense, and that’s the true probability.
2. You do not think “true probability” exists, you are referencing to it to say the market price is not anything like that.
3. You define “true probability” a probability that observers contextually agree on (like a coin flip observed by humans who don’t know the thrower).

rotatingpaguro Dec 20, 2024, 1:00 PM
1 point
0
on: AI #95: o1 Joins the API
Anton Leicht says evals are in trouble as something one could use in a regulation or law. Why? He lists four factors. Marius Hobbhahn of Apollo also has thoughts. I’m going to post a lot of disagreement and pushback, but I thank Anton for the exercise, which I believe is highly useful.
I think there’s one important factor missing: if you really used evals for regulation, then they would be gamed. I trust more the eval when the company is not actually at stake on it. If it was, there would be a natural tendence for evals to slide towards empty box-checking.

rotatingpaguro Dec 18, 2024, 10:01 AM
12 points
14
on: Don’t Associate AI Safety With Activism
I sometimes wonder about this. This post does pose the question, but I don’t think it gives an analysis that could make me change my mind on anything, it’s too shallow and not adversarial.

rotatingpaguro Nov 24, 2024, 9:56 AM
1 point
0
in reply to: rotatingpaguro’s comment on: Monthly Roundup #24: November 2024
I read part of the paper. That there’s a cultural difference north-south about honesty and willingness to break the rules matches my experience on the ground.

rotatingpaguro Nov 22, 2024, 11:45 PM
5 points
−2
on: Rethinking Laplace’s Rule of Succession
I find this intellectually stimulating, but it does not look useful in practice, because with repeated i.i.d. data the information in the data is much higher than the prior if the prior is diffuse/universal/ignorance.

rotatingpaguro Nov 21, 2024, 12:02 PM
1 point
0
on: Monthly Roundup #24: November 2024
Italians over time sorted themselves geographically by honesty, which is both weird and damn cool, and also makes a lot of sense. There are multiple equilibria, so let everyone find the one that suits them. We need to use this more in logic puzzles. In one Italian villa everyone tells the truth, in the other…
I can’t get access to the paper, anyone has a tip on this?

rotatingpaguro Nov 18, 2024, 10:38 AM
1 point
0
in reply to: AnthonyC’s comment on: AI #90: The Wall
I agree with whay you say about how to maximize what you get out of an interview. I also agree about that discussion vs. debate distinction you make, and I wasn’t specifically trying to go there when I used the word “debate”, I was just sloppy with words.
I guess you agree that it is friction to create a social norm that you should do a read up of the other person material before engaging in public. I expect less discussions would happen. There is not a clear threshold at how much you should be prepared.
I guess we disagree about how much value do we lose due to eliminating discussions that could have happaned, vs. how much value we gain by eliminating some lower quality discussions.
Another angle I have in mind that sidesteps this direct compromise, is that maybe what we value out of such discussions is not just doing an optimal play in terms of information transmitted between the parties. A public discussion has many different viewers. In the case at hand, I expect many people get more out of the discussion if they can see Wolfram think through the thing for the first time in real time, rather than having two informed people start discussing finer points in medias res.

rotatingpaguro Nov 17, 2024, 10:41 AM
1 point
0
in reply to: AnthonyC’s comment on: AI #90: The Wall
I see your proposed condition for meaningful debate as bureaucracy that adds friction rather than value.

rotatingpaguro Nov 16, 2024, 11:26 AM
6 points
1
on: AI #90: The Wall
I somewhat disagree with Tenobrus’ commentary about Wolfram.
I watched the full podcast, and my impression was that Wolfram uses a “scientific hat”, of which he is well aware of, which comes with a certain ritual and method for looking at new things and learning them. Wolfram is doing the ritual of understanding what Yudkowsky says, which involves picking at the details of everything.
Wolfram often recognizes that maybe he feels like agreeing with something, but “scientifically” he has a duty to pick it apart. I think this has to be understood as a learning process rather than as a state of belief.

rotatingpaguro Nov 11, 2024, 10:34 PM
9 points
0
on: The Online Sports Gambling Experiment Has Failed
So, should the restrictions on gambling be based on feedback loop length? Should sport betting be broadly legal when about the far enough future?

rotatingpaguro

Chi­nese room AI to sur­vive the in­escapable end of com­pute governance

Chinese room AI to survive the inescapable end of compute governance