Just listened to the IMO team at OpenAI talk about their model. https://youtu.be/EEIPtofVe2Q?si=kIPDW5d8Wjr2bTFD Some notes:
The techniques they used are general, and especially useful for RL on problems where solution correctness is hard to verify.
It now says when it doesn’t know something, or didn’t figure it out. This is a prerequisite for training the model successfully on its own output.
The people behind the model are from the multi-agent team. For one agent to be able to work with another, the reports from the other agent need to be trustworthy.
I have no idea what the community consensus is. I doubt they’re lying.
For anyone who already had short timelines, this couldn’t shorten them that much. For instance, 2027 or 2028 is very soon, and https://ai-2027.com/ already assumed successful research would be done along the way. So for me, very little more “yikes” than yesterday.
It does not seem to me like this is the last research breakthrough needed for full-fledged AGI, either. LLMs are superhuman at tasks with little or no context buildup, but they haven’t solved context management (be that through long context windows, memory retrieval techniques, online learning, or anything else).
I also don’t think it’s surprising that these research breakthroughs keep happening. Remember that their last breakthrough (Strawberry, o1) was “make RL work”. This one might be something like “make reward prediction and MCTS work”, as in MuZero, or some other banal thing that worked on toy cases in the 80s but was nontrivial to reimplement in LLMs.
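To be concrete about what I mean by the MuZero-style recipe, here’s a minimal toy sketch: a learned model predicts reward, value, and policy priors, and MCTS plans over those predictions instead of a real simulator. The stub model and tiny action space are made up for illustration; this is not anything the OpenAI team described, just the shape of the idea.

```python
# Toy MuZero-style planning: MCTS over a learned model's reward/value/policy
# predictions. StubModel and the two-action space are invented placeholders.
import math
import random

ACTIONS = [0, 1]  # toy action space

class StubModel:
    """Stand-in for learned networks: (latent, action) ->
    (next_latent, predicted_reward, predicted_value, policy_priors)."""
    def step(self, latent, action):
        next_latent = hash((latent, action)) % 1000
        reward = random.random()   # learned reward prediction
        value = random.random()    # learned value prediction
        priors = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
        return next_latent, reward, value, priors

class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.reward = 0.0
        self.latent = None
        self.children = {}  # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def ucb(parent, child, c=1.25):
    # PUCT-style score: exploit predicted reward/value, explore via prior and visit counts.
    explore = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.reward + child.value() + explore

def mcts(model, root_latent, simulations=50):
    root = Node(prior=1.0)
    root.latent = root_latent
    _, _, _, priors = model.step(root_latent, 0)  # toy shortcut to get root priors
    root.children = {a: Node(p) for a, p in priors.items()}

    for _ in range(simulations):
        node, path = root, [root]
        # Select down the tree until we hit an unexpanded node.
        while node.children:
            action, node = max(node.children.items(),
                               key=lambda kv: ucb(path[-1], kv[1]))
            path.append(node)
        # Expand using the model's predictions (no environment rollout).
        parent = path[-2]
        latent, reward, value, priors = model.step(parent.latent, action)
        node.latent, node.reward = latent, reward
        node.children = {a: Node(p) for a, p in priors.items()}
        # Back up predicted value plus predicted rewards along the path.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = n.reward + value
    # Act according to visit counts, as MuZero does.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

if __name__ == "__main__":
    print("chosen action:", mcts(StubModel(), root_latent=0))
```

The point of the sketch is just that nothing here is exotic: the search is classic MCTS, and the “new” part is trusting learned reward/value predictions well enough to plan with them, which is exactly the kind of old idea that is nontrivial to make work at LLM scale.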