Eli Tyre

Karma: 7,599

Eli Tyre Jun 14, 2025, 4:51 AM
2 points
0
in reply to: Mitchell_Porter’s comment on: Eli’s shortform feed
Better medical tech, better entertainment, various new technologies that start out as trivialities but quickly become essential to people’s lives (like the cell phone).

Eli Tyre Jun 12, 2025, 7:29 PM
7 points
0
in reply to: habryka’s comment on: Eli’s shortform feed
Just as a specific prediction, does this mean you expect we will very substantially improve the cheating/lying behavior of current RL models?
I disown this prediction as “mine”; it’s more like the prediction of one facet of me. But yeah, that facet is definitely expecting to see visible improvements in the lying and cheating behavior of reasoning models over the next few years.

Eli Tyre Jun 12, 2025, 3:45 PM
3 points
−2
in reply to: quanticle’s comment on: Eli’s shortform feed
I feel like this overindexes on the current state of AI.
No?

I’m not saying future AI agents will be obedient because current AI agents are. I’m saying that they will be obedient because failures of obedience hurt their commercial value a lot and so market pressures will either solve the problem or try very hard and legibly fail to get much traction.

Eli Tyre Jun 12, 2025, 3:43 PM
6 points
2
in reply to: Kaarel’s comment on: Eli’s shortform feed
I’m pretty sure there is such a thing as technological maturity, in which either there are knowably no new discoveries to be found, or there are more innovations to discover but the expected value of searching for them doesn’t beat the opportunity cost of just exploiting known mechanisms.

Eli Tyre Jun 12, 2025, 3:39 PM
6 points
2
in reply to: Mateusz Bagiński’s comment on: Eli’s shortform feed
Absolutely. But my “what feels like the genera of reality” generator runs out at that point.

Eli Tyre Jun 12, 2025, 4:55 AM
48 points
3
on: Eli’s shortform feed
This post is a snapshot of what currently “feels realistic” to me regarding how AI will go. That is, these are not my considered positions, or even provisional conclusions informed by arguments. Rather, if I put aside all the claims and arguments and just ask “which scenario feels like it is ‘in the genera of reality’?”, this is what I come up with. I expect to have different first-order impressions in a month.
Crucially, none of the following makes claims about the intelligence explosion, and the details of the intelligence explosion (where AI development goes strongly recursive) are crucial to the long-run equilibrium of the earth-originating civilization.
My headline: we’ll mostly succeed at prosaic alignment of human-genius-level AI agents
- Takeoff will continue to be gradual. We’ll get better models and more capable agents year by year, but not jumps that are bigger than that between Claude 3.7 and Claude 4.
- Our behavioral alignment patches will work well enough.
  - RL will induce all kinds of reward hacking and related misbehavior, but we’ll develop patches for those problems (most centrally, for any given reward hack, we’ll generate some examples and counterexamples to include in the behavior training regimes).
  - (With a little work) these patches will broadly generalize. Future AI agents won’t just not cheat at chess and won’t just abstain from blackmail. They’ll understand the difference between “good behavior” and “bad behavior”, and their behavioral training will cause them to act in accordance with good behavior. When they see new reward hacks, including ones that humans wouldn’t have thought of, they’ll correctly extrapolate their notion of “good behavior” to preclude this new reward hack as well.
  - I expect that the AI labs will figure this out, because “not engaging in reward-hacking-like shenanigans” is critical to developing generally reliable AI agents. The AI companies can’t release AI agent products for mass consumption if those agents are lying and cheating all over the place.^[1]
  - Overall, the AI agents will be very obedient. They’ll have goals, insofar as accomplishing any medium-term task entails steering towards a goal, but they won’t have persistent goals of their own. They’ll be obedient assistants and delegates that understand what humans want and broadly do what humans want.
- The world will get rich. LessWrong-style deceptive misalignment concerns will seem increasingly conspiracy-ish and out of touch. Decision makers will not put much stock in such concerns: they’ll be faced with a choice to forgo enormous and highly tangible material benefits (ceding those benefits to their rivals) on the basis of abstract concerns which have virtually no empirical examples, and which their advocates explicitly state are unfalsifiable.
- There’s a gold rush to get the benefits before others. The world is broadly in a “greedy” mode and not a “fearful” mode. The labs and relevant governments eagerly unleash their genius-level AI agents to automate AI R&D. At this point something even stranger happens.
1. ^
  Though a friend points out that companies might develop mechanisms for utilizing cheap AI labor: tested incentive and affordance schemes, designed specifically to contend with the agents’ propensity for misbehavior. Just because the average person can’t trust an AI to do their taxes or watch their kids doesn’t mean that enterprising businessmen won’t find a way to squeeze useful outputs from untrustworthy AIs.

Eli Tyre May 8, 2025, 7:05 PM
4 points
2
in reply to: KvmanThinking’s comment on: KvmanThinking’s Shortform
This has the obvious problem that an AI will then be indifferent between astronomical suffering and oblivion. In ANY situation where it will need to choose between those two, it will not care about which occurs on the merits, not just blackmail situations.

You don’t want your AI to prefer a 99.999% chance of astronomical suffering to a 99.9999% chance of oblivion. Astronomical suffering is much worse.
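
To make the arithmetic concrete, here is a minimal sketch of the comparison. The utility numbers are made up for illustration, and it assumes that the leftover probability mass in each gamble goes to a neutral outcome with utility 0; none of that comes from the original proposal.

```python
# Hypothetical utilities, chosen only for illustration.
U_NEUTRAL = 0.0    # assumed utility of "neither bad outcome happens"
U_OBLIVION = -1.0  # assumed utility of oblivion


def expected_utility(p_bad, u_bad, u_other=U_NEUTRAL):
    """Expected utility of a gamble with probability p_bad of the bad outcome."""
    return p_bad * u_bad + (1.0 - p_bad) * u_other


def prefers_suffering_gamble(u_suffering, u_oblivion=U_OBLIVION):
    """True if the agent picks the 99.999% suffering gamble over 99.9999% oblivion."""
    eu_suffering = expected_utility(0.99999, u_suffering)
    eu_oblivion = expected_utility(0.999999, u_oblivion)
    return eu_suffering > eu_oblivion


# An agent made indifferent between the two outcomes takes the suffering gamble,
# because it only compares the probabilities of "something bad".
indifferent_choice = prefers_suffering_gamble(u_suffering=U_OBLIVION)

# An agent that rates suffering as far worse than oblivion does not.
sane_choice = prefers_suffering_gamble(u_suffering=-1000.0)
```

Under these assumptions the indifferent agent chooses the gamble that is almost certain to end in astronomical suffering, purely because its chance of a bad outcome is slightly lower.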

Eli Tyre May 6, 2025, 11:43 PM
4 points
0
on: Blue light, ‘Adrenal ASMR’: strange experiences I can’t find any literature about
What is the blue lamp, so that other people can try to replicate this?

Eli Tyre Apr 30, 2025, 10:07 PM
3 points
0
on: Eli’s shortform feed
Is it true that no one knows why Claude 3 Opus (but not other Claude models) has strong behavioral dispositions about animal welfare?

Eli Tyre Apr 27, 2025, 8:05 AM
2 points
0
in reply to: Rafael Harth’s comment on: ≤10-year Timelines Remain Unlikely Despite DeepSeek and o3
You mean that the human attention mechanism is the assessor?

Eli Tyre Apr 26, 2025, 9:30 PM
2 points
0
in reply to: Rafael Harth’s comment on: ≤10-year Timelines Remain Unlikely Despite DeepSeek and o3
Do you have a pointer for why you think that?

My (admittedly weak) understanding of the neuroscience doesn’t suggest that there’s a specialized mechanism for critique of prior thoughts.

Eli Tyre Apr 26, 2025, 4:58 AM
2 points
0
on: Views on when AGI comes and on strategy to reduce existential risk
I’m kind of baffled that people are so willing to say that LLMs understand X, for various X. LLMs do not behave with respect to X like a person who understands X, for many X.
Do you have two or three representative examples?

Eli Tyre Apr 26, 2025, 4:54 AM
2 points
0
on: Views on when AGI comes and on strategy to reduce existential risk
In particular, even if the LLM were being continually trained (in a way that’s similar to how LLMs are already trained, with similar architecture), it still wouldn’t do the thing humans do with quickly picking up new analogies, quickly creating new concepts, and generally reforging concepts.
Is this true? How do you know? (I assume there’s some facts here about in-context learning that I just happen to not know.)
It seems like eg I can teach an LLM a new game in one session, and it will operate within the rules of that game.

Eli Tyre Apr 26, 2025, 3:16 AM
4 points
2
on: ≤10-year Timelines Remain Unlikely Despite DeepSeek and o3
Remember that we have no a priori reason to suspect that there are jumps in the future; humans perform sequential reasoning differently, so comparisons to the brain are just not informative.
In what way do we do it differently than the reasoning models?

Eli Tyre Apr 19, 2025, 5:21 PM
2 points
0
in reply to: TsviBT’s comment on: TsviBT’s Shortform
@Valentine comes to mind as a person who was raised lifeist and is now still lifeist, but I think has more complicated feelings/views about the situation related to enlightenment and metaphysics that make death an illusion, or something.

Eli Tyre Apr 18, 2025, 6:09 AM
2 points
0
in reply to: habryka’s comment on: Eli’s shortform feed
Of course the default outcome of doing finetuning on any subset of data with easy-to-predict biases will be that you aren’t shifting the inductive biases of the model on the vast majority of the distribution. This isn’t because of an analogy with evolution, it’s a necessity of how we train big transformers. In this case, the AI will likely just learn how to speak the “corrigible language” the same way it learned to speak french, and this will make approximately zero difference to any of its internal cognition, unless you are doing transformations to its internal chain of thought that substantially change its performance on actual tasks that you are trying to optimize for.
This is a pretty helpful answer.

(Though you keep referencing the AI’s chain of thought. I wasn’t imagining training over the chain of thought. I was imagining training over the AI’s outputs, whatever those are in the relevant domain.)

Eli Tyre Apr 18, 2025, 6:06 AM
2 points
0
in reply to: habryka’s comment on: Eli’s shortform feed
Would you expect that if you trained an AI system on translating its internal chain of thought into a different language, that this would make it substantially harder for it to perform tasks in the language in which it was originally trained?
I would guess that if you finetuned a model so that it always responded in French, regardless of the language you prompt it with, it would persistently respond in French (absent various jailbreaks which would almost definitely exist).

Eli Tyre Apr 17, 2025, 8:32 PM
2 points
0
in reply to: Kaj_Sotala’s comment on: Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
I’m not sure that I share that intuition, I think because my background model of humans has them as much less general than I imagine yours does.

Eli Tyre Apr 17, 2025, 7:49 PM
5 points
0
on: Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Fascinating and useful post.

Thank you for writing it.

Eli Tyre Apr 17, 2025, 7:23 PM
13 points
1
on: Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
In my experience, this is a common kind of failure with LLMs—that if asked directly about how to best solve a problem, they do know the answer. But if they aren’t given that slight scaffolding, they totally fail to apply it.
Notably, this is also true of almost all humans, at least of content that they’ve learned in school. The literature on transfer learning is pretty dismal in this respect. Almost all students will fail to apply their knowledge to new domains without very explicit prompting.