I’m currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.
ozziegooen
Can you explain this position more? I know the bitter lesson, and could imagine a few ways it could have implications here.
The second worry is, I guess, a variant of the first: that we’ll use intent-aligned AI very foolishly. That would be issuing a command like “follow the laws of the nation you originated in but otherwise do whatever you like.” I guess a key consideration in both cases is whether there’s an adequate level of corrigibility.
I’d flag that I suspect that we really should have AI systems forecasting the future and the results of possible requests.
So if people made a broad request like, “follow the laws of the nation you originated in but otherwise do whatever you like”, they should see forecasts for what that would lead to. If there are any clearly problematic outcomes, those should become apparent early on.
It seems like it would take either very dumb humans, or a straightforward alignment-mistake-risk failure, to mess this up.
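As a very rough sketch of the “forecast before executing” pattern I have in mind (in Python; `query_forecaster` is a placeholder for whatever forecasting-capable model or service is actually available, not a real API):

```python
# Hypothetical sketch: surface forecasts of a request's consequences before an
# AI system acts on it. `query_forecaster` is a stand-in, not a real API.

def query_forecaster(prompt: str) -> str:
    """Placeholder for a call to a forecasting-capable AI system."""
    raise NotImplementedError

def preview_request(request: str, horizons=("1 year", "5 years", "20 years")) -> dict:
    """Return forecasted outcomes of executing `request`, keyed by time horizon."""
    forecasts = {}
    for horizon in horizons:
        prompt = (
            f"Forecast the most likely consequences over {horizon} if an AI system "
            f"were given and followed this instruction:\n{request}\n"
            "Flag any clearly problematic outcomes explicitly."
        )
        forecasts[horizon] = query_forecaster(prompt)
    return forecasts

# A user issuing a broad command would review these forecasts before confirming:
# preview_request("follow the laws of the nation you originated in but otherwise "
#                 "do whatever you like")
```

The specifics obviously don’t matter much; the point is just that forecasts get surfaced, and reviewed, before a broad command takes effect.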
A bunch of people in the AI safety landscape seem to argue “we need to stop AI progress, so that we can make progress on AI safety first.”
One flip side to this is that I think it’s incredibly easy for people to waste a ton of resources on “AI safety” at this point.
I’m not sure how much I trust most technical AI safety researchers to make important progress on AI safety now. And I trust most institutions a lot less.
I’d naively expect that if any major country threw $100 billion at it today, the results would be highly underwhelming. I rarely trust these governments to make progress even on concrete technologies with clear progress measures, and “AI Safety” is highly ambiguous and speculative.
As I’ve written about before, I think it’s just hard to know what critical technical challenges will be bottlenecks around AI alignment, given that it’s unclear when this will become an issue or what sorts of architectures we will have then.
All that said, slowing things down seems much safer to me. I assume that at [year(TAI) − 3] we’ll have a decent idea of what’s needed, and extending that duration seems like a safe bet.
I really want to see better strategic discussion about AI safety. If somehow we could spend $10B just to get a better idea of what to actually do, I’d easily suggest that, though strategy is something that’s typically very difficult to spend money on.
Personally, this is one reason why I favor the meta approach of “make better epistemic tools, using AI.” This is an area that can be very concrete and achievable, though it does have its own problems.
There have been a few takes so far on humans gradually losing control to AIs—not through specific systems going clearly wrong, but rather through a long-term process of increasing complexity and shifting incentives.
This sometimes gets classified as “systemic” failure—in contrast to “misuse” and “misalignment.”
There was “What Failure Looks Like”, and more recently, this piece on “Gradual Disempowerment.”
To me, these pieces come across as highly hand-wavy, speculative, and questionable.
I get the impression that a lot of people have strong low-level assumptions here that a world with lots of strong AIs must go haywire. But I don’t see clear steps to get there or a clear model of what the critical factors are.
As I see it, there are many worlds where AIs strictly outperform humans at managing high levels of complexity and increasing coordination. In many of these, things go toward much better worlds than ones with humans in charge.
I think it’s likely that inequality could increase, but that wouldn’t mean humanity as a whole would lose control.
My gut-level guess is that there are some crucial aspects here. Like, in worlds where AI systems have strong epistemics without critical large gaps, and can generally be controlled / aligned, things will be fine. But if there are fundamental gaps for technical or political reasons, then that could lead to these “systemic” disasters.
If that is the case, I’d expect we could come up with clear benchmarks to keep track of. For example, one might say that future global well-being is highly sensitive to a factor like, “how well the average-used AI service does at wisdom exam #523.”
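As a toy illustration of tracking such a benchmark (the service names, usage shares, scores, and the 0.8 threshold below are all made up):

```python
# Toy sketch, all numbers hypothetical: track a usage-weighted "wisdom" score
# across widely used AI services and flag when it drops below a threshold we
# believe downstream outcomes are sensitive to.

WISDOM_THRESHOLD = 0.8  # hypothetical minimum acceptable score

def usage_weighted_score(services: dict) -> float:
    """`services` maps name -> (usage_share, benchmark_score)."""
    total_usage = sum(share for share, _ in services.values())
    return sum(share * score for share, score in services.values()) / total_usage

services = {
    "service_a": (0.6, 0.85),  # 60% of usage, scores 0.85 on the exam
    "service_b": (0.3, 0.70),
    "service_c": (0.1, 0.90),
}

score = usage_weighted_score(services)
if score < WISDOM_THRESHOLD:
    print(f"Warning: usage-weighted wisdom score {score:.2f} is below threshold")
```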
https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like
https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from
> Rather than generic slop, the early transformative AGI is fairly sycophantic (for the same reasons as today’s AI), and mostly comes up with clever arguments that the alignment team’s favorite ideas will in fact work.
I have a very easy time imagining work to make AI less sycophantic, for those who actually want that.
I expect that one major challenge for popular LLMs is that sycophancy is both incredibly common online and highly approved of by humans.
It seems like it should be an easy thing to stop for someone actually motivated. For example, take a request, re-write it in a bunch of ways that imply different things about the author’s take and interests, get the answers to all, and average them. There are a lot of clear evals we could do here.
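A rough sketch of that procedure (in Python; `ask_llm` is a placeholder for whatever chat-completion call is used, and the prompts are just illustrative):

```python
# Rough sketch of the de-biasing idea above: rewrite a request so it implies
# different things about the author's own views, answer each variant, then
# aggregate across framings.

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM."""
    raise NotImplementedError

def rewrite_variants(request: str, n: int = 4) -> list:
    """Ask the model for paraphrases that shift the implied stance of the author."""
    prompt = (
        f"Rewrite the following request {n} times, each time implying a different "
        f"stance or interest on the author's part, without changing the underlying "
        f"question. Return one rewrite per line.\n\n{request}"
    )
    return [line for line in ask_llm(prompt).splitlines() if line.strip()][:n]

def debias_answer(request: str) -> str:
    """Answer each stance-shifted variant, then synthesize a framing-independent answer."""
    answers = [ask_llm(variant) for variant in rewrite_variants(request)]
    synthesis_prompt = (
        "The following answers were given to differently framed versions of the same "
        "question. Produce one answer, keeping only the conclusions that hold across "
        "all framings:\n\n" + "\n\n---\n\n".join(answers)
    )
    return ask_llm(synthesis_prompt)
```

An eval could then compare this kind of aggregated answer against the direct answer on requests with known author-pleasing framings.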
To me, most of the question is how stupid these humans will be. Maybe Sam Altman will trust [LLM specifically developed in ways that give answers that Sam Altman would like], ignoring a lot of clear literature and other LLMs that would strongly advise otherwise.
So ultimately, this seems like a question of epistemics to me.
I think it’s totally fine to think that Anthropic is a net positive. Personally, right now, I broadly also think it’s a net positive. I have friends on both sides of this.
I’d flag, though, that your previous comment suggested something more to me than “this is just you giving your probability.”
> Give me your model, with numbers, that shows supporting Anthropic to be a bad bet, or admit you are confused and that you don’t actually have good advice to give anyone.
I feel like there are much nicer ways to phrase that last bit. I suspect that this is much of the reason you got disagreement points.
> Then we must consider probabilities, expected values, etc. Give me your model, with numbers, that shows supporting Anthropic to be a bad bet, or admit you are confused and that you don’t actually have good advice to give anyone.
Are there good models that support that Anthropic is a good bet? I’m genuinely curious.
I’d naively assume that if either side had more of the burden of proof, it would be Anthropic. They have many more resources, and are the ones doing the highly impactful (and potentially negative) work.
My impression was that there was very little probabilistic risk modeling here, but I’d love to be wrong.
> The introduction of the GD paper takes no more than 10 minutes to read
Even 10 minutes is a lot, for many people. I might see 100 semi-interesting Tweets and Hacker News posts per day that link to lengthy articles, and that’s already filtered—I definitely can’t spend 10 minutes each on many of them.
> and no significant cognitive effort to grasp, really.
“No significant cognitive effort” to read a nuanced semi-academic article with unique terminology? I tried spending ~20-30 minutes understanding this paper, and didn’t find it trivial. I think it’s very easy to make mistakes about what papers like this are really trying to say (the above post, for instance, lists out a bunch of common mistakes). I know the authors and a lot of related work, and even with that, I didn’t find it trivial. I imagine things are severely harder for people much less close to this area.
By the way—I imagine you could do a better job with the evaluation prompts by adding another LLM pass that formalizes the above and adds more context. For example, with an o1/R1/Squiggle AI pass, you could probably make something that considers a few more factors and brings in more statistics.
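Concretely, a sketch of that kind of two-pass setup (both model calls below are placeholders, and the prompts are just illustrative):

```python
# Sketch of the two-pass idea: first have a strong reasoning model formalize the
# evaluation prompt (explicit factors, base rates, a numeric rubric), then run
# the actual evaluation on the formalized version. Both calls are placeholders.

def ask_reasoning_model(prompt: str) -> str:
    """Placeholder for an o1/R1-style reasoning-model call."""
    raise NotImplementedError

def ask_evaluator(prompt: str) -> str:
    """Placeholder for the model doing the final scoring."""
    raise NotImplementedError

def formalize_then_evaluate(raw_eval_prompt: str, submission: str) -> str:
    formalized = ask_reasoning_model(
        "Rewrite this evaluation prompt to be more formal and complete: list the "
        "specific factors to score, add relevant base rates or statistics to "
        "consider, and specify a numeric rubric.\n\n" + raw_eval_prompt
    )
    return ask_evaluator(formalized + "\n\nSubmission to evaluate:\n" + submission)
```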
Related Manifold question here:
That counts! Thanks for posting. I look forward to seeing what it will get scored as.
$300 Fermi Model Competition
I assume that what’s going on here is something like,
“This was low-hanging fruit; it was just a matter of time until someone did the corresponding test.”
This would imply that OpenAI’s work here isn’t impressive, and also that previous LLMs might have essentially been underestimated. There’s basically a latent capabilities gap that was cheap to close.
I imagine a lot of software engineers / entrepreneurs aren’t too surprised now. Many companies are basically trying to find wins where LLMs + simple tools give a large gain.
So some people could look at this and say, “sure, this test is to be expected”, and others would be impressed by what LLMs + simple tools are capable of.
I feel like there are some critical metrics or factors here that are getting overlooked in the details.
I agree with your assessment that it’s very likely that many people will lose power. I think it’s fairly likely that most humans won’t be able to provide much economic value at some point, and won’t be able to ask for many resources in exchange. So I could see an argument for incredibly high levels of inequality.
However, there is a key question in that case, of “could the people who own the most resources guide AIs using those resources to do what they want, or will these people lose power as well?”
I don’t see a strong reason why these people would lose power or control. That would seem like a fundamental AI alignment issue—in a world where a small group of people own all the world’s resources, and there’s strong AI, can those people control their AIs in ways that would provide this group a positive outcome?
> 2. There are effectively two ways these systems maintain their alignment: through explicit human actions (like voting and consumer choice), and implicitly through their reliance on human labor and cognition. The significance of the implicit alignment can be hard to recognize because we have never seen its absence.
> 3. If these systems become less reliant on human labor and cognition, that would also decrease the extent to which humans could explicitly or implicitly align them. As a result, these systems—and the outcomes they produce—might drift further from providing what humans want.
There seems to be a key assumption here that people are able to maintain control because their labor and cognition are important.
I think this makes sense for people who need to work for money, but not for those who are rich.
Our world has a long history of dumb rich people who provide neither labor nor cognition, and still seem to do pretty fine. I’d argue that power often matters more than human output, and would expect the importance of power to increase over time.
I think that many rich people now are able to maintain a lot of control, with very little labor/cognition. They have been able to decently align other humans to do things for them.
I think my quick guess is that what’s going on is something like:
- People generally have a ton of content to potentially consume and limited time, and are thus really picky.
- Researchers often have unique models and a lot of specific nuances they care about.
- Most research of this type is really bad. Tons of people on Twitter now seem to have some big-picture theory of what AI will do to civilization.
- Researchers also have the curse of knowledge, and think their work is simpler than it is.
So basically, people aren’t flinching because of bizarre and specific epistemic limitations. It’s more like,
> “this seems complicated, learning it would take effort, my prior is that this is fairly useless anyway, so I’ll be very quick to dismiss this.”
My quick impression is that this is a brutal and highly significant limitation of this kind of research. It’s just incredibly expensive for others to read and evaluate, so it’s very common for it to get ignored. (I learned this in part from putting a lot of similar work out there myself, then seeing it get ignored.)
Related to this -
I’d predict that if you improved the arguments by 50%, it would lead to little extra uptake. But if you got someone really prestigious to highly recommend it, then suddenly a bunch of people would be much more convinced.
I like reading outsider accounts of things I’m involved in / things I care about.
Just for context for some not aware—The author, Ben Landau-Taylor, has been in the rationalist-extended community for some time now. This post is written on Palladium Magazine, which I believe basically is part of Samo Burja’s setup. I think both used to be around Leverage Research and some other rationality/EA orgs.
Ben and Samo have been working on behalf of Palladium and similar for a while now.
My quick read is that this article is in line with takes they’ve written about/discussed before, which is not too much of a surprise.
I disagree with a lot of their intuitions, but at the same time, I’m happy to have more voices discuss some of these topics.
All this to say, while these people aren’t exactly part of the scene now, they’re much closer to it than what many might imagine as “outsider accounts.”
> I think it’s telling that prediction markets and stock markets don’t seem to have updated that much since R1’s release
Welp. I guess yesterday proved this part to be almost embarrassingly incorrect.
I’m somewhere between the stock market and the rationalist/EA community on this.
I’m hesitant to accept a claim like “rationalists are far better at the stock market than other top traders”. I agree that the guess “AI will do well” was generally more correct than the market, but it was just one call (in which case luck is a major factor), and there were a lot of other calls made there that aren’t tracked.
I think we can point to many people who did make money, but I’m not sure how much this community made on average.
Manifold traders currently give a 27% chance of the $500B actually being deployed within 4 years. There’s also a more interesting market on more precisely how much will be deployed.
I get the impression that Trump really likes launching things with big numbers, and cares much less about the details or correctness.
That said, it’s possible that the government’s involvement increases spending by 20%+, which would still be significant.
I assume that current efforts in AI evals and AI interpretability will be pretty useless if we have very different infrastructures in 10 years. For example, I’m not sure how much LLM interp helps with o1-style high-level reasoning.
I also think that later AI could help us do research. So if the idea is that we could do high-level strategic reasoning to find strategies that aren’t tied to specific models/architectures, I assume we could do that reasoning much better with better AI.