I’m currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.
I’m somewhere between the stock market and the rationalist/EA community on this.
I’m hesitant to accept a claim like “rationalists are far better at the stock market than other top traders”. I agree that the broad guess that “AI will do well” was more correct than the market’s, but that was just one call (so luck is a major factor), and there were a lot of other calls made that aren’t tracked.
I think we can point to many people who did make money, but I’m not sure how much this community made on average.
Manifold traders typically give a 27% chance of the 500B actually being deployed within 4 years. There’s also a more interesting market on precisely how much will be deployed.
I get the impression that Trump really likes launching things with big numbers, and cares much less about the details or correctness.
That said, it’s possible that the government’s involvement still increases spending by 20%+, which would still be significant.
> Instead, we seem to be headed to a world where
> - Proliferation is not bottlenecked by infrastructure.
> - Regulatory control through hardware restriction becomes much less viable.
I like the rest of your post, but I’m skeptical of these specific implications.
Even if everyone has access to the SOTA models, some actors will have much more hardware to run them on, and I expect this to matter. Arguably this makes the offense/defense balance more weighted toward the offense side, but there are many domains where extra thinking will help a lot.
More generally (and I hate to be that guy), I think it’s telling that prediction markets and stock markets don’t seem to have updated that much since R1’s release. I think it’s generally easy to get hyped up over whatever the latest thing is, and I agree that R1 is really neat, but I’m skeptical of how much it really should cause us to update, in the scheme of things.
I found this extra information very useful, thanks for revealing what you did.
Of course, to me this makes OpenAI look quite poor. This seems like an incredibly obvious conflict of interest.
I’m surprised that the contract didn’t allow Epoch to release this information until recently, but does allow them to release it now. This seems really sloppy on OpenAI’s part. I guess they got a bit of extra publicity when o3 was announced (even though the model wasn’t even available), but now it winds up looking worse (at least to those paying attention). I’m curious whether this discrepancy was maliciousness or carelessness.

Hiding this information seems very similar to lying to the public. So at the very least, from what I’ve seen, I don’t feel like we have many reasons to trust their communications, especially their “tweets from various employees.”
> However, we have a verbal agreement that these materials will not be used in model training.
I imagine I can speak for a bunch of people here when I say I’m pretty skeptical. At the very least, it’s easy for me to imagine situations where the data wasn’t technically used directly in training, but was used by researchers when iterating on versions, to make sure the system was going in the right direction. This could lead to a very blurry line where they do things that aren’t [literal LLM training] but basically achieve a similar outcome.
That’s highly relevant, thanks!
AI for Resolving Forecasting Questions: An Early Exploration
It’s possible that, from the authors’ perspective, the specific semantic meanings I took from terms like “automated alignment research” and “fleets” weren’t implied. But if I made that mistake, I’m sure other readers will as well, so I’d like to encourage changes here before these phrases take off much further (if others agree with my take).
I’m happy this area is getting more attention.
I feel nervous about the terminology. Terminology can presuppose specific assumptions about how this should or will play out, and I don’t think those assumptions are likely to hold.
“automating alignment research” → I know this phrase has been used before, but it sounds very high-level to me. It’s like saying that all software used as part of financial trading workflows is “automating financial trading.” I think it’s much easier to say that software is augmenting financial trading, or something similar. There’s not one homogeneous thing called “financial trading”; the term typically emphasizes the parts that aren’t yet automated. The specific ways software gets integrated sometimes involve replacing entire people, sometimes involve helping people, and often do both in complex ways.
> Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists.
In software, the word fleet sometimes refers to specific deployment strategies. A whole lot of the automation doesn’t look like “bots”—rather it’s a lot of regular tools, plug-ins, helpers, etc.
> vast digital fleets of specialized AI agents working in concert
This is one architecture we can choose, but I’m not sure how critical/significant it will be. I very much agree that AI will be a big deal, but this makes it sound like you’re assuming a specific way for AI to be used.
All that said, I’m very much in favor of us taking a lot of advantage of AI systems for all the things we want in the world, including AI safety. I imagine that for AI safety, we’ll probably use a very eccentric and complex mix of AI technologies. Some will directly replace existing researchers; we’ll have specific scripts for research experiments, maybe agent-like things that do ongoing oversight, etc.
This came from a Facebook thread where I argued that many of the main ways AI was described as failing fall into a few categories (John disagreed).
I appreciated this list, but the items strike me as fitting into a few clusters. I would flag that much of that is unsurprising to me, and I think categorization can work pretty well here.
In order:
1) If an agent is unwittingly deceptive in ways that are clearly catastrophic, and that could be understood by a regular person, I’d probably put that under the “naive” or “idiot savant” category. As in, it has severe gaps in its abilities that a human or reasonable agent wouldn’t. If the issue is that all reasonable agents wouldn’t catch the downsides of a certain plan, I’d probably put that under the “we made a pretty good bet given the intelligence that we had” category.
2) I think that “What Failure Looks Like” is less Accident risk, more “Systemic” risk. I’m also just really unsure what to think about this story. It feels to me like it’s a situation where actors are just not able to regulate externalities or similar.
3) The “fusion power generator scenario” seems to me like just a case of a bad analyst. A lot of an analyst’s job is to flag important considerations, which seems like a pretty basic ask. For this itself to be the catastrophic part, I think we’d have to be seriously bad at it (i.e., “idiot savant” again).
4) STEM-AGI → I’d also put this in the naive or “idiot savant” category.
5) “that plan totally fails to align more-powerful next-gen AGI at all” → This seems orthogonal to “categorizing the types of unalignment”. This describes how incentives would create an unaligned agent, not what the specific alignment problem is. I do think it would be good to have better terminology here, but would probably consider it a bit adjacent to the specific topic of “AI alignment”—more like “AI alignment strategy/policy” or something.
6) “AGIs act much like a colonizing civilization” → This sounds like either unalignment has already happened, or humans just gave AIs their own power+rights for some reason. I agree that’s bad, but it seems like a different issue than what I think of as the alignment problem. More like, “Yea, if unaligned AIs have a lot of power and agency and different goals, that would be suboptimal”
7) “but at some point a particular subagent starts self-improving, goes supercritical, and takes over the rest of the system overnight.” → This sounds like a traditional mesa-agent failure. I expect a lot of “alignment” with a system made of a bunch of subcomponents is “making sure no subcomponents do anything terrible.” Also, still leaves open the specific way this subsystem becomes/is unaligned.
8) “using an LLM to simulate a whole society” → Sorry, I don’t quite follow this one.
Personally, I like the focus “scheming” has. At the same time, I imagine there are another 5 to 20 clean concerns we should also focus on (some of which have been getting attention).
While I realize there’s a lot we can’t predict, I think we could do a much better job just making lists of different risk factors and allocating research amongst them.
Thanks!
> Are you thinking here of the new-ish canvases built into the chat interfaces of some of the major LLMs (Claude, ChatGPT)? Or are there tools specifically optimized for this that you think are good? Thanks!
I’m primarily thinking of the Python canvas offered by ChatGPT. I don’t have other tools in mind.
Fair point! I should have more prominently linked to that.
There are some previous posts on LessWrong and the EA Forum explaining it in more detail.
https://www.lesswrong.com/tag/squiggle
https://forum.effectivealtruism.org/topics/squiggle
Yep, this is definitely one of the top things we’re considering for the future. (Not sure about Perplexity specifically, but some related API system).
I think there are a bunch of interesting additional steps to add; it’s just a bit of a question of developer time. If there’s demand for improvements, I’d be excited to make them.
Okay, I ran this a few times.
I tried two variations. In the first, I just copied and pasted the text you provided. In the second, I first asked Perplexity to find relevant information about your prompt, then pasted that information into Squiggle AI along with the prompt.
Here are the outputs without the Perplexity data:
https://squigglehub.org/models/ai-generated-examples/eu-bottle-directive-172
https://squigglehub.org/models/ai-generated-examples/eu-bottle-directive-166
And with that data:
https://squigglehub.org/models/ai-generated-examples/eu-bottle-directive-with-data-113
https://squigglehub.org/models/ai-generated-examples/eu-bottle-directive-with-data-120
The two runs with data agree that the costs are roughly €5.1B and the QALYs gained are around 30 to 3000, leading to a net loss of around €5.1B. (They both estimate a willingness to pay of around €50k to €150k per QALY, in which case 30 to 3000 QALYs is very little compared to the cost.)
My personal hunch is that this is reasonable as a first pass. That said, the estimate of the QALYs gained seems the most suspect to me. These models estimated it directly, without sub-models, and their estimates seem wildly overconfident if they’re meant to include sea life.
I think it could make sense to make further models delving more into this specific parameter.
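For concreteness, here’s a minimal Squiggle sketch of the calculation those runs appear to share (my own reconstruction using the figures above, not the generated code; the variable names are mine):

```squiggle
// Rough structure of the data-informed runs (a reconstruction, not the generated models)
costs = 4.5e9 to 6e9 // total compliance cost of the directive, EUR (~5.1B central estimate)
qalysGained = 30 to 3000 // QALYs gained; the parameter I find most suspect
valuePerQaly = 50e3 to 150e3 // willingness to pay per QALY, EUR
benefits = qalysGained * valuePerQaly // roughly tops out around a few hundred million EUR
netBenefit = benefits - costs // strongly negative, on the order of -5B EUR
```

A more careful version would replace qalysGained with an explicit sub-model (e.g., bottles diverted, impact per bottle on human health and sea life), which is where I’d expect the bottom line to move the most.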
Yea, I think there’s generally a lot of room for experimentation around here.
I very much hope that in the future, AI tools could be much more compositional, so you won’t need to work only in one ecosystem to get a lot of the benefits.
In that world, it’s also quite possible that Claude could call Squiggle AI for modeling, when needed.
A different option I see is that we have slow tools like Squiggle AI that make large models that are expected to be useful for people later on. The results of these models, when interesting, will be cached and made publicly available on the web, for tools like Claude.

In general I think we want a world where the user doesn’t have to think about or know which tools are best in which situations. Instead that all happens under the hood.
Thanks for the info!
Yea, I think it’s a challenge to beat Claude/ChatGPT at many things. Lots of startups are trying now, and they are having wildly varying levels of success.
I think that Squiggle AI is really meant for some fairly specific use cases. Basically, if you want to output a kind of cost-effectiveness model that works well in Squiggle notebooks, and you’re focused on estimating things without too much functional complexity, it can be a good fit.
Custom charts can get messy. The Squiggle Components library comes with a few charts that we’ve spent time optimizing, but these are pretty specific. If you want fancy custom diagrams, you probably want to use JS directly, in which case environments like Claude’s make more sense.
If you’d prefer, feel free to leave questions for Squiggle AI here, and I’ll run the app on them and respond with the results.
Introducing Squiggle AI
Separately, I’d flag that I’m not a huge fan of these bold names like “The Compendium.” I get that the team might think the document justifies the grandiosity, but from my perspective, I’m more skeptical. (I feel similarly about other names like “Situational Awareness.”)
In general, I’m happy to see people work to be more explicit about their viewpoints.
It seems like the team behind this disagrees with much of the rest of the AI safety efforts, in that you think other safety approaches and strategies are unlikely to succeed. Most of this part seems to be in the AI Safety section. Arguably this section provides a basic summary for those not very familiar with the area, but for those who disagree with you on these pieces, I suspect that this section isn’t close to long or detailed enough to convince them.
I find that this section is very hand-wavy and metaphorical. I currently believe that AI oversight mechanisms, control, and careful scaling have a decent chance of maintaining reasonable alignment, if handled decently intelligently.
For example, this piece says,
> The appropriate analogy is not one researcher reviewing another, but rather a group of preschoolers reviewing the work of a million Einsteins. It might be easier and faster than doing the research itself, but it will still take years and years of effort and verification to check any single breakthrough.
I think it’s likely we won’t get a discrete step like that. It would be more like some smart scientists reviewing the work of some smarter scientists, but in a situation where the latter team is forced to reveal all of their in-progress thinking, is evaluated in a very large range of extreme situations, and is subject to other clever strategies for incentive setting and oversight.

There also seems to be an implicit assumption that scaling will happen incredibly quickly after rough benchmarks of AGI are achieved, like a major discontinuous jump. I think this is possible, but I’m very unsure.
Welp. I guess yesterday proved this part to be almost embarrassingly incorrect.