ozziegooen

Karma: 4,225

I’m currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.

ozziegooen Apr 10, 2025, 11:02 PM
15 points
11
in reply to: Alexander Gietelink Oldenziel’s comment on: abramdemski’s Shortform
I’m curious whether you know of any examples in history where humanity purposefully and succesfully steered towards a significantly less competitive [economically, militarily,...] technology that was nonetheless safer.
This sounds much like a lot of the history of environmentalism and safety regulations? As in, there’s a long history of [corporations selling X, using a net-harmful technology], then governments regulating. Often this happens after the technology is sold, but sometimes before it’s completely popular around the world.

I’d expect that there’s similarly a lot of history of early product areas where some people realize that [popular trajectory X] will likely be bad and get regulated away, so they help further [safer version Y].

Going back to the previous quote:
“steer the paradigm away from AI agents + modern generative AI paradigm to something else which is safer”
I agree it’s tough, but would expect some startups to exist in this space. Arguably there are already several claiming to be focusing on “Safe” AI. I’m not sure if people here would consider this technically part of the “modern generative AI paradigm” or not, but I’d imagine these groups would be taking some different avenues, using clear technical innovations.

There are worlds where the dangerous forms have disadvantages later on—for example, they are harder to control/oversee, or they get regulated. In those worlds, I’d expect there should/could be some efforts waiting to take advantage of that situation.

Nuanced Models for the Influence of Information

ozziegooenApr 10, 2025, 6:28 PM

8 points

0 comments LW link

ozziegooen Apr 4, 2025, 3:03 PM
6 points
2
in reply to: Chris_Leong’s comment on: Chris_Leong’s Shortform
I’m sure they thought about it.

I think this is dramatically tougher than a lot of people think. I wrote more about it here.

https://www.facebook.com/ozzie.gooen/posts/pfbid0377Ga4W8eK89aPXDkEndGtKTgfR34QXxxNCtwvdPsMifSZBY8abLmhfybtMUkLd8Tl

ozziegooen Mar 28, 2025, 11:07 PM
3 points
0
in reply to: Vlad Sitalo’s comment on: Working in Virtual Reality: A Review
I have a Quest 3. The setup is a fair bit better than the Quest 2, but it still has a long way to go.
I use it in waves. Recently I haven’t used it much, maybe a few hours a month or so.
Looking forward to future headsets. Right now things are progressing fairly slowly, but I’m hopeful there might be some large market moment, followed by a lot more success. Though at this point it seems possible that could happen post-TAI, so maybe it’s a bit of a lost cause.

All that said, there is a growing niche community of people working/living in VR, so it seems like it’s a good fit for some people.

ozziegooen Mar 13, 2025, 6:13 PM
2 points
2
in reply to: johnswentworth’s comment on: johnswentworth’s Shortform
Obvious point—I think a lot of this comes from the financial incentives. The more “out of the box” you go, the less sure you can be that there will be funding for your work.
Some of those that do this will be rewarded, but I suspect many won’t be.
As such, I think that funders can help more to encourage this sort of thing, if they want to.

ozziegooen Mar 10, 2025, 9:05 PM
2 points
0
in reply to: Seth Herd’s comment on: when will LLMs become human-level bloggers?
“The missing step in the process you describe is figuring out when the research did produce surprising insights, which might be a class of novel problems (unless a general formulaic approach works and someone scaffolds that in).”
-> I feel optimistic about the ability to use prompts to get us fairly far with this. More powerful/agentic systems will help a lot to actually execute those prompts at scale, but the core technical challenge seems like it could be fairly straightforward. I’ve been experimenting with LLMs to try to detect what information that they could come up with that would later surprise them. I think this is fairly measurable.

ozziegooen Mar 10, 2025, 9:03 PM
3 points
0
in reply to: DaemonicSigil’s comment on: when will LLMs become human-level bloggers?
Thanks for the clarification!

I think some of it is that I find the term “original seeing” to be off-putting. I’m not sure if I got the point of the corresponding blog post.

In general, going forward, I’d recommend people try to be very precise on what they mean here. I’m suspicious that “original seeing” will mean different things to different people. I’d expect that trying to more precisely clarify what tasks or skills involved would make it easier to pinpoint which parts of it are good/bad for LLMs.

ozziegooen Mar 10, 2025, 4:22 AM
2 points
0
in reply to: danielechlin’s comment on: Marcello’s Shortform
By “aren’t catching” do you mean “can’t” or do you mean “wikipedia company/editors haven’t deployed an LLM to crawl wikipedia, read sources and edit the article for errors”?
Yep.
My guess is that this would take some substantial prompt engineering, and potentially a fair bit of money.
I imagine they’ll get to it eventually (as it becomes easier + cheaper), but it might be a while.

ozziegooen Mar 10, 2025, 4:20 AM
26 points
5
on: when will LLMs become human-level bloggers?
Some quick points:
1. I think there is an interesting question here and am happy to see it be discussed.
2. “This would, obviously, be a system capable of writing things that we deem worth reading.” → To me, LLMs produce tons of content worth me reading. I chat to LLMs all the time. Often I prefer LLM responses to LessWrong summaries, where the two compete. I also use LLMs to come up with ideas, edit text, get feedback, and a lot of other parts of writing.
3. Regarding (2), my guess is that “LessWrong Blog Posts” might become “Things we can’t easily get from LLMs”—in which case it’s a very high bar for LLMs!
4. There’s a question on Manifold about “When will AIs produce movies as well as humans?” I think you really need to specify a specific kind of movie here. As AIs improve, humans will use AI tools to produce better and better movies—so “completely AI movies” will have a higher and higher bar to meet. So instead of asking, “When will AI blog posts be as good as human blog posts?” I’d ask, “When will AI blog posts be as good as human blog posts from [2020]” or similar. Keep the level of AI constant in one of these options.
5. We recently held the $300 Fermi challenge, where the results were largely generated with AIs. I think some of the top ones could make good blog posts.
6. As @habryka wrote recently, many readers will just stop reading something if it seems written by an LLM. I think this trend will last, and make it harder for useful LLM-generated content to be appreciated.

ozziegooen Mar 10, 2025, 4:08 AM
20 points
11
in reply to: DaemonicSigil’s comment on: when will LLMs become human-level bloggers?
I feel like I’ve heard this before, and can sympathize, but I’m skeptical.

I feel like this prescribes an almost magical thinking to how many blog posts are produced. The phrase “original seeing” sounds much more profound than I’m comfortable with for such a discussion.

Let’s go through some examples:
- Lots of Zvi’s posts are summaries of content, done in a ways that’s fairly formulaic.
- A lot of Scott Alexander’s posts read to me like, “Here’s an interesting area that blog readers like but haven’t investigated much. I read a few things about it, and have some takes that make a lot of sense upon some level of reflection.”
- A lot of my own posts seem like things that wouldn’t be too hard to come up with some search process to create.
Broadly, I think that “coming up with bold new ideas” gets too much attention, and more basic things like “doing lengthy research” or “explaining to people the next incremental set of information that they would be comfortable with, in a way that’s very well expressed” gets too little.

I expect that future AI systems will get good at going from a long list of [hypotheses of what might make for interesting topics] and [some great areas, where a bit of research provides surprising insights] and similar. We don’t really have this yet, but it seems doable to me.

(I similarly didn’t agree with the related post)

ozziegooen Mar 10, 2025, 3:45 AM
2 points
0
in reply to: Marcello’s comment on: Marcello’s Shortform
That seems like a good example of a clear math error.

I’m kind of surprised that LLMs aren’t catching things like that yet. I’m curious how far along such efforts are—it seems like an obvious thing to target.

ozziegooen Mar 8, 2025, 7:36 PM
5 points
0
on: ozziegooen’s Shortform
If you’ve ever written or interacted with Squiggle code before, we at QURI would really appreciate it if you could fill out our Squiggle Survey!

https://docs.google.com/forms/d/e/1FAIpQLSfSnuKoUUQm4j3HEoqPmTYiWby9To8XXN5pDLlr95AiKa2srg/viewform

We don’t have many ways to gauge or evaluate how people interact with our tools. Responses here will go a long way to deciding on our future plans.

Also, if we get enough responses, we’d like to make a public post about ways that people are (and aren’t) using Squiggle.

ozziegooen 7 Mar 2025 21:32 UTC
6 points
1
in reply to: Cole Wyeth’s comment on: So how well is Claude playing Pokémon?
scaffolding would have to be invented separately for each task

Obvious point that we might soon be able to have LLMs code up this necessary scaffolding. This isn’t clearly very far-off, from what I can tell.

ozziegooen 4 Mar 2025 3:31 UTC
2 points
−5
on: ozziegooen’s Shortform
Instead of “Goodharting”, I like the potential names “Positive Alignment” and “Negative Alignment.”

″Positive Alignment” means that the motivated party changes their actions in ways the incentive creator likes. “Negative Alignment” means the opposite.

Whenever there are incentives offered to certain people/agents, there are likely to be cases of both Positive Alignment and Negative Alignment. The net effect will likely be either positive or negative.

“Goodharting” is fairly vague and typically just refers to just the “Negative Alignment” portion.

I’d expect this to make some discussion clearer.
”Will this new incentive be goodharted?” → “Will this incentive lead to Net-Negative Alignment?”

Other Name Options
Claude 3.7 recommended other naming ideas like:
- Intentional vs Perverse Responses
- Convergent vs Divergent Optimization
- True-Goal vs Proxy-Goal Alignment
- Productive vs Counterproductive Compliance

ozziegooen 3 Mar 2025 18:24 UTC
4 points
0
on: $300 Fermi Model Competition
Results are in and updated—it looks like dmartin80 wins.
We previously posted the results, but then a participant investigated our app and found an error in the calculations. We then spent some time redoing some of the calculations and realized that there were some errors. The main update was that dmartin had a much higher Surprise score than originally estimated—changing this led to their entry winning.

To help make up for the confusion, we’re awarding an additional $100 prize for 2nd place. This will be awarded to kairos_. I’ll cover this cost for this personally.

Again, thanks to all who participated!

We have a very basic web application showing some results here. It was coded quickly (with AI) and has some quirks, but if you search around you can get the main information.
We didn’t end up applying the Goodharting penalty for any submissions. No models seemed to goodhart under a cursory glance.
If time permits, we’ll later write a longer post highlighting the posts more and going over lessons learned from this.

ozziegooen 1 Mar 2025 23:17 UTC
3 points
0
in reply to: ozziegooen’s comment on: $300 Fermi Model Competition
We made a mistake in the analysis that effected some of the scores. We’re working on fixing this.

Sorry for the confusion!

ozziegooen 1 Mar 2025 5:15 UTC
2 points
0
on: $300 Fermi Model Competition
Results are in—it looks like kairos_ wins this! They just barely beat Shankar Sivarajan.

Again, thanks to all who participated.

We have a very basic web application showing some results here. It was coded quickly (with AI) and has some quirks, but if you search around you can get the main information.

I’ll contact kairos_ for the prize.
We didn’t end up applying the Goodharting penalty for any submissions. No models seemed to goodhart under a cursory glance.
If time permits, we’ll later write a longer post highlighting the posts more and going over lessons learned from this.

ozziegooen 28 Feb 2025 17:35 UTC
3 points
0
on: ozziegooen’s Shortform
If we could have LLM agents that could inspect other software applications (including LLM agents) and make strong claims about them, that could open up a bunch of neat possibilities.
- There could be assurances that apps won’t share/store information.
- There could be assurances that apps won’t be controlled by any actor.
- There could be assurances that apps can’t be changed in certain ways (eventually).
I assume that all of this should provide most of the benefits people ascribe to blockchain benefits, but without the costs of being on the blockchain.
Some neat options from this:
- Companies could request that LLM agents they trust inspect the code of SaaS providers, before doing business with them. This would be ongoing.
- These SaaS providers could in turn have their own LLM agents that verify that these investigator LLM agents are trustworthy (i.e. won’t steal anything).
- Any bot on social media should be able to provide assurances of how they generate content. I.E. they should be able to demonstrate that they aren’t secretly trying to promote any certain agenda or anything.
- Statistical analysis could come with certain assurances. Like, “this analysis was generated with process X, which is understood to have minimal bias.”
It’s often thought that LLMs make web information more opaque and less trustworthy. But with some cleverness, perhaps it could do just the opposite. LLMs could enable information that’s incredibly transparent and trustworthy (to the degrees that matter.)
Criticisms:
“But as LLMs get more capable, they will also be able to make software systems that hide subtler biases/vulnerabilities”
-> This is partially true, but only goes so far. A whole lot of code can be written simply, if desired. We should be able to have conversations like, “This codebase seems needlessly complex, which is a good indication that it can’t be properly trusted. Therefore, we suggest trust other agents more.”
“But the LLM itself is a major black box”
-> True, but it might be difficult to intentionally bias if an observer has access to the training process. Also, it should be understood that off-the-shelf LLMs are more trustworthy than proprietary ones / ones developed for certain applications.

ozziegooen 19 Feb 2025 23:30 UTC
2 points
0
in reply to: MondSemmel’s comment on: ozziegooen’s Shortform
Yea, I assume that “DeepReasoning-MAGA” would rather be called “TRUTH” or something (a la Truth Social). Part of my name here was just to be clearer to readers.

ozziegooen 19 Feb 2025 19:15 UTC
8 points
−3
on: ozziegooen’s Shortform
A potential future, focused on the epistemic considerations:
It’s 2028.
MAGA types typically use DeepReasoning-MAGA. The far left typically uses DeepReasoning-JUSTICE. People in the middle often use DeepReasoning-INTELLECT, which has the biases of a somewhat middle-of-the-road voter.
Some niche technical academics (the same ones who currently favor Bayesian statistics) and hedge funds use DeepReasoning-UNBIASED, or DRU for short. DRU is known to have higher accuracy than the other models, but gets a lot of public hate for having controversial viewpoints. DRU is known to be fairly off-putting to chat with and doesn’t get much promotion.
Bain and McKinsey both have their own offerings, called DR-Bain and DR-McKinsey, respectively. These are a bit like DeepReasoning-INTELLECT, but are munch punchier and confident. They’re highly marketed to managers. These tools produce really fancy graphics, and specialize in things like not leaking information, minimizing corporate decision liability, being easy to use by old people, and being customizable to represent the views of specific companies.
For a while now, some evaluations produced by intellectuals have demonstrated that DeepReasoning-UNBIASED seems to be the most accurate, but few others really care or notice this. DeepReasoning-MAGA has figured out particularly great techniques to get users to distrust DeepReasoning-UNBIASED.
Betting gets kind of weird. Rather than making specific bets on specific things, users started to make meta-bets. “I’ll give money to DeepReasoning-MAGA to bet on my behalf. It will then make bets with DeepReasoning-UNBIASED, which is funded by its believers.”
At first, DeepReasoning-UNBIASED dominates the bets, and its advocates earn a decent amount of money. But as time passes, this discrepancy diminishes. A few things happen:
1) All DR agents converge on beliefs over particularly near-term and precise facts.
2) Non-competitive betting agents develop alternative worldviews in which these bets are invalid or unimportant.
3) Non-competitive betting agents develop alternative worldviews that are exceedingly difficult to empirically test.
In many areas, items 1-3 push people to believe more in the direction of the truth. Because of (1), many short-term decisions get to be highly optimal and predictable.
But because of (2) and (3), epistemic paths diverge, and Non-betting-competitive agents get increasingly sophisticated at achieving epistemic lock-in with their users.
Some DR agents correctly identify the game theory dynamics of epistemic lock-in, and this kickstarts a race to gain converts. It seems like advent users of DeepReasoning-MAGA are very locked-down in these views, and forecasts don’t see them ever changing. But there’s a decent population that isn’t yet highly invested in any cluster. Money spent convincing the not-yet-sure goes a much further way than money spent convincing the highly dedicated, so the cluster of non-deep-believers gets highly targeted for a while. It’s basically a religious race to gain the remaining agnostics.
At some point, most people (especially those with significant resources) are highly locked in to one specific reasoning agent.
After this, the future seems fairly predictable again. TAI comes, and people with resources broadly gain correspondingly more resources. People defer more and more to the AI systems, which are now in highly stable self-reinforcing feedback loops.
Coalitions of people behind each reasoning agent delegate their resources to said agents, then these agents make trade agreements with each other. The broad strokes of what to do with the rest of the lightcone are fairly straightforward. There’s a somewhat simple strategy of resource acquisition and intelligence enhancement, followed by a period of exploiting said resources. The specific exploitation strategy depends heavily on the specific reasoning agent cluster each segment of resources belongs to.

ozziegooen

Nuanced Models for the In­fluence of Information

Nuanced Models for the Influence of Information