Jonathan Claybrough

Karma: 520

Software engineer transitioned into AI safety, teaching and strategy. Particularly interested in psychology, game theory, system design, economics.

Jonathan Claybrough Apr 18, 2025, 5:07 AM
1 point
0
in reply to: Adam Karvonen’s comment on: Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Thanks for the followup!

Jonathan Claybrough Apr 16, 2025, 6:31 PM
4 points
0
on: Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
o3 released today and I was impressed by the demo for working with images they gave so I wanted to try this out (using the prompt op linked to in appendices), but I don’t have the machining experience to evaluate the answer quickly, so I thought I’d prompt it and share its result if anyone else wants to evaluate it : https://chatgpt.com/share/67fff70d-7d90-800e-a913-663b82ae7f33

Jonathan Claybrough Mar 14, 2025, 1:21 AM
6 points
0
on: Announcing Dialogues
PSA—at least as of March 2024, the way to create a Dialogue is by navigating to someone else’s profile and to click the “Dialogue” option appearing near the right, next to the option to message someone.

Jonathan Claybrough Mar 7, 2025, 6:00 PM
11 points
8
on: Jonathan Claybrough’s Shortform
I get the impression from talking to people who work professionally on reducing ai x-risk (largely taken, would include doing ops for FLI) that less than half of them could give an accurate and logically sound model of where ai x-risk comes from and why it is good that they do the work they do (bonus points if they can compare it to other work they might be doing instead). I’m generally a bit disappointed by this because to me it doesn’t seem that hard to get everyone who’s a professional knowledgeable on the basics, and it seems worthwhile as more people could be autonomous in assessing strategic decisions and making sure implementation of plans serves the right aims.

An example of why having a model of x-risk and directions for reducing it matters : Imagine you’re an ops person with money to organize a half-day conference. What do?
What city to choose, which participants to invite?

If you have no model of anything, I guess you can copycat—do a small scale EAG-like event for AI Safety people, or do a scientific conference like thing. That’s okay, probably not a total waste of money (except to the extent you don’t know ai safety theory, it’s possible to do counter-productive actions like giving a platform to the wrong people).

Imagine you have the following simple model :
- AI is being improved by people
- If deployed with the current alignment tech and governance level, powerful autonomous AGI could have convergent instrumental goals that lead to takeover (sharp left turn models) or large scale deployment of AGI in society can eek out human influence (Out with a Whimper or Multipolar Failure)
- Thus we should do actions that get us the best alignment tech and governance level by the time anyone is able to train and deploy dangerous IA systems.
- This means we could do actions that
- - Slow down dangerous AI development
- - Accelerate alignment tech development
- - Accelerate getting a good level of governance

Each of these need sub-models (all interlinked), let’s detail a simple one for governance
Things that accelerate getting a good level of governance
- Better knowledge and models of ai x-risk (scenario+risk planning), demonstrations of precursors (model organisms of misalignment, ..), increasing scientific consensus about the paths to avoid
- Spreading the above knowledge to the general public (to have politicians be supported in economy-limiting policy)
- Spreading the above knowledge to policy-makers
- Having technically competent people write increasingly better policy proposals (probably stuff leading to international regulation of dangerous AI)

Now that you have these, it’s much easier to find a few key metrics to optimize for your ops event. Doing it really well might include you talking to other event organizers to know what has and hasn’t been done, what works, what’s been neglected etc, but even without all that you can decide to act on:
Improvement in attendees’ knowledge and models of AI x-risk
Pre-event and post-event surveys or informal conversations to gauge attendees’ understanding of AI risk scenarios and mitigation strategies.
Structure the event with interactive sessions explicitly designed to clarify AI x-risk models, scenario planning, and concrete policy implications.
Potential reach for disseminating accurate AI safety knowledge
Choose guests who have strong networks or influence with policy-makers, academics, or public media.
Select a location close to influential governmental bodies or major media centers (e.g., Washington D.C., Brussels, or Geneva).

(I realize now that I wrote a full example that this might have been a mini post to serve as short reference, a wireframe for thinking through this, please reply if you think a cleaned up version of this would be good)

To get back to my original point, it’s currently my impression that many who work in AI x-risk reduction, for example in ops, could not produce a “good” version of this above draft even with 2 hours of their time because of lacking background knowledge. I hope the above example sufficiently illustrates that they could be doing better work if they did.

Jonathan Claybrough Feb 2, 2025, 8:16 PM
4 points
0
in reply to: Jan_Kulveit’s comment on: Catastrophe through Chaos
(off topic to op, but in topic to Jan bringing up ALERT)
To what extent do you believe Sentinel fulfills what you wanted to do with ALERT? Their emergency response team is pretty small rn. Would you recommend funders support that project or a new ALERT?

Jonathan Claybrough Jan 19, 2025, 10:18 AM
4 points
4
on: Beards and Masks?
Appreciate the photos and final video, as they also make this informative post more enjoyable to follw through.

Jonathan Claybrough Jan 19, 2025, 10:10 AM
109 points
44
on: Jonathan Claybrough’s Shortform
EpochAI seem do be doing a lot of work that’ll accelerate AI capabalities research and development (eg. informing investors and policy makers that yes AI is a huge economic deal and here are the bottlenecks you should work around, building capabilities benchmarks to optimize for). Under common-around-LW assumptions that no one could align AGI at this point, they are, by these means, increasing AI catastrophic and existential risk.
At a glance they also seem to not be doing AI x-risk reducing moves, like using their platform to mention that there are risks associated to AI, that these are not improbable, and that these require both technical work and governance to manage appropriately. This was salient to me in their latest podcast episode—speaking at length about AI replacing human workers in 5 to 10 years and the impact on the economy, without even hinting that there are risks associated with this, is burying the lede.
Given that Epoch AI is funded by OpenPhilantropy and Jaan Tallinn, who on their face care about reducing AI x-risk, what am I missing? (non rhetorical)
- What is EpochAi’s theory of change for making the world better on net?
- Overall, is EpochAI increasing or reducing ai x-risk (on LW models, in their models)?
I wanted this to be but a short shortpost, but since I’m questioning a pretty big maybe influential org under my true name let me show good faith with proposing reasons that might contribute to what I’m seeing. For anyone unaware of their work, maybe check out their launch post, or their recent podcast episode.
- OpenPhil is unsure about magnitude of AI x-risk so invest in forecasting AI capabilities to know if they should invest more in AI safety.
- EpochAI doesn’t believe AI x-risk is likely and believes that accelerating is overall better for humanity (seems true for some employees but not all)
- EpochAI believes that promoting the information that AGI is economically important and possible soon is better because governments will better govern it than counterfactually
- EpochAI is saying what they think is true without selection to avoid being deceptive (this doesn’t mesh with the next reason)
- EpochAI believe that mentionning AI risks at this stage would hurt their platform and their influence, and are waiting for a more ripe opportunity/better argued paper.
Tagging a few EpochAI folk that appeared in their podcast - @Jsevillamol @Tamay @Ege Erdil

Jonathan Claybrough Nov 28, 2024, 9:06 PM
3 points
0
on: ARENA 4.0 Impact Report
Congratz on your successes and thank you for publishing this impact report.

It leaves me unsatiated related to cost effectiveness though. With no idea of how much money was invested in this project to get this outcome, I don’t know if Arena is cost effective compared to other training programs and counterfactual opportunities. Would you mind sharing at least something about the amount of funding this got?

Re
Still, it is also positive if ARENA can help participants who want to pursue a career transition test their fit for alignment engineering in a comparatively low-cost way.
it doesn’t strike me that a 5 week all expenses paid program is a particularly low cost way to find out AI Safety isn’t for you (as compared to for example participating in an Apart Hackathon)

Jonathan Claybrough Nov 5, 2024, 7:46 PM
1 point
0
in reply to: Alfred Harwood’s comment on: Abstractions are not Natural
I don’t actualy think your post was hostile, but I think I get where deepthoughtlife is coming from. At the least, I can share about how I felt reading this post and point out to why, since you seem keen on avoiding the negative side. Btw I don’t think you avoid causing any frustration in readers, they are too diverse, so don’t worry too much about it either.

The title of the piece is strongly worded and there’s no epistimic status disclaimer to state this is exploratory, so I actually came in expecting much stronger arguments. Your post is good as an exposition of your thoughts and conversation started, but it’s not a good counter argument to NAH imo, so shouldn’t be worded as such. Like deepthoughtlife, I feel your post is confused re NAH, which is totally fine when stated as such, but a bit grating when I came in expecting more rigor or knowledge of NAH.

Here’s a reaction to the first part :
- in “Systems must have similar observational apparatus” you argue that different apparatus lead to different abstractions and claim a blind deaf person is such an example, yet in practice blind deaf people can manipulate all the abstractions others can (with perhaps a different inner representation), that’s what general intelligence is about. You can check out this wiki page and video for some of how it’s done https://en.wikipedia.org/wiki/Tadoma . The point is that all the abstractions can be understood and must be understood by a general intelligence trying to act effectively, and in practice Helen Keler could learn to speak by using other senses than hearing, in the same way we learn all of physics despite limited native instruments.

I think I had similar reactions to other parts, feeling they were missing the point about NAH and some background assumptions.

Thanks for posting!

Jonathan Claybrough Nov 5, 2024, 4:54 PM
1 point
0
in reply to: Garrett Baker’s comment on: dirk’s Shortform
Putting this short rant here for no particularly good reason but I dislike that people claim constraints here or there in a way where I guess their intended meaning is only that “the derivative with respect to that input is higher than for the other inputs”.

On factory floors there exist hard constraints, the throughput is limited by the slowest machine (when everything has to go through this). The AI Safety world is obviously not like that. Increase funding and more work gets done, increase talent and more work gets done. None are hard constraints.

If I’m right that people are really only claiming the weak version, then I’d like to see somewhat more backing to their claims, especially if you say “definitely”. Since none are constraints, the derivatives could plausibly be really close to one another. In fact, they kind of have to be, because there are smart optimizers who are deciding where to spend their funding and trying to actively manage the proportion of money sent to field building (getting more talent) vs direct work.

Jonathan Claybrough Nov 5, 2024, 4:45 PM
4 points
0
in reply to: CstineSublime’s comment on: CstineSublime’s Shortform
Interesting thoughts, ty.

A difficulty to common understanding I see here is that you’re talking of “good” or “bad” paragraphs in the absolute, but didn’t particularly define “good” or “bad” paragraph by some objective standard, so you’re relying on your own understanding of what’s good or bad. If you were defining good or bad relatively, you’d look for a 100 paragraphs, and post the worse 10 as bad. I’d be interested in seeing what were the worse paragraphs you found, some 50 percentile ones, and what were the best, then I’d tell you if I have the same absolute standards as you have.

Jonathan Claybrough Nov 5, 2024, 4:35 PM
1 point
1
on: The Shallow Bench
Enjoyed this post.

Fyi, from the front page I just hovered this post “The shallow bench” and was immediately spoiled on Project Hail Mary (which I had started listening to, but didn’t get far into). Maybe add some spoiler tag or warning directly after the title?

Jonathan Claybrough Nov 4, 2024, 5:23 PM
2 points
0
in reply to: Alex Vermillion’s comment on: JargonBot Beta Test
Without removing from the importance of getting the default right, and with some deliberate daring to feature creep, I think adding a customization feature (select colour) in personal profiles is relatively low effort and maintenance, so would solve the accessibility problem.

Jonathan Claybrough Oct 8, 2024, 10:39 AM
6 points
0
on: Jonathan Claybrough’s Shortform
There’s tacit knowledge in bay rationalist conversation norms that I’m discovering and thinking about, here’s an observation and related thought. (I put the example later after the generalisation because that’s my preferred style, feel free to read the other way).

Willingness to argue righteously and hash out things to the end, repeated over many conversations, makes it more salient when you’re going for a dead end argument. This salience can inspire you to do argue more concisely and to the point over time.
Going to the end of things generates ground data on which to update your models of arguing and conversation paths, instead of leaving things unanswered.
So, though it’s skilful to know when not to “waste” time on details and unimportant disagreements, the norm of “frequently enough going through til everyone agrees on things” seems profoundly virtuous.

Short example from today, I say “good morning”. They point out it’s not morning (it’s 12:02). I comment about how 2 minutes is not that much. They argue that 2 minutes is definitely more than zero and that’s the important cut-off.
I realize that “2 minutes is not that much” was not my true rebuttal, that this next token my brain generated was mostly defensive reasoning rather than curious exploration of why they disagreed with my statement. Next time I could instead note they’re using “morning” to have a different definition/central cluster than I, appreciate that they pointed this out, and decide if I want to explore this discrepancy or not.

Many things don’t make sense if you’re just doing them for local effect, but do when you consider long term gains. (something something naive consequentialism vs virtue ethics flavored stuff)

Jonathan Claybrough Oct 4, 2024, 7:39 AM
1 point
0
in reply to: Davidmanheim’s comment on: Dialogue introduction to Singular Learning Theory
I don’t strongly disagree but do weakly disagree on some points so I guess I’ll answer

Re first- if you buy into automated alignment work by human level AGI, then trying to align ASI now seems less worth it. The strongest counterargument to this I see is that “human level AGI” is impossible to get with our current understanding, as it will be superhuman in some things and weirdly bad at others.

Re second- disagreements might be nitpicking on “few other approaches” vs “few currently pursued approaches”. There are probably a bunch of things that would allow fundamental understanding if they panned out (various agent foundations agendas, probably safe ai agendas like davidad’s), though one can argue they won’t apply to deep learning or are less promising to explore than SLT

Jonathan Claybrough Aug 4, 2024, 10:42 PM
3 points
1
on: We’re not as 3-Dimensional as We Think
I don’t think your second footnote sufficiently addresses the large variance in 3D visualization abilities (note that I do say visualization, which includes seeing 2D video in your mind of a 3D object and manipulating that smoothly), and overall I’m not sure where you’re getting at if you don’t ground your post in specific predictions about what you expect people can and cannot do thanks to their ability to visualize 3D.
You might be ~conceptually right that our eyes see “2D” and add depth, but *um ackshually*, two eyes each receiving 2D data means you’ve received 4D input (using ML standards, you’ve got 4 input dimensions per time unit, 5 overall in your tensor). It’s very redundant, and that redundancy mostly allows you to extract depth using a local algo, which allows you to create a 3D map in your mental representation. I don’t get why you claim we don’t have a 3D map at the end.
Back to concrete predictions, are there things you expect a strong human visualizer couldn’t do? To give intuition I’d say a strong visualizer has at least the equivalent visualizing, modifying and measuring capabilities of solidworks/blender in their mind. You tell one to visualize a 3D object they know, and they can tell you anything about it.
It seems to me the most important thing you noticed is that in real life we don’t that often see past the surface of things (because the spectrum of light we see doesn’t penetrate most material) and thus most people don’t know the inside of 3D things very well, but that can be explained by lack of exposure rather than inability to understand 3D.
Fwiw looking at the spheres I guessed an approx 2.5 volume ratio. I’m curious, if you visualized yourself picking up these two spheres, imagining them made of a dense metal, one after the other, could you feel one is 2.3 times heavier than the previous?

Jonathan Claybrough Jul 30, 2024, 4:07 AM
13 points
4
on: This is already your second chance
I’ll give fake internet points to whoever actually follows the instructions and posts photographic proof.

Jonathan Claybrough 23 Jul 2024 12:21 UTC
2 points
1
on: Pivotal Acts are easier than Alignment?
The naming might be confusing because pivotal act sounds like a one time action, but in most cases getting to a stable world without any threat from AI requires constant pivotal processes. This makes almost all the destructive approaches moot (and they’re probably already bad for ethical concerns and many others already discussed) because you’ll make yourself a pariah.

The most promising venue for a pivotal act/pivotal process that I know of is doing good research so that ASI risks are known and proven, doing good outreach and education so most world leaders and decision makers are well aware of this, and helping setup good governance worldwide to monitor and limit the development of AGI and ASI until we can control it.

Jonathan Claybrough 21 Jul 2024 13:39 UTC
3 points
0
in reply to: Raemon’s comment on: Exercise: Planmaking, Surprise Anticipation, and “Baba is You”
I recently played Outer Wilds and Subnautica, and the exercise I recommend for both of these games is : Get to the end of the game without ever failing.
In subnautica that’s dying once, in Outer Wilds it’s a spoiler to describe what failing is (successfully getting to the end could certainly be argued to be a fail).
I failed in both of these. I played Outer Wilds first and was surprised at my fail, which inspired me to play Subnautica without dying. I got pretty far but also died from a mix of 1 unexpected game mechanic, uncareful measure of another mechanic, lack of redundancy in my contingency plans.

Jonathan Claybrough 11 Jul 2024 21:14 UTC
3 points
0
in reply to: Steven Byrnes’s comment on: Response to Dileep George: AGI safety warrants planning ahead
Oh wow, makes sense. It felt weird that you’d spend so much time on posts, yet if you didn’t spend much time it would mean you write at least as fast as Scott Alexander. Well, thanks for putting in the work. I probably don’t publish much because I want it to not be much work to do good posts but you’re reassuring it’s normal it does.