Former Software Engineer at Microsoft who may focus on the alignment problem for the rest of his life (please bet on the prediction market here).
Sheikh Abdur Raheem Ali
Thanks for writing this up. I spent 2-3 days this month encouraging the UAE to set up an AI Security Institute. I was in Abu Dhabi during President Trump’s visit and spoke briefly with a few government officials whose title was “Chief AI Officer” of ministry such-and-such. I mostly wanted to know if they’d be interested in potentially collaborating with other AISIs. However, I’m not planning to continue working on UAE AISI stuff in the future.
It’s almost certainly untrue that OpenAI is doing per-user finetuning for personalization.
Each user-tuned model (or even a LoRA adapter) has to be resident on whichever node serves that user's requests. You lose the ability to route traffic to any idle instance. OpenAI explicitly optimises for pooling capacity across huge fleets; tying sessions to specific nodes would crater utilisation.
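To make the utilisation point concrete, here's a toy Monte Carlo sketch (all numbers invented; this says nothing about OpenAI's actual fleet or architecture) comparing a pooled fleet against one where each user's adapter pins them to a single node:

```python
# Toy back-of-envelope: capacity wasted when every user's fine-tuned weights /
# LoRA adapter pin their requests to one node, versus pooling across the fleet.
import random

random.seed(0)

NUM_GPUS = 100          # hypothetical fleet size
USERS_PER_GPU = 50      # users pinned to each node in the "per-user" scenario
REQ_PROB = 0.015        # chance a given user sends a request in a time step
CAPACITY = 1            # requests one node can serve per time step
STEPS = 10_000

pooled_overflow = pinned_overflow = total_requests = 0
for _ in range(STEPS):
    # Draw per-node demand; pooled demand is just the sum across all nodes.
    demand = [sum(random.random() < REQ_PROB for _ in range(USERS_PER_GPU))
              for _ in range(NUM_GPUS)]
    total = sum(demand)
    total_requests += total
    # Pooled: any idle node can absorb any request.
    pooled_overflow += max(0, total - NUM_GPUS * CAPACITY)
    # Pinned: a busy node can't hand requests to an idle neighbour.
    pinned_overflow += sum(max(0, d - CAPACITY) for d in demand)

print(f"pooled overflow rate: {pooled_overflow / total_requests:.3%}")
print(f"pinned overflow rate: {pinned_overflow / total_requests:.3%}")
```

On these made-up numbers the pooled fleet almost never overflows while the pinned one drops a noticeable share of requests at the same average load; that statistical-multiplexing gap is the utilisation hit I mean.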
I’m going to remember the point about screenshot parsing being a weak point for ‘agents’.
Why isn’t this part of the sequence on your research process?
How exactly are you measuring coding ability? What are the ways you’ve tried to upskill, and what are common failure modes? Can you describe your workflow at a high-level, or share a recording? Are you referring to competence at real world engineering tasks, or performance on screening tests?
There’s a chrome extension which lets you download leetcode questions as jupyter notebooks: https://github.com/k-erdem/offlineleet. After working on a problem, you can make a markdown cell with notes and convert it into flashcards for regular review: https://github.com/callummcdougall/jupyter-to-anki.
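If you'd rather not depend on the second tool, here's a rough standalone sketch (my own, not taken from either repo) that pulls the markdown note cells out of a solved-problem notebook and writes a tab-separated file Anki can import (front = first line of the cell, back = the rest):

```python
# Sketch: extract markdown cells from an .ipynb and dump Anki-importable TSV.
import json
import sys

def notebook_to_anki_tsv(ipynb_path: str, tsv_path: str) -> None:
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)
    rows = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        text = "".join(cell.get("source", [])).strip()
        if not text:
            continue
        front, _, back = text.partition("\n")
        # Anki treats tabs/newlines as field separators, so flatten the back.
        rows.append((front.strip(), back.replace("\n", "<br>").strip()))
    with open(tsv_path, "w", encoding="utf-8") as f:
        for front, back in rows:
            f.write(f"{front}\t{back}\n")

if __name__ == "__main__":
    notebook_to_anki_tsv(sys.argv[1], sys.argv[2])
```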
I would suggest scheduling calls with friends for practice sessions so that they can give you personalized feedback about what you need to work on.
I count 55 mentors at https://www.matsprogram.org/mentors, implying a mentor acceptance rate of 47%.
“White House may ease restrictions on selling AI chips to UAE.” and “UAE steps back from building homegrown AI models” seem to contradict each other.
I’d already heard about AI being used in UAE K-12 classrooms before reading about it here.
2 percentage points.
Well, really it is a function of the current likelihood, but my prior on that has error bars on a log scale.
Main outcome from GISEC:
Caught up with my former classmate Youssef Awad, Head of Engineering at ctf.ae. He offered to introduce me to H.E. Al Kuwaiti, the UAE’s Head of Cybersecurity.
Consider updating “Auditing Deception for Interp is out of reach” with a link to https://www.lesswrong.com/posts/PwnadG4BFjaER3MGf/interpretability-will-not-reliably-find-deceptive-ai
One percent of the world’s AI compute (LLM-grade GPU capacity) is in the UAE, which does not have an AI Security Institute. I plan to spend 6-9% of my bandwidth this month (2-3 days during May 2025) on encouraging the UAE to establish an AISI. Today is the first day.
However, in my view even the most optimistic impact estimate for successfully executing that plan doesn’t realistically shift the prediction market on the UAE starting an AI Security Institute before 2026 by more than 2%. Even if a UAE AISI existed, it would not be allocated more than 1% to 5% (mode 2%) of the overall national AI budget (roughly $2b). Taking 2% of 2% of $2b gives a maximum valuation of $800k for the entire project. (I think the median valuation would be significantly lower; I’m not using the maximum to be generous, but because I believe that, for this system, the max value is more informative for decision making and easier to estimate than the 95th or 99.9th percentile value.)
I was talking about this with my dad earlier; his take was that attending the one-day https://govaisummitmea.com/ on May 13th would be less than 0.01% of the work involved in actually pulling this thing off. My understanding of what he meant, in more formal terms: if your goal is for the UAE to have an AISI before 2026, and you decompose each step of the plan to achieve that outcome into players in a Shapley value calculation, then acquiring these tickets has an average marginal contribution of at most 0.0001 times at most $800k, which is $80. And it would be foolish to pay the cost of one day of my time plus tickets for me and a collaborator when the return on that investment is, by this model, capped at $80.
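For concreteness, here is that arithmetic written out (a sketch using only the rough figures above, nothing measured):

```python
# Back-of-envelope version of the two estimates above (all inputs are the
# rough figures from the text, not measured quantities).
p_shift = 0.02         # optimistic shift in P(UAE AISI before 2026)
budget = 2e9           # approximate national AI budget, USD
aisi_cut = 0.02        # modal share of that budget an AISI might get
project_cap = p_shift * aisi_cut * budget
print(f"max project valuation: ${project_cap:,.0f}")   # $800,000

shapley_share = 0.0001  # dad's cap on the summit tickets' marginal contribution
print(f"ticket value under that cap: ${shapley_share * project_cap:,.0f}")  # $80
```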
Although my dad’s take here is reasonable given the information available to him, he doesn’t have the full context and the time he can allocate to learning about this project is restricted. Even though he is burdened by the demands of his various responsibilities, I’m grateful that supporting me in particular is one that he has always prioritized. I love Abba!
Here’s why Abba is wrong. There are 400 total seats. The cost is USD 1299 per head, so about USD 2,600 to register two attendees. At this price range it makes sense for a company to reimburse the fee and send representatives if it profits from building relationships with UAE policymakers. These will mostly be orgs not working directly on reducing AI x-risk. Although having 0.5% of attendees be alignment researchers is unlikely to affect the overall course of discussions, it is a counterweight against worlds where this minority group has zero participation in these conversations. I think it may be as much as 3.25% of the work needed, which is ten times more than the break-even floor of 0.325% (2600/800k). But besides that, my team has been approved for up to 5 seats, we have a 10-page draft policy memo coauthored with faculty at https://mbrsg.ae/, and we can simply ask for the registration fee to be waived. I agree that it would be insane to pay the full amount out of pocket. (Edit: we got the waiver request approved.)

Here’s why Abba is right. This post was written after midnight. Later today I will go to the Middle East and Africa’s largest cybersecurity event, https://gisec.ae/. I look forward to further comments and reviews.
Thanks for doing this; I found it really helpful.
Thank you, this is great work. I filled out the external researcher interest form but was not selected for Team 4.
I’m not sure that Team 4 was on par with what professional jailbreakers could achieve in this setting. I look forward to follow-up experiments, but this is bottlenecked by the absence of an open-source implementation of auditing games. I went over the paper with a colleague; unfortunately we don’t have the bandwidth to replicate this work ourselves. Is there a way to sign up to be notified once a playable auditing game is available?
I’d also be eager to help beta-test pre-release versions. Let me know if someone is planning to put this up on the web as a product which allows the general public to play crowdsourced auditing games.
He might want to consider taking a look at https://manifund.org/projects/ozempic-for-sleep-proof-of-concept-research-for-safely-reducing-sleep
True. Lower haircuts allow for more leverage. Leverage is typically defined as the diagonal elements of the hat matrix; does wearing a fedora impact the degree of hair chaos?
Hey Neel, I’ve heard you make similar remarks informally at talks or during Q&A sessions at past in-person panels and events, and it’s great that you’ve written them up so that they’re available in a nuanced format to a broader audience. I agree with the points you’ve made, but have a slightly different perspective on how they connect to the example of people asking for your strategic takes specifically, which I’ll share below (without presumption).
TL;DR: “Good strategic takes are hard to measure, but status is easy to recognize.”
I. Executive Summary
People aren’t necessarily confusing research prowess with strategic insight. Rather, they recognize you as having achieved elite social standing within the field of AI more broadly and want:
Access to perceived insider knowledge
Connection to high-status individuals
The latest thinking from those “in the room”
II. Always Use The Best Introduction
Before reading this post, I believed that the median person asking these questions was motivated by your impressive academic performance during your undergraduate studies, something that can be (over)simplified to “wow, this guy studied pure math at Cambridge and ranked top of his class, he’s one of the smartest people in the world, and smart people are correct about lots of things, he might have a correct answer to this question I have!”. I’m quite embarrassed to admit that this is pretty much what was going through my head when I attended a session you were holding during EAG last year, and I wouldn’t be surprised if others there were thinking that too.
Along similar lines, I recall reaching out to one of your former mentees for a 1:1, thinking, “wow, this guy studied computer science at Cambridge and ranked top of his class, he’s one of the smartest people in the world, and smart people are correct about lots of things!”. I also took the time to read his dissertation, and found it interesting, but that first impression mattered a lot more than it should have. An analogy is that when people are selecting the model to use for a task, they want to use the best model for that task. But if a model takes the top spot on a leaderboard where test scores are easy to measure, that tends to mess with human psychology, which irrationally pattern matches and assumes generalization across every possible task.
III. My Key Takeaway
My key takeaway was that although this winner-take-all dynamic may have been one factor, your model assigns more weight to the work you’ve done after graduating and pioneering the field of mechinterp.
IV. Credentials vs. Accomplishments
To be clear, founding mechinterp is a greater accomplishment than any formal credential. But even though teams of researchers at frontier labs are working on this agenda, it’s not mainstream yet (just take a look at mechinterp.com), whereas the handle of “math/cs genius” is generic enough as a concept to be legible to the average person. The arguments in your post about research being an empirical science requiring skills not especially relevant to strategy are locally valid, but these points are the furthest thing from the mind of those waiting in line at conferences to ask what your p(doom) is.
V. The Tyranny of the Marginal Spice Jar
Often the demands placed upon us by our environment play an instrumental role in shaping our skillset, because we adapt against the pressures placed upon us. I’m thankfully not in a leadership position where the role calls for executive project management decisions which require a solid understanding of the broader field and industry. I’m also grateful that I’m not a public figure with a reputation to maintain whose every move is open to scrutiny and close examination. I also understand that blog posts aren’t meant to be epistemically bulletproof.
I think that it’s true that when the people you speak with the most (e.g. work colleagues or MATS scholars) ask you about your thoughts, their respect is based on the merits of the technical research you’ve published. And in general, when anyone publishes great AI research, that does inspire interest in that person’s AI takes.
VI. Unnecessarily Skippable Digression Into Social Bubbles and Selection Effects
Your social circle is heavily filtered by a competitive application process which strongly selects for predicted ability to do quality research. This can distort intuitions around the prevalence of certain traits which are not as well represented in the general population. For example, authoring code or research papers requires to some extent that your brain is adapted for processing text content, the implications of which I haven’t seen discussed in depth anywhere on lesswrong. If someone expresses a strong preference for reading over watching a video when both options are available, it’s almost like a secret handshake, because so many cracked engineers have told me this that it’s become a green flag. In this world, entertainment culture and information transfer happen through books, web novels, articles, etc.
There’s an entirely separate world occupied by someone with the opposite preference, i.e. wanting to watch a video rather than read text when both options are available; an example secret handshake for that is when my Uber driver tells me that they’re cutting down on Instagram. I admit this is a shallow heuristic, but it’s become a red flag I watch out for, indicating a potential vulnerability to predatory social media dark patterns or television binge-watching. It’s not an issue of self-control: people in the first group need to apply cognitive effort to pick things up from videos, but might have difficulty setting aside an engaging fantasy web serial. Most treatments of this topic I’ve seen address the second group, which feels alienating to me, as if there’s an ongoing dimorphism between producers and users of consumer software.
I’m typically skeptical of “high IQ bubble” typed arguments since they tend to prove too much, so I’ll make a more specific point. I agree with you that within these groups, conflation between perceived research skills and strategic skill does occur. My (minor) contention is that I don’t think that this particular mistake is the one being made by the average person asking a speaker about their strategic takes at the end of a talk.
VII. Main Argument: Research Takes?! What Research Takes?!!
Like, these sorts of questions aren’t just being fielded by researchers in the field, you know. Why do people ask random celebrities and movie stars about their takes on geopolitics? Are they genuinely conflating acting skill with strategic skill? What about pro athletes? Is physical skill being conflated with strategic skill too? Do you believe that if a rich heiress with no research background were giving a talk about AI risk, no one in the audience would be interested in her big-picture takes? It makes no sense. Other comments have pointed this out already, so I’m sorry about adding another rant to the pile, but there exists a simpler explanation which does a better job of tracking reality!
The missing ingredient here is clout.
Various essays go into the relationship between competence and power, but what you’re describing as “research skill” can be renamed expertise. These folks aren’t mistaking you for someone high in “strategic skill”; instead they are making the correct inference that you are an elite. They want in on the latest gossip behind the waitlist at the exclusive private social where frontier lab employees are joking around about what name they’ll use for tomorrow’s new model. They’re holding their breath waiting for invention and hyperstition and self-fulfilling prophecy. They want to know the story of how Elon Musk will save the U.S. AISI and call it xAISI.
VIII. Concluding Apologetic Remarks
I’m not sure if this was an aim for the above post, but it’s an understandable impulse to want to distance oneself from scenes where it’s easier to find elites (good strategic takes) than experts (good research takes), because there can be a certain culture attached which often fails to act in a way that consistently upholds virtuous truth-seeking.
Overall, I think that taking a public stance can warp the landscape being described in ways that are hard to predict, and I appreciate your approach here compared to the influencer extreme of “my strategic takes are all great, the best, and bigly” versus the corporate extreme of “oh there are so many great takes, how could I pick one, great takes, thanks all”. The position of “yeah I’ve got takes but chill, they’re mid” is a reasonable midpoint, and it would be nice to have people defer more intelligently in general.
Relevant quote:
However, I don’t know how much alignment faking results will generalize. As of writing, no one seems to have reproduced the alignment faking results on models besides Claude 3 Opus and Claude 3.5 Sonnet. Even Claude 3.7 Sonnet doesn’t really alignment fake: “Claude 3.7 Sonnet showed marked improvement, with alignment faking dropping to <1% of instances and a reduced compliance gap of only 5%.”
Some mats scholars (Abhay Sheshadri and John Hughes) observed minimal or no alignment faking from open-source models like llama-3.1-70b and llama-3.1-405b. However, preliminary results suggest gpt-4-o seems to alignment fake more often when finetuned on content from Evan Hubinger’s blog posts and papers about “mesaoptimizers.”
From: Self-Fulfilling Misalignment Data Might Be Poisoning Our AI Models
I’m not trying to say that any of this applies in your case per se. But when someone in a leadership position hires a personal assistant, their goal may not necessarily be to increase their short term happiness, even if this is a side effect. The main benefit is to reduce load on their team.
If there isn’t a clear owner for ops-adjacent stuff, people in high-performance environments will pick up ad-hoc tasks that need to get done, sometimes without clearly reporting this to anyone, which is often inefficient relative to their skillset and a bad allocation of bandwidth given the organization’s priorities.
A great personal assistant wouldn’t just help you get more done and focus on what matters; they would also take over the various things that currently spill over onto whoever is paying attention to your needs and quietly ensuring they are met, without you noticing or explicitly delegating.
Thanks for sharing. I read the entire model card instead of starting with the reward hacking section. The parts I personally found most interesting were sections 4.1.1.5 and 7.3.4. Why isn’t the underperformance of Claude Opus 4 on the AI research evaluation suite considered evidence of potential sandbagging?