What do you think of this argument that Grok 4 used only ~1/5th as much compute for RLVR training as for pretraining (~3e26 pre-training + ~6e25 RLVR)? https://x.com/tmychow/status/1943460487565578534
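The ratio in that estimate is simple arithmetic. Here is a minimal sketch that takes the two quoted figures at face value and assumes both are in FLOP; the variable names are purely illustrative:

```python
# Estimated training compute as quoted in the question above.
# Assumption: both figures are in FLOP.
pretraining_flop = 3e26
rlvr_flop = 6e25

# RLVR compute relative to pretraining compute.
rlvr_to_pretraining = rlvr_flop / pretraining_flop

# RLVR compute as a share of the combined total.
rlvr_share_of_total = rlvr_flop / (pretraining_flop + rlvr_flop)

print(f"RLVR / pretraining: {rlvr_to_pretraining:.2f}")
print(f"RLVR share of total: {rlvr_share_of_total:.3f}")
```

On these numbers, the "~1/5th" figure is RLVR relative to pretraining; as a share of total training compute it would be closer to one sixth.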
Peter Wildeford
What do you think of the counterargument that OpenAI announced o3 in December and publicly solicited external safety testing then, and isn’t deploying until ~4 months later?
Here’s my summary of the recommendations:
National security testing
Develop robust government capabilities to evaluate AI models (foreign and domestic) for security risks
Once ASL-3 is reached, government should mandate pre-deployment testing
Preserve the AI Safety Institute in the Department of Commerce to advance third-party testing
Direct NIST to develop comprehensive national security evaluations in partnership with frontier AI developers
Build classified and unclassified computing infrastructure for testing powerful AI systems
Assemble interdisciplinary teams with both technical AI and national security expertise
Export Control Enhancement
Tighten semiconductor export restrictions to prevent adversaries from accessing critical AI infrastructure
Control H20 chips
Require government-to-government agreements for countries hosting large chip deployments
As a prerequisite for hosting data centers with more than 50,000 chips from U.S. companies, the U.S. should mandate that countries at high-risk for chip smuggling comply with a government-to-government agreement that 1) requires them to align their export control systems with the U.S., 2) takes security measures to address chip smuggling to China, and 3) stops their companies from working with the Chinese military. The “Diffusion Rule” already contains the possibility for such agreements, laying a foundation for further policy development.
Review and reduce the 1,700 H100 no-license required threshold for Tier 2 countries
Currently, the Diffusion Rule allows advanced chip orders from Tier 2 countries for fewer than 1,700 H100s (an approximately $40 million order) to proceed without review. These orders do not count against the Rule’s caps, regardless of the purchaser. While these thresholds address legitimate commercial purposes, we believe that they also pose smuggling risks. We recommend that the Administration consider reducing the number of H100s that Tier 2 countries can purchase without review to further mitigate smuggling risks.
Increase funding for Bureau of Industry and Security (BIS) for export enforcement
Lab Security Improvements
Establish classified and unclassified communication channels between AI labs and intelligence agencies for threat intelligence sharing, similar to Information Sharing and Analysis Centers used in critical infrastructure sectors
Create systematic collaboration between frontier AI companies and intelligence agencies, including Five Eyes partners
Elevate collection and analysis of adversarial AI development to a top intelligence priority, so as to provide strategic warning and support export controls
Expedite security clearances for AI industry professionals
Direct NIST to develop next-generation security standards for AI training/inference clusters
Develop confidential computing technologies that protect model weights even during processing
Develop meaningful incentives for implementing enhanced security measures via procurement requirements for systems supporting federal government deployments.
Direct DOE/DNI to conduct a study on advanced security requirements that may become appropriate to ensure sufficient control over and security of highly agentic models
Energy Infrastructure Scaling
Set an ambitious national target: build 50 additional gigawatts of power dedicated to AI by 2027
Streamline permitting processes for energy projects by accelerating reviews and enforcing timelines
Expedite transmission line approvals to connect new energy sources to data centers
Work with state/local governments to reduce permitting burdens
Leverage federal real estate for co-locating power generation and next-gen data centers
Government AI Adoption
Across the whole of government, the Administration should systematically identify every instance where federal employees process text, images, audio, or video data, and augment these workflows with appropriate AI systems.
Task OMB to address resource constraints and procurement limitations for AI adoption
Eliminate regulatory and procedural barriers to rapid AI deployment across agencies
Direct DoD and Intelligence Community to accelerate AI research, development and procurement
Target largest civilian programs for AI implementation (IRS tax processing, VA healthcare delivery, etc.)
Economic Impact Monitoring
Enhance data collection mechanisms to track AI adoption patterns and economic implications
The Census Bureau’s American Time Use Survey should incorporate specific questions about AI usage, distinguishing between personal and professional applications while gathering detailed information about task types and systems employed.
Update Census Bureau surveys to gather detailed information on AI usage and impacts
Collect more granular data on tasks performed by workers to create a baseline for monitoring changes
Track the relationship between AI computation investments and economic performance
Examine how AI adoption might reshape the tax base and cause structural economic shifts
Peter Wildeford’s Shortform
If you’ve liked my writing in the past, I wanted to share that I’ve started a Substack: https://peterwildeford.substack.com/
Ever wanted a top forecaster to help you navigate the news? Want to know the latest in AI? I’m doing all that in my Substack—forecast-driven analysis about AI, national security, innovation, and emerging technology!
My current working take is that it is at the level of a median-but-dedicated undergraduate of a top university who is interested and enthusiastic about AI safety. But Deep Research can do in 10 minutes what would take that undergraduate about 20 hours.
Happy to try a prompt for you and see what you think.
Thank you for putting this together; it was very helpful!
I’d also add that historically I believe about two-thirds of LTFF’s money has also come from OpenPhil, so LTFF doesn’t represent a fully independent funder (though the decisionmaking around grants is pretty independent).
This is a great poll and YouGov is a highly reputable pollster, but there is a significant caveat to note about the pause finding.
The way the question is framed provides information about “1000 technology leaders” who have signed a letter in favor of the pause but does not mention any opposition to the pause. I think this would push respondents to favor the pause. An ideal question would be framed more neutrally, presenting statements both for and against the pause.
Good to know you won’t be out of a job just yet!
I asked GPT4 the same prompt and here is what it said:
Of course! Here’s a new contra dance for you:
Contra Dance
Whirlwind Delight
By ChatGPT
(Duple improper)
A1
(8) Circle left 3⁄4
(8) Neighbor swing
A2
(8) Long lines forward and back
(8) Ladies chain
B1
(8) Star right 1x
(8) Gents aleman left 1.5
B2
(8) Half hey, ladies pass right shoulders to start
(8) Partner balance and swing, end facing down in a line of four
Enjoy dancing Whirlwind Delight!
How did it do?
Yep! I was wrong and this is false!
If we want to know what arguments resonate with New York Times articles we can actually use surveys, message testing, and focus groups to check and we don’t need to guess! (Disclaimer: My company sells these services.)
Cool—I’ll follow up when I’m back at work.
That makes a lot of sense. We can definitely test a lot of different framings. The difficulty with issues like these is that they are low saliency: people tend not to have opinions already, so they generate one on the spot. We have a lot of experience polling on low saliency issues, though, because we’ve done a lot of polling on animal farming policy, which has similar framing effects.
I’ll shill here and say that Rethink Priorities is pretty good at running polls of the electorate if anyone wants to know what a representative sample of Americans think about a particular issue such as this one. No need to poll Uber drivers or Twitter when you can do the real thing!
Yeah, it came from a lawyer. The point being that if you confess to something bad, we may be legally required to report that, so be careful.
Feel free to skip questions if you feel they aren’t applicable to you.
If you’re considering a career in AI policy, now is an especially good time to start applying widely as there’s a lot of hiring going on right now. I documented in my Substack over a dozen different opportunities that I think are very promising.