Ah gotcha, yes let’s do my $1k against your $10k.
Given your rationale I’m on board with requiring that 3 or more consistent physical instances of the lock have been manufactured.
Let’s ‘lock’ it in.
@Raemon works for me, and I agree with the other conditions.
This seems mostly good to me, thank you for the proposals (and sorry for my delayed response, this slipped my mind).
OR less than three consistent physical instances have been manufactured. (e.g. a total of three including prototypes or other designs doesn’t count)
Why this condition? It doesn’t seem relevant to the core contention, and if someone prototyped a single lock using a GS AI approach but didn’t figure out how to manufacture it at scale, I’d still consider it to have been an important experiment.
Besides that, I’d agree to the above conditions!
(8) won’t be attempted, or will fail at some combination of design, manufacture, or just-being-pickable. This is a great proposal and a beautifully compact crux for the overall approach.
I agree with you that this feels like a ‘compact crux’ for many parts of the agenda. I’d like to take your bet; let me reflect on whether there are any additional operationalizations or conditions.
However, I believe that the path there is to extend and complement current techniques, including empirical and experimental approaches alongside formal verification—whatever actually works in practice.
FWIW in Towards Guaranteed Safe AI we endorse this: “Moreover, while we have argued for the need for verifiable quantitative safety guarantees, it is important to note that GS AI may not be the only route to achieving such guarantees. An alternative approach might be to extract interpretable
policies from black-box algorithms via automated mechanistic interpretability… it is ultimately an empirical question whether it is easier to create interpretable world models or interpretable policies in a given domain of operation.”
I agree with this; I’d like to see AI safety scale with new projects. A few ideas I’ve been mulling:
- A ‘festival week’ bringing entrepreneur types and AI safety types together to cowork from the same place, along with a few talks and a lot of mixers.
- Running an incubator/accelerator program at the tail end of a funding round, with fiscal sponsorship and some amount of operational support.
- More targeted recruitment for specific projects to advance important parts of a research agenda.

It’s often unclear to me whether new projects should actually be new organizations; making it easier to spin up new projects that can then either join existing orgs or grow into orgs themselves seems like a promising direction.
First off, thank you for writing this; great explanation.
Do you anticipate acceleration risks from developing the formal models through an open, multilateral process? Presumably others could use the models to train and advance the capabilities of their own RL agents. Or is the expectation that regulation would accompany this such that only the consortium could use the world model?
Would the simulations be exclusively for ‘hard science’ domains—e.g. chemistry, biology—or would simulations of human behavior, economics, and politics also be needed? My expectation is that it would need the latter, but I imagine simulating hundreds of millions of intelligent agents would dramatically (prohibitively?) increase the complexity and computational costs.
This seems like an important crux to me, because I don’t think greatly slowing AI in the US would require new federal laws. I think many of the actions I listed could be taken by government agencies that over-interpret their existing mandates, given the right political and social climate. For instance, the eviction moratorium during COVID obviously should have required congressional action, but was done by fiat through an over-interpretation of authority by an executive branch agency.
What they do or do not do seems mostly dictated by that socio-political climate, and by the courts, which means fewer veto points for industry.
I agree that competition with China is a plausible reason regulation won’t happen; that will certainly be one of the arguments advanced by industry and NatSec as to why AI should not be throttled. However, I’m not sure it will be stronger than the protectionist impulses, and currently don’t think it will be. Possibly it will exacerbate the “centralization of AI” dynamic that I listed in the ‘licensing’ bullet point, where large existing players receive money and a de facto license to operate in certain areas and then avoid others (as memeticimagery points out). So for instance we see more military-style research, and GooAmBookSoft tacitly agree not to deploy AI that would replace lawyers.
To your point on big tech’s political influence: they have, in some absolute sense, a lot of political power, but relative to peer industries they have much less political influence. I think they’ve benefitted a lot from the R-D stalemate in DC; I’m positing that this will go around or through that stalemate, and I don’t think they currently have the soft power to stop that.
Hah yes—seeing that great post from johnswentworth inspired me to review my own thinking on RadVac. Ultimately I placed a lower estimate on RadVac being effective—or at least effective enough to get me to change my quarantine behavior—such that the price wasn’t worth it, but I think I get a rationality demerit for not investing more in the collaborative model building (and collaborative purchasing) part of the process.
I’m sorry I didn’t see this response until now—thank you for the detailed answer!
I’m guessing your concern feels similar to ones you’ve articulated in the past around… “heart”/”grounded” rationality, or a concern about “disabling pieces of the epistemic immune system”.
I’m curious if, 8 months later, you feel you can better speak to what you see as the crucial misunderstanding?
Out of curiosity what’s one of your more substantive disagreements with Thiel?
I’d be quite interested in reading that guide!
Forecast - 25 mins
I thought it was more likely that in the short run there could be a preference cascade among top AGI researchers, and as others have mentioned, depending on the operationalization of “top AGI researchers” it might be true already.
If this doesn’t become a majority concern by 2050, I expect it will be because of another AI Winter, and I tried to have my distribution reflect that (a little hamfistedly).
Thanks for posting this. I recently reread the Fountainhead, which I similarly enjoyed and got more out of than did my teenage self—it was like a narrative, emotional portrayal of the ideals in Marc Andreessen’s It’s Time to Build essay.
I interpreted your section on The Conflict as the choice between voice and exit.
The larger scientific question was related to Factored Cognition, and getting a sense of the difficulty of solving problems through this type of “collaborative crowdsourcing”. The hope was that running this experiment would lead to insights that could then inform the direction of future experiments, in the way that you might fingertip-feel your way around an unknown space to get a handle on where to go next. For example, if it turned out to be easy for groups to execute this type of problem solving, we might push ahead with competitions between teams to develop the best strategies for context-free problem solving.
In that regard it didn’t turn out to be particularly informative, because it wasn’t easy for the groups to solve the math problems, and it’s unclear if that’s because of the problems selected, the team compositions, the software, etc. So re: the larger scientific question I don’t think there’s much to conclude.
But personally I felt that by watching relay participants I gained a lot of UX intuitions around what type of software design and strategy design is necessary for factored strategies—what I broadly think of as problem solving strategies that rely upon decomposition—to work. Two that immediately come to mind:
- Create software design patterns that allow the user to hide/reveal information in intuitive ways. It was difficult, when thrown into a huge problem doc with little context, to know where to focus. I wanted a way for the previous user to show me only the info I needed. For example, the way Workflowy / Roam Research bullet points allow you to hide unneeded details, and how if you click on a bullet point you’re brought into an entirely new context.
- When designing strategies, try focusing on the return signature. When coming up with new strategies for solving relay problems, at first it was entirely free form. I as a user would jump in, try pushing the problem as far as I could, and leave haphazard notes in the doc. Over time we developed more complex shorthand and shared strategies for solving a problem. One heuristic I now use when developing strategies for problem solving that use decomposition is to prioritize thinking about what each sub-part of the strategy will return to the top caller (see the sketch below). That clarifies the interface, simplifies what the person working on the sub-strategy needs to do, and promotes composability.
These ideas are helpful because—I posit—we’re faced with Relay-Game-like problems all the time. When I work on a project, leave it for a week, and come back, I think I’m engaging in a relay between past Ben, present Ben, and future Ben. Some of these ideas informed my design of templates for collaborative group forecasting.
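To make the return-signature heuristic concrete, here’s a minimal sketch in Python. The names and fields (SubResult, simplify_term, confidence, open_questions) are purely my own illustration, not anything from the relay software; the point is that fixing what a sub-strategy hands back to the caller keeps the pieces composable:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubResult:
    """The agreed 'return signature' that every sub-strategy hands back to the caller."""
    answer: str                  # best current answer to the sub-question
    confidence: float            # rough 0-1 sense of how much the next worker should trust it
    open_questions: List[str] = field(default_factory=list)  # what still needs resolving

def simplify_term(term: str) -> SubResult:
    """One decomposed step: its internals can be messy, but its output shape is fixed."""
    # ... the actual work on the sub-problem would happen here ...
    return SubResult(answer=term, confidence=0.4,
                     open_questions=["double-check the sign of the second factor"])

def solve(problem: str) -> str:
    """The top-level caller only needs the return signature, not how each step works."""
    step = simplify_term(problem)
    if step.confidence > 0.8 and not step.open_questions:
        return step.answer
    return f"Partial answer: {step.answer}; still open: {step.open_questions}"

print(solve("2x + 3x - x"))
```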
Thanks, rewrote and tried to clarify. In essence the researchers were testing transmission of “strategies” for using a tool, where an individual was limited in what they could transmit to the next user, akin to this relay experiment.
In fact they found that trying to convey causal theories could undermine the next person’s performance; they speculate that it reduced experimentation prematurely.
… my god…
Cumulative Y2K readiness spending was approximately $100 billion, or about $365 per U.S. resident.
Y2K spending started as early as 1995, and appears to have peaked in 1998 and 1999 at about $30 billion per year.
https://www.commerce.gov/sites/default/files/migrated/reports/y2k_1.pdf
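As a rough sanity check on the per-capita figure (the population number here is my own assumption of roughly 275 million U.S. residents in the late 1990s, not a figure from the report):

$$\frac{\$100\ \text{billion}}{\approx 275\ \text{million residents}} \approx \$364\ \text{per resident}$$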