A few thoughts.
Have you checked what happens when you throw physics postdocs at the core issues? Do they actually get traction, or just stare at the sheer cliff for longer while thinking? Did anything come out of the ILIAD meeting half a year later? And is there a reason that more standard STEM researchers aren't given an intro to the routes currently thought possibly workable, so they can feel some traction? Any of these could be true: raw intelligence and skills aren't actually useful right now, the problem just isn't tractable yet, or better onboarding could let the current talent pool get traction. Either way, it might not be very cost-effective to recruit physics postdocs.
Humans are generally better at doing things when they have more tools available. While the 'hard bits' might be intractable now, they could well be easier to deal with in a few years, after further technical and conceptual advances in AI and in other fields (think of prompt engineering and Anthropic's mechanistic interpretability work from inside the field, or practical quantum computing from outside it).
This would mean squeezing every drop of usefulness out of AI at each level of capability, both to improve general understanding and to leverage it into breakthroughs in other fields before capabilities increase further. In fact, it might be best to sabotage semiconductor/chip production once models are one generation short of superintelligence/extinction/whatever, giving maximum time to exploit maximum capabilities and tackle alignment before the AIs get too smart.
How close is mechanistic interpretability to the hard problems, and what makes it not good enough?