I like Hasselt and Meyn (extremely friendly, possibly too friendly for you)
technicalities
Maybe he dropped the “c” because it changes the “a” phoneme from æ to ɑː and gives a cleaner division of sounds: “brac-ket” pronounced together collides with “bracket”, whereas “braa-ket” does not.
“Safety as a Scientific Pursuit” (2024)
It’s under “IDA”. It’s not the name people use much anymore (see scalable oversight, recursive reward modelling, and critiques), but I’ll expand the acronym.
The story I heard is that Lightspeed are using SFF’s software, SFF jumped the gun in posting them, and Lightspeed are still catching up. Definitely email.
d’oh! fixed
no, probably just my poor memory to blame
Yep, no idea how I forgot this. concept erasure!
Interesting. I hope I am the bearer of good news then
thank you!
Not speaking for him, but for a tiny sample of what else is out there, ctrl+F “ordinary”
yeah you’re right
If the funder comes through, I’ll consider a second review post, I think.
You’re clearly right, thanks
Thanks!
Being named isn’t meant as an honorific btw, just a basic aid to the reader orienting.
Ta!
I’ve added a line about the ecosystems. Nothing else in the umbrella strikes me as direct work (Public AI is cool but not alignment research afaict). (I liked your active inference paper btw, see ACS.)
A quick look suggests that the stable equilibrium things aren’t in scope—not because they’re outgroup but because this post is already unmanageable without handling policy, governance, political economy and ideology. The accusation of site bias against social context or mechanism was perfectly true last year, but no longer, and my personal scoping should not be taken as indifference.
Of the NSF people, only Sharon Li strikes me as doing things relevant to AGI.
Happy to be corrected if you know better!
I like this. It’s like a structural version of control evaluations. Will think about where to put it in.
One big omission is Bengio’s new stuff, but the talk wasn’t very precise. Sounds like Russell:
“With a causal and Bayesian model-based agent interpreting human expressions of rewards reflecting latent human preferences, as the amount of compute to approximate the exact Bayesian decisions increases, we increase the probability of safe decisions.”
Another angle I couldn’t fit in is him wanting to make microscope AI, to decrease our incentive to build agents.
I care a lot! Will probably make a section for this in the main post under “Getting the model to learn what we want”, thanks for the correction.
As of two years ago, the evidence for this was sparse. Looked like parity overall, though the pool of “supers” has improved over the last decade as more people got sampled.
There are other reasons to be down on XPT in particular.