IYKYK
Alex_Altair
For anyone reading this comment thread in the future, Dalcy wrote an amazing explainer for this paper here.
See also the classic LW post, The Best Textbooks on Every Subject.
Indeed, we know about those posts! Lmk if you have a recommendation for a better textbook-level treatment of any of it (modern papers etc). So far the grey book feels pretty standard in terms of pedagogical quality.
Some small corrections/additions to my section (“Altair agent foundations”). I’m currently calling it “Dovetail research”. That’s not publicly written anywhere yet, but if it were listed as that here, it might help people who are searching for it later this year.
Which orthodox alignment problems could it help with?: 9. Humans cannot be first-class parties to a superintelligent value handshake
I wouldn’t put number 9. It’s not intended to “solve” most of these problems, but it is intended to help make progress on understanding the nature of the problems through formalization, so that they can be avoided or postponed, or more effectively solved by other research agendas.
Target case: worst-case
definitely not worst-case, more like pessimistic-case
Some names: Alex Altair, Alfred Harwood, Daniel C, Dalcy K
Add “José Pedro Faustino”
Estimated # FTEs: 1-10
I’d call it 2, averaged throughout 2024.
Some outputs in 2024: mostly exposition but it’s early days
“Gain writing skills BEFORE...”
FWIW I can’t really tell what this website is supposed to be/do by looking at the landing page and menu
The title reads as ambiguous to me; I can’t tell if you mean “learn to [write well] before” or “learn to write [well before]”.
DM me if you’re interested.
I, too, am quite interested in trialing more people for roles on this spectrum.
Thanks. Is “pass@1” some kind of lingo? (It seems like an ungoogleable term.)
I guess one thing I want to know is like… how exactly does the scoring work? I can imagine something like, they ran the model a zillion times on each question, and if any one of the answers was right, that got counted in the light blue bar. Something that plainly silly probably isn’t what happened, but it could be something similar.
If it actually just submitted one answer to each question and got a quarter of them right, then I think it doesn’t particularly matter to me how much compute it used.
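To make the distinction concrete, here is a minimal sketch of the two scoring rules I have in mind. The data is made up, and the function names are my own; I am not claiming either rule is what was actually used for the reported number.

```python
def single_attempt_score(attempts):
    """Fraction of questions where the first (only submitted) answer is correct."""
    return sum(1 for a in attempts if a[0]) / len(attempts)


def any_of_n_score(attempts):
    """Fraction of questions where at least one of the sampled answers is correct."""
    return sum(1 for a in attempts if any(a)) / len(attempts)


# Hypothetical results: one row per question, one boolean per sampled answer.
attempts = [
    [True, False, True],
    [False, False, True],
    [False, False, False],
    [False, True, False],
]

print(single_attempt_score(attempts))  # 0.25
print(any_of_n_score(attempts))        # 0.75
```

The same underlying runs give very different headline numbers depending on which rule is applied, which is why I want to know which one the bars represent.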
On the livestream, Mark Chen says the 25.2% was achieved “in aggressive test-time settings”. Does that just mean more compute?
I wish they would tell us what the dark vs light blue means. Specifically, for the FrontierMath benchmark, the dark blue looks like it’s around 8% (rather than the light blue at 25.2%). Which like, I dunno, maybe this is nit picking, but 25% on FrontierMath seems like a BIG deal, and I’d like to know how much to be updating my beliefs.
things are almost never greater than the sum of their parts Because Reductionism
Isn’t it more like, the value of the sum of the things is greater than the sum of the value of each of the things? That is, $v\left(\sum_i x_i\right) > \sum_i v(x_i)$ (where perhaps $v$ is a utility function). That seems totally normal and not-at-all at odds with Reductionism.
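As a toy illustration, assuming a made-up convex value function (squaring) and made-up part sizes, the value of the whole can exceed the summed values of the parts without anything non-reductionist going on:

```python
def value(x):
    """Hypothetical value function; any strictly convex choice would do here."""
    return x ** 2


parts = [1.0, 2.0, 3.0]  # made-up quantities of "stuff"

value_of_sum = value(sum(parts))              # value(6.0) = 36.0
sum_of_values = sum(value(p) for p in parts)  # 1.0 + 4.0 + 9.0 = 14.0

print(value_of_sum > sum_of_values)  # True
```

Everything here is still fully determined by the parts; only the valuation is nonlinear.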
I’d vote for removing the stage “developing some sort of polytime solution” and just calling 4 “developing a practical solution”. I think listing that extra step is coming from the perspective of someone who’s more heavily involved in complexity classes. We’re usually interested in polynomial time algorithms because they’re usually practical, but there are lots of contexts where practicality doesn’t require a polynomial time algorithm, or really, where we’re just not working in a context where it’s natural to think in terms of algorithms with run-times.
Thank you for writing this! Your description in the beginning about trying to read about the GRT and coming across a sequence of resources, each of which didn’t do quite what you wanted, is a precise description of the path I also followed. I gave up at the end, wishing that someone would write an explainer, and you have written exactly the explainer that I wanted!
Positive feedback: I am happy to see the comment karma arrows pointing up and down instead of left and right. I have some degree of left-right confusion and was always clicking and unclicking my comment votes to figure out which was up and down.
Also appreciate that the read time got put back into main posts.
(Comment font stuff looks totally fine to me, both before and after this change.)
[Some thoughts that are similar but different to my previous comment;]
I suspect you can often just prove the behavioral selection theorem and structural selection theorem in separate, almost independent steps.
1. Prove a behavioral theorem.
2. Add in a structural assumption.
3. Prove that the behavioral result plus the structural assumption implies the structural result.
Behavior essentially serves as an “interface”, and a given behavior can be implemented by any number of different structures. So it would make sense that you need to prove something about structure separately (and that you can prove it for multiple different types of structural assumption).
Further claims: for any given structural class,
there will be a natural simplicity measure
simpler instances will be exponentially rare.
A structural class is something like programs, or Markov chains, or structural causal models. The point of specifying structure is to in some way model how the system might actually be shaped in real life. So it seems to me that any of these will be specified with a finite string over a finite alphabet. This comes with the natural simplicity measure of the length of the specification string, and there are exponentially fewer short strings than long ones.[1]
So let’s say you want to prove that your thing X which has behavior B has specific structure S. Since structure S has a fixed description length, you almost automatically know that it’s exponentially less likely for X to be one of the infinitely many structures with description length longer than S. (Something similar holds for being within delta of S.) The remaining issue is whether there are any other secret structures that are shorter than S (or of similar length) that X could be instead.
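The counting claim can be checked with a quick sketch. The function names are my own, and the binary alphabet and the particular lengths are arbitrary choices for illustration; the one-symbol alphabet shows the linear-growth exception.

```python
def count_strings_up_to(alphabet_size, max_length):
    """Number of strings of length 0..max_length over an alphabet of the given size."""
    return sum(alphabet_size ** n for n in range(max_length + 1))


def fraction_short(alphabet_size, short_length, max_length):
    """Share of strings of length <= max_length that have length <= short_length."""
    return (count_strings_up_to(alphabet_size, short_length)
            / count_strings_up_to(alphabet_size, max_length))


print(count_strings_up_to(2, 3))   # 15: binary strings grow exponentially
print(count_strings_up_to(1, 3))   # 4: strings of zeros grow only linearly
print(fraction_short(2, 10, 20))   # roughly 2**-10
```

With a binary alphabet, cutting the allowed length by 10 symbols shrinks the pool by a factor of about a thousand, which is the sense in which simpler instances are exponentially rare.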
[1] Technically, you could have a subset of strings that didn’t grow exponentially. For example, you could, for some reason, decide to specify your Markov chains using only strings of zeros. That would grow linearly rather than exponentially. But this is clearly a less natural specification method.
I got a ton of value from ILIAD last year, and strongly recommend it to anyone interested!