phd student in comp neuroscience @ mpi brain research frankfurt. https://twitter.com/janhkirchner and https://universalprior.substack.com/
Jan
Thanks for the comment! I’m curious about the Anthropic Codex code-vulnerability prompting, is this written up somewhere? The closest I could find is this, but I don’t think that’s what you’re referencing?
I was not aware of this, thanks for pointing this out! I made a note in the text. I guess this is not an example of “advanced AI with an unfortunately misspecified goal” but rather just an example of the much larger class of “system with an unfortunately misspecified goal”.
Thanks for the comment, I did not know this! I’ll put a note in the essay to highlight this comment.
Iiinteresting! Thanks for sharing! Yes, the choice of how to measure this affects the outcome a lot.
Pop Culture Alignment Research and Taxes
Hmm, fair, I think you might get along fine with my coworker from footnote 6 :) I’m not even sure there is a better way to write these titles—but they can still be very intimidating for an outsider.
Yes, I agree, a model can really push intuition to the next level! There is a failure mode where people just throw everything into a model and hope that the result will make sense. In my experience that just produces a mess, and you need some intuition for how to properly set up the model.
Hi! :) Thanks for the comment! Yes, that’s on purpose, the idea is that a lot of the shorthand in molecular neuroscience is very hard to digest. So since the exact letters don’t matter, I intentionally garbled them with a Glitch Text Generator. But perhaps that isn’t very clear without explanation, I’ll add something.
This word, Ǫ̵͎͊G̦̉̇l͉͇̝̽͆̚i̷͔̓̏͌c̷̱̙̍̂͜k̷̠͍͌l̷̢̍͗̃n̷̖͇̏̆å̴̤c̵̲̼̫͑̎̆, is e.g. a garbled version of O-GLicklnac, which in turn is the phonetic version of “O-GlcNAc”.
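(The specific Glitch Text Generator isn’t linked above, but the effect itself is standard “Zalgo” text: Unicode combining diacritical marks stacked on top of ordinary letters. A minimal sketch of the idea, assuming nothing beyond the Python standard library; the function name `glitch` is just illustrative:)

```python
import random

# Unicode combining diacritical marks (U+0300-U+036F). Renderers draw
# each combining mark on top of the preceding base letter, so stacking
# several of them produces the garbled look.
COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]

def glitch(text: str, intensity: int = 3) -> str:
    """Append `intensity` random combining marks to each letter of `text`."""
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalnum():  # keep spaces and punctuation readable
            out.extend(random.choices(COMBINING_MARKS, k=intensity))
    return "".join(out)

print(glitch("O-GLicklnac"))  # output varies from run to run
```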
A Brief Excursion Into Molecular Neuroscience
Theory #4 appears very natural to me, especially in light of papers like Chen et al. 2006 or Cuntz et al. 2012. Another supporting intuition from developmental neuroscience is that development is a huge mess and that figuring out where to put a long-range connection is really involved. And while there can be a bunch of circuit remodeling on a local scale, once you have established a long-range connection, there is little hope of substantially rewiring it.
In case you want to dive deeper into this (and you don’t want to read all those papers), I’d be happy to chat more about this :)
I’ve been meaning to dive into this for-e-ver and only now found the time for it! This is really neat stuff, I haven’t enjoyed a framework this much since logical induction. Thank you for writing this!
Yep, I agree, SLIDE is probably a dud. Thanks for the references! And my inside view is also that current trends will probably continue and most of the interesting stuff will happen on AI-specialized hardware.
Thank you for the comment! You are right, that should be a ReLU in the illustration, I’ll fix it :)
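(For reference, the nonlinearity in question, the rectified linear unit, is simply

$$\mathrm{ReLU}(x) = \max(0, x),$$

i.e. the identity for positive inputs and zero for negative ones.)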
Compute Governance: The Role of Commodity Hardware
A survey of tool use and workflows in alignment research
On Context And People
Via productiva—my writing and productivity framework
Trust-maximizing AGI
Great explanation, I feel substantially less confused now. And thank you for adding two new shoulder advisors to my repertoire :D
Interesting, thank you! I guess I was thinking of deception as characterized by Evan Hubinger, with mesa-optimizers, bells, whistles, and all. But I can see how a sufficiently large competence-vs-performance gap could also count as deception.