I only have one key prediction, from something I worked on, and by the time it shows up in the research at large it won't matter to prove that I knew it was coming, so I just haven't really been trying to communicate it for a few years. jessicata and jack gallagher already know what I'm talking about (and last I talked about it with them, they seemed to think I was kinda crazy). Everything else is "things continue more or less as they have been", so I can just refer to this post for that part.

What might actually impact the future would be explaining it directly to an AI safety org that can use the info. Thing is, I've already been making all the downstream claims that matter; read my various ramblings to get a sense of what I'm talking about. The actual insight is more or less a trick for improving grokking strength significantly, and there are a lot of researchers trying to do that; some seem to be on the right track to figure out the basic idea themselves, and fragments of it have been in the literature for a while. We never got the capabilities side working at full strength, though the old, super janky demo that convinced us it ought to work could be refined into a workable demo that could be shown to others. I'd rather just make claims about what people could do with better grokking, and about why safety people should be preparing to do stronger formal verification on better-grokked representations.

These days I'm trying to get my productivity back up after burnout from $lastjob (significant progress! turns out the trick to remembering I can code is to be like, wait, all I have to do is realize I'm healthy, and then I'm healthy. funny how that works) and trying to spin up on my cellular-automata coprotection research hunch, which I'll hopefully get scooped on anyway; see above about burnout. I've been talking to uli and might write up the coprotection thing soon.
The interesting stuff is all in the consequences of what it makes possible. Given better grokking, we should expect to be able to make margin proofs about larger physical systems than one might expect, and we should be thinking now about what proofs we'd want on such systems. I know everyone thinks I'm crazy for expecting this, but I suspect we might be able to formally verify, for example, a margin of energy input below which the CPU that seL4 runs on doesn't misbehave. So the question is: what margin-of-error proof do we want in order to ask whether a chunk of matter has retained its agency? And it seems like the bulk of safety folks are already on the right track to figure that out.
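To make the seL4 example slightly more concrete, here's a minimal sketch in Lean of the *shape* of statement I have in mind. Everything in it (the `CpuModel` structure, `hasEnergyMargin`, the 0.001 bound) is a hypothetical placeholder I'm inventing for illustration, not something that exists; the hard part would be getting a trustworthy model of the hardware-plus-physics to instantiate it with, which is exactly where better-grokked representations would come in.

```lean
-- Hedged sketch only: CpuModel, hasEnergyMargin, and the 0.001 bound are
-- hypothetical names for illustration, not an existing verification artifact.

-- Abstract model of a CPU together with its physical environment.
structure CpuModel where
  Trace : Type               -- possible execution traces
  run   : Float → Trace      -- trace as a function of injected energy (joules)
  spec  : Trace → Prop       -- "this trace still refines the seL4 abstract spec"

-- The kind of margin statement gestured at above: some energy bound ε such
-- that any disturbance below it leaves the CPU conforming to the verified spec.
def hasEnergyMargin (m : CpuModel) (ε : Float) : Prop :=
  ∀ e : Float, e ≤ ε → m.spec (m.run e)

-- What a proved result would look like; discharging the hypothesis `h`
-- against a real physical model is the part that doesn't exist yet.
theorem energy_margin_of_bound (m : CpuModel)
    (h : ∀ e : Float, e ≤ 0.001 → m.spec (m.run e)) :
    hasEnergyMargin m 0.001 :=
  h
```

The "retained its agency" question would be the same template with a much harder `spec`: some predicate over a chunk of matter's possible trajectories instead of over CPU traces.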