Primarily interested in agent foundations and AI macrostrategy.
I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
Glitch tokens are my favorite example.
I directionally agree with the core argument of this post.
The elephant(s) in the room according to me:
What is an algorithm? (inb4 a physical process that can be interpreted/modeled as implementing computation)
How do you distinguish (hopefully in a principled way) between (a) the algorithm changing and (b) you being confused about what algorithm the thing is actually running, the real algorithm being more nuanced, so that what “naively” looks like a change of the algorithm is “actually” a reparametrization of it?
I haven’t read the examples in this post super carefully, so perhaps you discuss this somewhere (though I don’t think so, because the examples don’t seem to me like the place for such a discussion).
Thanks for the post! I expected some mumbo jumbo but it turned out to be an interesting intuition pump.
Based on my attending Oliver’s talk, this may be relevant/useful:
I too have reservations about points 1 and 3, but not providing sufficient references or justifications doesn’t imply they’re not on SL1.
mentioned in the FAQ
(I see what podcasts you listen to.)
My notion of progress is roughly: something that is either a building block for The Theory (i.e. marginally advancing our understanding) or a component of some solution/intervention/whatever that can be used to move probability mass from bad futures to good futures.
Re the three you pointed out: simulators I consider a useful insight; gradient hacking probably not (10% < p < 20%); and activation vectors I put in the same bin as RLHF, whatever the appropriate label for that bin is.
Also, I’m curious what it is that you consider(ed) AI safety progress/innovation. Can you give a few representative examples?
the approaches that have been attracting the most attention and funding are dead ends
I’d love to try it, mainly thinking about research (agent foundations and AI safety macrostrategy).
I propose “token surprise” (as in the type-token distinction). You expected this general type of thing but not that Ivanka would be one of the tokens instantiating it.
It’s better but still not quite right. When you play on two levels, sometimes the best strategy involves a pair of substrategies (one on level 1, one on level 2) that are seemingly opposites of each other. I don’t think there’s anything hypocritical about that.
Similarly, hedging is not hypocrisy.
Do you think [playing in a rat race because it’s the most locally optimal thing for an individual to do, while at the same time advocating for abolishing the rat race] is an example of reformative hypocrisy?
Or even more broadly, defecting in a prisoner’s dilemma while exposing an interface that would allow cooperation with other like-minded players?
I’ve had this concept for many years and it hasn’t occurred to me to give it a name (How Stupid Not To Have Thought Of That), but if I tried to give it a name, I definitely wouldn’t call it a kind of hypocrisy.
It’s not clear to me how this results from “excess resources for no reasons”. I guess the “for no reasons” part is crucial here?
I meant this strawberry problem.
Samo said that he would bet that AGI is coming perhaps in the next 20-50 years, but in the next 5.
I haven’t listened to the pod yet, but I guess you meant “but not in the next 5”.
FWIW Oliver’s presentation of (some fragment of) his work at ILIAD was my favorite of all the talks I attended at the conference.
https://gwern.net/doc/existential-risk/2011-05-10-givewell-holdenkarnofskyjaantallinn.doc