Meta Questions about Metaphilosophy
To quickly recap my main intellectual journey so far (omitting a lengthy side trip into cryptography and Cypherpunk land), with the approximate age that I became interested in each topic in parentheses:
(10) Science—Science is cool!
(15) Philosophy of Science—The scientific method is cool! Oh look, there’s a whole field studying it called “philosophy of science”!
(20) Probability Theory—Bayesian subjective probability and the universal prior seem to constitute an elegant solution to the philosophy of science. Hmm, there are some curious probability puzzles involving things like indexical uncertainty, copying, forgetting… I and others make some progress on this but fully solving anthropic reasoning seems really hard. (Lots of people have worked on this for a while and have failed, at least according to my judgement.)
(25) Decision Theory—Where does probability theory come from anyway? Maybe I can find some clues that way? Well, according to von Neumann and Morgenstern, it comes from decision theory. And hey, maybe it will be really important that we get decision theory right for AI? I and others make some progress, but fully solving decision theory turns out to be pretty hard too. (A number of people have worked on this for a while and haven’t succeeded yet.)
(35) Metaphilosophy—Where does decision theory come from? It seems to come from philosophers trying to do philosophy. What is that about? Plus, maybe it will be really important that the AIs we build will be philosophically competent?
(45) Meta Questions about Metaphilosophy—Not sure how hard solving metaphilosophy really is, but I’m not making much progress on it by myself. Meta questions once again start to appear in my mind:
Why is there virtually nobody else interested in metaphilosophy or ensuring AI philosophical competence (or that of future civilization as a whole), even as we get ever closer to AGI, and other areas of AI safety start attracting more money and talent?
Tractability may be a concern but shouldn’t more people still be talking about these problems if only to raise the alarm (about an additional reason that the AI transition may go badly)? (I’ve listened to all the recent podcasts on AI risk that I could find, and nobody brought it up even once.)
How can I better recruit attention and resources to this topic? For example, should I draw on my crypto-related fame, or start a prize or grant program with my own money? I’m currently not inclined to do either, out of inertia, unfamiliarity, uncertainty of getting any return, fear of drawing too much attention from people who don’t have the highest caliber of thinking, and signaling wrong things (having to promote ideas with one’s own money instead of attracting attention based on their merits). But I’m open to having my mind changed if anyone has good arguments about this.
What does it imply that so few people are working on this at such a late stage? For example, what are the implications for the outcome of the human-AI transition, and on the distribution of philosophical competence (and hence the distribution of values, decision theories, and other philosophical views) among civilizations in the universe/multiverse?
At each stage of this journey, I took what seemed to be the obvious next step (often up a meta ladder), but in retrospect each step left behind something like 90-99% of fellow travelers. From my current position, it looks like “all roads lead to metaphilosophy” (i.e., one would end up here starting with an interest in any nontrivial problem that incentivizes asking meta questions) and yet there’s almost nobody here with me. What gives?
As for the AI safety path (as opposed to pure intellectual curiosity) that also leads here, I guess I do have more of a clue what’s going on. I’ll describe the positions of 4 people I know. Most of this is from private conversations so I won’t give their names.
Person A has a specific model of the AI transition that they’re pretty confident in, where the first AGI is likely to develop a big lead and, if it’s aligned, can quickly achieve human uploading and then defer to the uploads for philosophical questions.
Person B thinks that ensuring AI philosophical competence won’t be very hard. They have a specific (unpublished) idea that they are pretty sure will work. They’re just too busy to publish/discuss the idea.
Person C will at least think about metaphilosophy in the back of their mind (as they spend most of their time working on other things related to AI safety).
Person D thinks it is important and too neglected but they personally have a comparative advantage in solving intent alignment.
To me, this paints a bigger picture that’s pretty far from “humanity has got this handled.” If anyone has any ideas how to change this, or answers to any of my other unsolved problems in this post, or an interest in working on them, I’d love to hear from you.