Having worked on some of the problems myself (e.g. decision theory), I think the underlying problems are just very hard. Why do you think they could have done “so much more, much more intently, and much sooner”?
The type of fundamental problem that proper speculative philosophy is supposed to solve is the sort where streetlighting doesn't work (or isn't working, or isn't working fast enough). But nearly all of the alignment field after like 2004 was still basically streetlighting. It was maybe a reasonable thing to have some hope in prospectively, but retrospectively it was too much investment in streetlighting, and retrospectively I can make arguments about why one should have maybe guessed that at the time. By 2018 IIRC, or certainly by 2019, I was vociferously arguing for that in AF team meetings—but the rest of the team either disagreed with me or didn't understand me, and on my own I'm just not that good a thinker, and I didn't find anyone else to try it with. I think they have good thoughts, but are nevertheless mostly streetlighting—i.e. not trying to take step after step of thinking that is both at the level of speculative philosophy AND aimed at getting the understanding needed for alignment.
My understanding of what happened (from reading this) is that you wanted to explore in a new direction very different from the then preferred approach of the AF team, but couldn’t convince them (or someone else) to join you. To me this doesn’t clearly have much to do with streetlighting, and my current guess is that it was probably reasonable of them to not be convinced. It was also perfectly reasonable of you to want to explore a different approach, but it seems unreasonable to claim without giving any details that it would have produced better results if only they had listened to you. (I mean you can claim this, but why should I believe you?)
If you disagree (and want to explain more), maybe you could either explain the analogy more fully (e.g., what corresponds to the streetlight, why should I believe that they overexplored the lighted area, what made you able to “see in the dark” to pick out a more promising search area or did you just generally want to explore the dark more) and/or try to convince me on the object level / inside view that your approach is or was more promising?
(Also perfectly fine to stop here if you want. I’m pretty curious on both the object and meta levels about your thoughts on AF, but you may not have wanted to get into such a deep discussion when you first joined this thread.)
If you say to someone

Ok, so, there's this thing about AGI killing everyone. And there's this idea of avoiding that by making AGI that's useful like an AGI but doesn't kill everyone and does stuff we like. And you say you're working on that, or want to work on that. And what you're doing day to day is {some math thing, some programming thing, something about decision theory, …}. What is the connection between these things?
and then you listen to what they say, and reask the question and interrogate their answers, IME what it very often grounds out into is something like:
Well, I don't know what to do to make aligned AI. But it seems like X ∈ {ontology, decision, preference function, NN latent space, logical uncertainty, reasoning under uncertainty, training procedures, negotiation, coordination, interoperability, planning, …} is somehow relevant.
And, I have a formalized version of some small aspect of X which is mathematically interesting / philosophically intriguing / amenable to testing with a program, and which seems like it's kinda related to X writ large. So what I'm going to do is tinker with this formalized version for a week/month/year, and then I'm going to zoom out and think about how this relates to X, and what I have and haven't learned, and so on.
This is a good strategy because this is how all mathematical / scientific / technological progress is made: you start with stuff you know; you expand outwards by following veins of interest, tractability, and generality/power; you keep an eye roughly towards broader goals by selecting the broad region you’re in; and you build outward. What we see historically is that this process tends to lead us to think about the central / key / important / difficult / general problems—such problems show up everywhere, so we convergently will come to address them in due time. By mostly sticking, in our day-to-day work, to things that are relatively more concrete and tractable—though continually pushing and building toward difficult things—we make forward progress, sharpen our skills, and become familiar with the landscape of concepts and questions.
So I would summarize that position as endorsing streetlighting, in a very broad sense that encompasses most math / science / technology. And this position is largely correct! My claim is that
1. this is probably too slow for making Friendly AI, and
2. maybe one could go faster by trying to more directly cleave to the core philosophical problems.
I discuss the problem more here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html

(But note that, while that essay frames things as "a proposed solution", the solution is barely anything—more like a few guesses at pieces of methodology—and the main point is the discussion of the problem; maybe a writing mistake.)
An underemphasized point that I should maybe elaborate more on: a main claim is that there’s untapped guidance to be gotten from our partial understanding—at the philosophical level and for the philosophical level. In other words, our preliminary concepts and intuitions and propositions are, I think, already enough that there’s a lot of progress to be made by having them talk to each other, so to speak.
[2.] maybe one could go faster by trying to more directly cleave to the core philosophical problems.
...
An underemphasized point that I should maybe elaborate more on: a main claim is that there’s untapped guidance to be gotten from our partial understanding—at the philosophical level and for the philosophical level. In other words, our preliminary concepts and intuitions and propositions are, I think, already enough that there’s a lot of progress to be made by having them talk to each other, so to speak.
OK but what would this even look like?
Toss away anything amenable to testing and direct empirical analysis; it’s all too concrete and model-dependent.
Toss away mathsy proofsy approaches; they’re all too formalized and over-rigid and can only prove things from starting assumptions we haven’t got yet and maybe won’t think of in time.
Toss away basically all settled philosophy, too; if there were answers to be had there rather than a few passages which ask correct questions, the Vienna Circle would have solved alignment for us.
What’s left? And what causes it to hang together? And what causes it not to vanish up its own ungrounded self-reference?
From scratch but not from scratch. https://www.lesswrong.com/posts/noxHoo3XKkzPG6s7E/most-smart-and-skilled-people-are-outside-of-the-ea?commentId=DNvmP9BAR3eNPWGBa
https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html