Looking back at Flint’s work, I don’t agree with this summary.
Ah, sorry, I wasn’t intending for that to be a summary. I found Flint’s framework very insightful, but after reading it I sort of just melded it into my own overall beliefs and understanding around optimization. I don’t think he intended it to be a coherent or finished framework on its own, so I don’t generally try to think “what does Flint’s framework say about X?”. I think its main influence on me was the whole idea of using dynamical systems and phase space as the basis for optimization. So, for example:
In any case, I agree that Flint’s work also eliminates the need for an unnatural baseline in which we have to remove the agent.
I would say that working in the framework of dynamical systems is what lets one get a natural baseline against which to measure optimization, by comparing a given trajectory with all possible trajectories.
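As a rough illustration of that baseline idea, here is a minimal sketch of a Yudkowsky-style measure. It assumes a finite state space with a uniform measure and an explicit ordering; the names `optimization_bits`, `states`, and `rank` are made up for the example, and a real dynamical system would need a measure over trajectories rather than a simple count.

```python
import math

def optimization_bits(outcome, states, rank):
    """Bits of optimization: log2 of (all states / states ranked at least as high as the outcome).

    Assumes a finite state space and a uniform measure over it (a toy assumption).
    """
    at_least_as_high = sum(1 for s in states if rank(s) >= rank(outcome))
    return math.log2(len(states) / at_least_as_high)

def rank(state):
    # Hypothetical ordering: a larger number is higher up the ordering.
    return state

states = range(1024)

print(optimization_bits(1023, states, rank))  # 10.0: only 1 state in 1024 is this high
print(optimization_bits(512, states, rank))   # 1.0: half of all states are at least this high
print(optimization_bits(0, states, rank))     # 0.0: every state is at least this high
```

Nothing here removes an agent to get the baseline; the comparison class is just the whole state space.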
I think I could have some more response/commentary about each of your bullet points, but there’s a background overarching thing that may be more useful to prod at. I have a clear (-feeling-to-me) distinction between “optimization” and “agent”, which doesn’t seem to be how you’re using the words. The dynamical systems + Yudkowsky measure perspective is a great start on capturing the optimization concept, but it is agnostic about (my version of) the agent concept (except insofar as agents are a type of optimizer). It feels to me like the idea of endorsement you’re developing here is cool and useful and is… related to optimization, but isn’t the basis of optimization. So I agree that e.g. “endorsement” is closer to alignment, but I also don’t think that “optimization” is supposed to be all that close to alignment; I’d reserve that for “agent”. I think we’ll need a few levels of formalization in agent foundations, and you’re working toward a different level than the one the optimization concept lives at, so these ideas aren’t in conflict.
Breaking that down just a bit more: let’s say that “alignment” refers to aligning the intentional goals of agents. I’d say that “optimization” is a more general phenomenon where some types of systems tend to move their state up an ordering, but that doesn’t mean that it’s “intentional”, nor that the goal is cleanly encoded somewhere inside the system. So while you could say that two optimizing systems “are more aligned” if they move up similar state orderings, it would be awkward to talk about aligning them.
(My notion of) optimization has its own version of the thing you’re calling “Vingean”, which is that if I believe a process optimizes along a certain state ordering, but I have no beliefs about how it works on the inside, then I can still at least predict that the state will go up the ordering. I can predict that the car will arrive at the airport even though I don’t know the turns. But this has nothing to do with the (optimization) process having beliefs or doing reasoning of any kind (which I think of as agent properties). For example I believe that there exists an optimization process such that mountains get worn down, and so I will predict it to happen, even though I know very little about the chemistry of erosion or rocks. And this is kinda like “endorsement”, but it’s not that the mountain has probability assignments or anything.
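A toy sketch of that kind of prediction, using a made-up erosion rule (the function `erode`, the starting heights, and the step count are all hypothetical, not a model of real geology). The individual steps depend on the random seed, but the prediction "total height goes down" holds without looking at any of them, and nothing in the process has beliefs or probability assignments.

```python
import random

def erode(heights, steps, rng):
    """Hypothetical erosion: each step removes a random amount from one random cell."""
    heights = list(heights)
    for _ in range(steps):
        i = rng.randrange(len(heights))
        heights[i] = max(0.0, heights[i] - rng.random())
    return heights

start = [10.0, 7.0, 12.0, 5.0]

# Different seeds give different intermediate steps; the outcome-level
# prediction (the total height does not go up) is the same for all of them.
for seed in range(5):
    end = erode(start, steps=1000, rng=random.Random(seed))
    print(seed, sum(end) <= sum(start))
```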
In fact, I think that kind of prediction is just a version of what makes something a good abstraction: an abstraction is a compact model that allows you to make accurate predictions about outcomes without having to predict all intermediate steps. And all abstractions also have the property that if you have enough compute/etc. then you can just directly calculate the outcome based on lower-level physics, and don’t need the abstraction to predict the outcome accurately.
I think that was a longer-winded way to say that I don’t think your concepts in this post are replacements for the Yudkowsky/Flint optimization ideas; instead it sounds like you’re saying “Assume the optimization process is of the kind that has beliefs and takes actions. Then we can define ‘endorsement’ as follows: …”