Right. Some intuition is necessary. But a lot of these choices are ad hoc, by which I mean they aren’t strongly constrained by the result you want from them.
For example, you have a linear penalty governed by this parameter lambda, but in principle it could have been any old function—the only strong constraint is that you want it to monotonically increase from a finite number to infinity. Now, maybe this is fine, or maybe not. But I basically don’t have much trust for meditation in this sort of case, and would rather see explicit constraints that rule out more of the available space.
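To put it schematically (my notation here, not the exact equation from the post): the current choice is a linear tradeoff,

$$R_{\text{AUP}}(s,a) = R(s,a) - \lambda \cdot \text{Penalty}(s,a),$$

but as far as the stated desiderata go, it could just as well have been

$$R_{\text{AUP}}(s,a) = R(s,a) - f\big(\text{Penalty}(s,a)\big)$$

for any $f$ that increases monotonically from a finite value to infinity. That is a large family of functions, and nothing in the argument so far picks out the linear member of it.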
"I basically don’t have much trust for meditation in this sort of case"
I’m not asking you to trust in anything, which is why I emphasized that I want people to think more carefully about these choices. I do not think eq. 5 is AGI-safe. I do not think you should put it in an AGI. Do I think there’s a chance it might work? Yes. But we don’t work with “chances”, so it’s not ready.
Anyways, if the conditions of theorem 11 in the low-hanging fruit post are met, the tradeoff penalty works fine. I also formally explored the hard constraint case and discussed a few reasons why the tradeoff is preferable to the hard constraint. Therefore, I think that particular design choice is reasonably determined. Would you want to think about this more before actually running an AGI with that choice? Of course.
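To make the comparison concrete (again schematically, in my notation rather than the exact formalization from those posts): the tradeoff design subtracts a scaled penalty from the reward,

$$\max_\pi \; \mathbb{E}_\pi\!\left[\sum_t R(s_t,a_t) - \lambda \cdot \text{Penalty}(s_t,a_t)\right],$$

whereas the hard constraint design optimizes the reward only over behavior whose penalty stays under a threshold,

$$\max_\pi \; \mathbb{E}_\pi\!\left[\sum_t R(s_t,a_t)\right] \quad \text{subject to} \quad \text{Penalty}(s_t,a_t) \le \epsilon.$$

Here $\lambda$ and $\epsilon$ play analogous strictness-setting roles; the difference is whether high-penalty actions are merely discouraged or ruled out entirely.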
To your broader point, I think there may be another implicit frame difference here. I’m talking about the diff of the progress, considering questions like “are we making a lot of progress? What’s the marginal benefit of more research like this? Are we getting good philosophical returns from this line of work?”, to which I think the answer is yes.
On the other hand, you might be asking “are we there yet?”, and I think the answer to that is no. Notice how these answers don’t contradict each other.
From the first frame, being skeptical because each part of the equation isn’t fully determined seems like an unreasonable demand for rigor. I wrote this sequence because it seemed that my original AUP post was pedagogically bad (I was already thinking about concepts like “overfitting the AU landscape” back in August 2018) and so very few people understood what I was arguing.
I’d like to think that my interpretive labor has paid off: AUP isn’t a slapdash mixture of constraints which is too complicated to be obviously broken; it’s an attempt to directly disincentivize catastrophes based on straightforward philosophical reasoning, relying on assumptions and conjectures which I’ve clearly stated. In many cases, I waited weeks so I could formalize my reasoning in the context of MDPs (e.g. why should you think of the AU landscape as a ‘dual’ to the world state? Because I proved it).
There’s always another spot where I could make my claims more rigorous, where I could gather just a bit more evidence. But at some point I have to actually put the posts up, and I think I’ve provided some pretty good evidence in this sequence.
From the second frame, being skeptical because each part of the equation isn’t fully determined is entirely appropriate and something I encourage.
I think you’re writing from something closer to the second frame, but I don’t know for sure. For my part, this sequence has been arguing from the first frame: “towards a new impact measure”, and that’s why I’ve been providing pushback.