Ok, so at what point does Clippy stop simulating the debug version of Clippy? It does, after all, want to make the computation of its values more efficient. For example, consider a trivial scenario where one of its values basically said, “reject any action if it satisfies both A and not-A”. This is a logically inconsistent value that some programmer accidentally left in Clippy’s original source code. Would Clippy ever get around to removing it? After all, Clippy knows that it’s applying that test to every action, so removing it should result in a decent performance boost.
Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
Why do you see the proposed experiment this way?
Speaking more generally, how do you decide which avenues of research are worth pursuing? You could easily answer, “whichever avenues would increase my efficiency of achieving my terminal goals”, but how do you know which avenues would actually do that? For example, if you didn’t know anything about electricity or magnetism or the nature of light, how would your research-choosing algorithm ensure that you’d eventually stumble upon radio waves, which, as we know in hindsight, are hugely useful?
When we didn’t know what things like radio waves or x-rays were, we didn’t know that they would be useful, but we could see that there appeared to be some sort of existing phenomena that we didn’t know how to model, so we examined them until we knew how to model them. It’s not like we performed a whole bunch of experiments in case there turned out to be invisible rays our observations had never hinted at, which could be turned to useful ends. The original observations of radio waves and x-rays came from our experiments with other known phenomena.
What you’re suggesting sounds more like experimenting completely blindly; you’re committing resources to research, not just not knowing that it will bear valuable fruit, but not having any indication that it’s going to shed light on any existing phenomenon at all. That’s why I think it’s less like investigating invisible rays than like building a protein collider; we didn’t try studying invisible rays until we had a good indication that there was an invisible something to be studied.
Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
Ok, so Clippy would need to run sim-Clippy for a little while at least, just to make sure that it still produces paperclips, and that it in fact does so more efficiently now that the one useless test is removed. Yes, this test used to be one of Clippy’s terminal goals, but it wasn’t doing anything, so Clippy took it out.
Would it be possible for Clippy to optimize its goals even further? To use another silly example (“silly” because Clippy would be dealing with probabilities, not syllogisms), if Clippy had the goals A, B and C, but B always entailed C, would it go ahead and remove C?
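To make the two toy cases concrete, here is a minimal Python sketch. Everything in it (the `Action` fields, the four test functions, the brute-force enumeration over a tiny world) is an assumption made up for illustration, not part of the scenario. It treats each goal as a test an action must pass, and calls a test redundant if dropping it never changes which actions get accepted.

```python
from itertools import product
from typing import Callable, Iterable, List, NamedTuple, Set


class Action(NamedTuple):
    a: bool
    b: bool
    c: bool


Test = Callable[[Action], bool]  # True means the action passes the test


def accepted(tests: Iterable[Test], actions: Iterable[Action]) -> Set[Action]:
    """Actions that pass every test."""
    tests = list(tests)
    return {x for x in actions if all(t(x) for t in tests)}


def is_redundant(test: Test, tests: List[Test], actions: List[Action]) -> bool:
    """A test is redundant if removing it leaves the accepted set unchanged."""
    others = [t for t in tests if t is not test]
    return accepted(tests, actions) == accepted(others, actions)


# Hypothetical toy world in which B always entails C:
# combinations with b=True and c=False simply do not occur.
ACTIONS = [
    Action(a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if (not b) or c
]


def no_contradiction(x: Action) -> bool:
    # "Reject any action that satisfies both A and not-A": never rejects anything.
    return not (x.a and not x.a)


def goal_a(x: Action) -> bool:
    return x.a


def goal_b(x: Action) -> bool:
    return x.b


def goal_c(x: Action) -> bool:
    return x.c


TESTS = [no_contradiction, goal_a, goal_b, goal_c]
```

Under these assumptions, `is_redundant(no_contradiction, TESTS, ACTIONS)` and `is_redundant(goal_c, TESTS, ACTIONS)` both hold, while `goal_a` and `goal_b` are not redundant. This only covers the syllogistic version of the example; a Clippy reasoning with probabilities would need a statistical notion of "leaves behavior unchanged" instead of exact set equality.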
It’s not like we performed a whole bunch of experiments in case there turned out to be invisible rays our observations had never hinted at...
Understood, that makes sense. However, I believe that in my scenario, Clippy’s own behavior and its current paperclip production efficiency are what it observes, and the goal of its experiments would be to explain why its efficiency is what it is, in order to ultimately improve it.
Ok, so Clippy would need to run sim-Clippy for a little while at least, just to make sure that it still produces paperclips, and that it in fact does so more efficiently now that the one useless test is removed. Yes, this test used to be one of Clippy’s terminal goals, but it wasn’t doing anything, so Clippy took it out.
Would it be possible for Clippy to optimize its goals even further? To use another silly example (“silly” because Clippy would be dealing with probabilities, not syllogisms), if Clippy had the goals A, B and C, but B always entailed C, would it go ahead and remove C?
That seems plausible.
Understood, that makes sense. However, I believe that in my scenario, Clippy’s own behavior and its current paperclip production efficiency are what it observes, and the goal of its experiments would be to explain why its efficiency is what it is, in order to ultimately improve it.
I don’t think tampering with its fundamental motivation to make paperclips is a particularly promising strategy for optimizing its paperclip production.
Ok, so now we’ve got a Clippy who (a) is not too averse to tinkering with its own goals, as long as the goals remain functionally the same, (b) simulates a relatively long-running version of itself, and (c) is capable of examining the inner workings of both that version and itself.
You say,
I don’t think tampering with its fundamental motivation to make paperclips is a particularly promising strategy for optimizing its paperclip production.
But remember, at this stage Clippy is not changing its own fundamental motivation (beyond some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
Do you think that Clippy would ever simulate versions of itself whose fundamental motivations were, in fact, changed? I could see several scenarios where this might be the case, for example:
Clippy wanted to optimize some goal, but ended up accidentally changing it. Oops!
Clippy created a version with drastically reduced goals on purpose, in order to measure how much performance is affected by certain goals, thus targeting them for possible future optimization. Of course, Clippy would only want to optimize the goals, not remove them.
But remember, at this stage Clippy is not changing its own fundamental motivation (beyond some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
Why does it do that? I said it sounded plausible that it would cut out its redundant goal, because that would save computing resources. But this sounds like we’ve gone back to experimenting blindly. Why would it think observing sim-Clippies is a good use of its computing resources in order to maximize paperclips?
I’d say that Clippy simulating versions of itself whose fundamental motivations are different is much less plausible, because it’s using a lot of computing resources for something that isn’t a likely route to optimizing its paperclip production. I think this falls into the “protein collider” category. Even if it did do so, I think it would be unlikely to go from there to changing its own terminal value.
Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
It would also be critical for Clippy to observe that removing that value would not result in more expected actions taken that satisfy both A and not-A; this being one of Clippy’s values at the time of modification.
Right, I misread that before. If its programming says to reject actions that satisfy both A and not-A, but this isn’t one of the standards by which it judges value, it would presumably remove that test. If that is one of the standards by which it measures value, then it would depend on how that value weighed against its value of paperclips and the extent to which they were in conflict.
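One way to picture that weighing is the rough Python sketch below. It is only an illustration under loud assumptions: the `Outcome` fields, the linear `violation_weight`, and all the numbers are hypothetical, and nothing in the thread says Clippy's values combine linearly. The point it shows is that a proposed self-modification gets scored by the values Clippy holds at the time of modification, not by the values it would hold afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    paperclips: float   # expected paperclips produced
    violations: float   # expected actions taken that satisfy "A and not-A"


def current_utility(outcome: Outcome, violation_weight: float) -> float:
    """Score an outcome by Clippy's *current* values (assumed linear here)."""
    return outcome.paperclips - violation_weight * outcome.violations


def should_self_modify(before: Outcome, after: Outcome, violation_weight: float) -> bool:
    """Accept a modification only if it scores higher under current values."""
    return current_utility(after, violation_weight) > current_utility(before, violation_weight)


# Hypothetical numbers, chosen only to show the three cases.
baseline = Outcome(paperclips=100.0, violations=0.0)
faster_and_clean = Outcome(paperclips=105.0, violations=0.0)
faster_but_violating = Outcome(paperclips=105.0, violations=2.0)
```

With `violation_weight=0.0` (the test is mere programming, not a value), both modifications are accepted. With `violation_weight=10.0` (the test is a genuine value), `faster_and_clean` is still accepted but `faster_but_violating` is not, since 105 - 20 is less than 100.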