If that counts as a decomposition, can’t we decompose ‘Output the correct y-coordinate’ the same way? ‘The asteroid is destroyed by a laser’ is evidence that the y-coordinate was correct iff X happened.
Right, I think I understand this; thanks for prodding on this issue.
You are right; “the asteroid is destroyed by a laser” is evidence that the y-coordinate was correct iff X happened. So that observation cannot be used as a natural category.
“The asteroid hits the Earth”, however, can. This is because of “the ratio P({Z_j = a_j} | X) / P({Z_j = a_j} | ¬X) is contained between 100 and 1/100” and “let’s say aiming the laser correctly will reduce the odds of impact by a factor of 10”: a swing of 10 sits well inside the allowed 100-to-1/100 band.
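A minimal numeric sketch of that filter, with made-up probabilities (the helper `ratio_ok` and all numbers are my own illustration, not part of the original setup): a factor-of-10 change in the odds of impact stays inside the band, while an observation almost perfectly tied to X does not.

```python
# Sketch with made-up probabilities: checking whether a candidate category Z
# passes the natural-category test  1/100 <= P(Z|X) / P(Z|not-X) <= 100.

def ratio_ok(p_z_given_x, p_z_given_not_x, bound=100):
    """True if the probability ratio lies within [1/bound, bound]."""
    ratio = p_z_given_x / p_z_given_not_x
    return 1 / bound <= ratio <= bound

# "The asteroid hits the Earth": correct aiming reduces the odds of impact
# by roughly a factor of 10 (illustrative numbers).
print(ratio_ok(0.05, 0.5))     # True  -> survives as a natural category

# "The asteroid is destroyed by a laser": almost perfectly tied to X,
# so the ratio lands far outside the band (illustrative numbers).
print(ratio_ok(0.9, 0.0001))   # False -> cannot be used
```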
This is relevant because the AI is motivated to come up with a set of natural categories that allows it to successfully define reduced impact and also extend it. Therefore “The asteroid hits the Earth” is something it will choose to use, rather than “the laser hits the asteroid”; so that category is likely not filtered out.
Where does that leave your Z, “Created a subsubagent with source code S”? Creating S will be banned only if the probability “swing” between nuclear war and non-nuclear war that it causes is more than a factor of 100.
So the real effect of all this work is to control the probability swing of higher-impact events. A probability increase by a factor of 100 is not safe for things like nuclear war! It seems we might be able to use a more complicated setup with a series of AIs, each increasing the probability of some event by only a small amount, but I’m not sure how to avoid them increasing the probability of Z as well.
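To make the worry about a series of AIs concrete (purely my own back-of-the-envelope illustration, assuming the per-agent odds shifts compound multiplicatively and independently):

```python
# Purely illustrative: if each of n agents is allowed to shift the odds of the
# same event by up to `factor`, and the shifts compound multiplicatively,
# the combined swing is factor ** n -- quickly exceeding any per-agent bound.

def combined_swing(factor_per_agent, n_agents):
    return factor_per_agent ** n_agents

print(combined_swing(2, 7))    # 128 -> seven "small" factor-2 shifts already exceed 100
print(combined_swing(100, 2))  # 10000 -> two agents at the current bound
```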
EDIT: this is how I would handle it. The “allowed swing” of impact is reduced, to 2, say. However, the outcome (deflection of the asteroid) is specifically excluded from this constraint. Thus we assume that, in a week’s time, someone will either “A: announce the asteroid missed us”, or not do so. In our equations, we allow A to be an event excluded from the reduced impact: e.g. P({Z_j = a_j} | X) / P({Z_j = a_j} | ¬X) is not constrained between 2 and 1/2; instead, P({Z_j = a_j} | X, A) / P({Z_j = a_j} | ¬X, A) is.
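A minimal sketch of this edited test (the helper name and all numbers are mine, not from the post): each category’s ratio must stay within [1/2, 2], but it is now evaluated conditional on A, so the deflection itself escapes the constraint.

```python
# Sketch of the tightened, A-conditioned test (hypothetical helper, made-up numbers):
# require 1/2 <= P(Z|X, A) / P(Z|not-X, A) <= 2, i.e. the reduced-impact bound is
# checked only after conditioning on A ("the asteroid missed us" is announced).

def passes_reduced_impact(p_z_given_x_a, p_z_given_notx_a, bound=2):
    """True if the A-conditioned probability ratio lies within [1/bound, bound]."""
    ratio = p_z_given_x_a / p_z_given_notx_a
    return 1 / bound <= ratio <= bound

# Given the announcement A, the AI's action barely moves this side-effect category:
print(passes_reduced_impact(0.30, 0.25))  # True  (ratio 1.2)

# Given A, the action still shifts this category sixfold -- too much side impact:
print(passes_reduced_impact(0.30, 0.05))  # False (ratio 6.0)
```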
Possibly… This might be the fatal flaw...