This proposal is a minor variation on the HCH-type ideas. The main differences seem to be:
Only a single training example is needed, through the use of hypotheticals.
The human can propose something else on the next step.
The human can’t query copies of their simulation, at least not at first. There are probably all sorts of ways the human can bootstrap, since they are writing arbitrary maths expressions.
This leads to a selection of problems largely similar to the HCH problems:
We need good inner alignment. (And with this, we also need to understand hypotheticals).
We need high fidelity; we don’t want a game of Chinese whispers.
The process might never converge; it could get stuck in an endless loop.
Especially if passing the message on to the next cindy is one button press, what happens when one of them stumbles on a really memetic idea? Would the question end up filled with the “repost or a ghost will haunt you” nonsense that plagues some websites? You are applying strong memetic selection pressure, and might get really funny jokes instead of AI ideas.
Having the whole alignment community spend 6 months as part of the question-answerer is more likely to work than one person for a few hours, but that amplifies other problems.
This method also has the problem of amplified failure probability. Suppose somewhere down the line, millions of iterations in, cindy goes outside for a walk, and gets hit by a truck. Virtual cindy doesn’t return to continue the next layer of the recursion. What then? (Possibly some code just adds “attempt 2” at the top and tries again.)
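A minimal sketch of what that “attempt 2 and try again” idea could look like, assuming a hypothetical ask callable that runs one simulated answerer and may return nothing (none of these names are part of the actual proposal):

```python
from typing import Callable, Optional

def ask_with_retries(
    ask: Callable[[str], Optional[str]],  # hypothetical: run one simulated answerer
    question: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Re-pose the counterfactual question if a run returns no answer."""
    for attempt in range(1, max_attempts + 1):
        # Prepend an attempt marker so each run is a distinct counterfactual.
        answer = ask(f"attempt {attempt}\n{question}")
        if answer:
            return answer
    return None  # every attempt failed, e.g. the answerer never came back
```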
Ok, so another million layers in, cindy drops a coffee cup on the keyboard, accidentally typing some rubbish. This gets interpreted by the AI as a mathematical command, and the AI goes on to maximize ???
Chaos theory. Someone else develops a paperclip maximizer many iterations in, and the paperclip maximizer realizes it’s in a simulation, hacks into the answer channel and returns “make as many paperclips as possible” to the AI.
And then there is the standard mindcrime concern. Where are all these virtual cindies going once we are done with them? We can probably just tell the AI in English that our utility function is likely to dislike deleting virtual humans. So all the virtual humans get saved on disk, and then can live in the utopia. Hey, we need loads of people to fill up the Dyson sphere anyway.
I am not confident that your “make it complicated and personal data” approach at the root really stops all the aliens doing weird acausal stuff. The multiverse is big. Somewhere out there, some cindy is producing any bitstream that looks like this personal data, and somewhere out there, aliens are faking the whole scenario for every possible stream of similar data. You probably need the internal counterfactual design to be resistant to acausal tampering.
A difference from HCH (not the only one): As far as we can tell, an AI can’t Goodhart the past (which is why the interval is in effect). I think something similar applies to the adding-random-noise part.
Only a single training example is needed, through the use of hypotheticals.
(to be clear, the question and answer serve less as “training data” meant to represent the user, and more as “IDs” or “coordinates” meant to locate the user in the past lightcone.)
We need good inner alignment. (And with this, we also need to understand hypotheticals).
this is true, though i think we might not need a super complex framework for hypotheticals. i have some simple math ideas that i explore a bit here, and about which i might write a bunch more.
for failure modes like the user getting hit by a truck or spilling coffee, we can do things such as, at each step, asking not 1 cindy the question but asking 1000 cindys 1000 slight variations on the question, and then maybe having some kind of convolutional network curate their answers (for instance, ignoring garbled or missing output) and pass them on to the next step, without ever relying on a small number of cindys except at the very start of the process.
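(as a very rough sketch of that curation step, with made-up function names and a plain majority vote standing in for the learned curation network mentioned above; nothing here is part of the actual QACI math:)

```python
from collections import Counter
from typing import Callable, Optional

def curate_answers(
    ask_copy: Callable[[str], Optional[str]],  # hypothetical: run one simulated cindy
    question: str,
    n_copies: int = 1000,
) -> Optional[str]:
    """ask n_copies slight variations of the question, drop bad outputs, keep the most common answer."""
    answers = []
    for i in range(n_copies):
        variant = f"{question}\n(variation {i})"  # slight variation per copy
        answer = ask_copy(variant)
        if answer is not None and answer.strip():  # ignore missing or empty output
            answers.append(answer.strip())
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]  # pass the winner to the next step
```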
it is true that weird memes could take over the graph of cindys; i don’t have an answer to that, other than that it seems sufficiently unlikely to me that i still think this plan has promise.
Chaos theory. Someone else develops a paperclip maximizer many iterations in, and the paperclip maximizer realizes it’s in a simulation, hacks into the answer channel and returns “make as many paperclips as possible” to the AI.
hmm. that’s possible. i guess i have to hope this never happens during the question-interval, on any simulation day. alternatively, maybe the mutually-checking graph of 1000 cindys can help with this? (but probably not; clippy can just hack the cindys).
So all the virtual humans get saved on disk, and then can live in the utopia. Hey, we need loads of people to fill up the Dyson sphere anyway.
yup. or, if the QACI user is me, i’m probably also just fine with those local deaths; not a big deal compared to an increased chance of saving the world. alternatively, instead of being saved on disk, they can also just be recomputed later, since the whole process is deterministic.
I am not confident that your “make it complicated and personal data” approach at the root really stops all the aliens doing weird acausal stuff.
yup, i’m not confident either. i think there could be other schemes, possibly involving cryptography in some way, to entangle the answer with a unique randomly generated signature key or something like that.
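(purely as an illustration of that cryptography idea, here is a minimal sketch using python’s standard library, with an HMAC tag standing in for whatever “signature key” scheme would actually be used; this only helps against parties that never see the key:)

```python
import hashlib
import hmac
import secrets

# one-time secret generated at the start of the question-answer interval
signature_key = secrets.token_bytes(32)

def tag_answer(answer: str) -> str:
    """entangle the answer with the secret key by attaching an HMAC tag."""
    tag = hmac.new(signature_key, answer.encode(), hashlib.sha256).hexdigest()
    return f"{answer}\n--tag:{tag}"

def verify_answer(tagged: str) -> bool:
    """accept the answer only if its tag was produced with our key."""
    answer, _, tag = tagged.rpartition("\n--tag:")
    expected = hmac.new(signature_key, answer.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)
```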
Strong upvote. Would like to see OP’s response to this.