Yeah, I agree with this. It’s certainly possible to see normal human passage through time as a process with probable attractors. I think the biggest differences are that HCH is a psychological “monoculture,” HCH has tiny bottlenecks through which to pass messages compared to the information I can pass to my future self, and there’s some presumption that the output will be “an answer” whereas I have no such demands on the brain-state I pass to tomorrow.
If we imagine actual human imitations I think all of these problems have fairly obvious solutions, but I think the problems are harder to solve if you want IDA approximations of HCH. I’m not totally sure what you meant by the confidence thresholds link—was it related to this?
The monoculture problem seems like it should increase the size (“size” meaning attraction basin, not measure of the equilibrium set), lifetime, and weirdness of attractors, while the restrictions and expectations on message-passing seem like they might shift the distribution away from “normal” human results.
But yeah, in theory we could use imitation humans to do any research we could do ourselves. I think that gets into the relative difficulty of super-speed imitations of humans doing alignment research versus transformative AI, which I’m not really an expert in.
[EDIT: After thinking about this some more, I realized that malign AI leakage is a bigger problem than I thought when writing the parent comment, because the way I imagined it can be overcome doesn’t work that well.]
I think the biggest differences are that HCH is a psychological “monoculture,” HCH has tiny bottlenecks through which to pass messages compared to the information I can pass to my future self, and there’s some presumption that the output will be “an answer” whereas I have no such demands on the brain-state I pass to tomorrow.
I don’t think that last one is a real constraint. What counts as “an answer” is entirely a matter of interpretation by the participants in the HCH. For example, initially I can ask the question “what are the most useful thoughts about AI alignment I can come up with during 1,000,000 iterations?”. When I am tasked to answer the question “what are the most useful thoughts about AI alignment I can come up with during N iterations?” then:
If N=1, I will just spend my allotted time thinking about AI alignment and write whatever I came up with in the end.
If N>1, I will ask “what are the most useful thoughts about AI alignment I can come up with during N−1 iterations?”. Then, I will study the answer and use the remaining time to improve on it to the best of my ability.
An iteration of 2 weeks might be too short to learn the previous results, but we can work in longer iterations. Certainly, having to learn the previous results from text carries overhead compared to just remembering myself developing them (and having developed some illegible intuitions in the process), but only that much overhead.
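To make the recursion concrete, here is a minimal sketch in Python-style pseudocode. The names (most_useful_thoughts, think_about_alignment) are hypothetical placeholders for one participant’s work during an iteration, not part of any actual HCH or IDA implementation:

```python
def think_about_alignment(prior_work):
    # Placeholder for one participant's actual work during an iteration:
    # study the previous answer (if any), think, and write up an improved one.
    return (prior_work or "") + " [improvements from one more iteration]"

def most_useful_thoughts(n_iterations):
    """Sketch of the recursive question described above."""
    if n_iterations == 1:
        # Base case: spend the allotted time thinking and write up the result.
        return think_about_alignment(prior_work=None)
    # Recursive case: ask the (N-1)-iteration question, study the answer,
    # then use the remaining time to improve on it.
    previous_answer = most_useful_thoughts(n_iterations - 1)
    return think_about_alignment(prior_work=previous_answer)

# The top-level question posed to the HCH would be something like:
# most_useful_thoughts(1_000_000)
```

In actual HCH each recursive call would be handled by a separate (simulated) human rather than a function call, but the shape of the delegation is the same.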
As to “monoculture”, we can do HCH with multiple people (either the AI learns to simulate the entire system of multiple people, or we use some rigid interface, e.g. posting on a forum). For example, we can imagine putting the entire AI X-safety community there. But we certainly don’t want to put the entire world in there, since that way malign AI would probably leak into the system.
I think the problems are harder to solve if you want IDA approximations of HCH. I’m not totally sure what you meant by the confidence thresholds link—was it related to this?
Yes: it shows how to achieve reliable imitation (although for now only in a theoretical model that isn’t feasible to implement), and the same idea should be applicable to an imitation system like IDA (although that calls for its own theoretical analysis). Essentially, the AI queries a real person if and only if it cannot produce a reliable prediction using previous data (because there are several plausible but mutually inconsistent hypotheses), and the frequency of queries vanishes over time.
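If it helps, here is a rough sketch of that query rule in Python; the names are hypothetical, and it only illustrates the “answer autonomously iff all plausible hypotheses agree” logic, not the actual theoretical construction:

```python
def predict_or_query(context, hypotheses, human):
    """Illustrative sketch only (hypothetical names, not the real construction).

    hypotheses: the set of predictors still consistent with past data, each a
    callable mapping a context to an answer.  human: a callable that queries
    the real person.
    """
    predictions = {h: h(context) for h in hypotheses}
    if len(set(predictions.values())) == 1:
        # All plausible hypotheses agree: a reliable prediction exists,
        # so answer autonomously without bothering the human.
        return next(iter(predictions.values())), hypotheses
    # Plausible but mutually inconsistent hypotheses: query the real person,
    # and discard the hypotheses whose prediction turned out wrong.
    answer = human(context)
    surviving = {h for h, pred in predictions.items() if pred == answer}
    return answer, surviving
```

In the idealized setting each query eliminates at least one hypothesis, which is why the frequency of queries vanishes over time; a realistic version would presumably need probabilistic agreement rather than exact equality.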