First, I’m really happy to see this post. So far I’ve seen very little about HCH outside of Paul’s writings (and work by Ought). I think it may be among the most well-regarded AI-safety proposals, yet this is basically the first serious critique of it I’ve seen.
I’ve been working with Ought for a few months now, and still feel like I understand the theory quite a bit less than Paul or Andreas. That said, here are some thoughts:
1. Your criticism that HCH doesn’t at all guarantee alignment seems quite fair to me. The fact that the system is put together by humans definitely does not guarantee alignment. It may help, but by how much seems quite uncertain to me.
2. I think the details and discussion around HCH should become much clearer as actual engineering implementations make progress. Right now it’s all quite theoretical, and I’d expect some empirical results fairly soon (the next few months, maybe a year or two).
3. I get the sense that you treat HCH a bit like a black box in the above comments. One thing I really like about HCH is that it divides work into many human-understandable tasks. If a system actually started getting powerful, this could allow us to understand it in a pretty sophisticated way. We could basically inspect it and see how problems happen. (This gets messy once things get distilled, but even then it could be relatively doable.)
I would hope that if HCH does gain traction, there would be a large study of exactly what large task networks look like. The “moral problems” could be isolated to very specific questions, which could then receive particularly thorough testing and analysis. (I sketch a toy version of such a task tree after point 4 below.)
4. “Indeed, if it succeeds, then we could use that to program a FAI (given a few solutions to other problems), and that would, in a sense, count Paul’s approach succeeding.
But that is not how the approach is generally presented. ”
I think that’s one of the main ideas that Andreas has for it with Ought. I definitely would expect that we could use the system to help understand and implement safety. This is likely a crucial element.
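To make point 3 concrete, here is a minimal sketch of what I mean by an inspectable task network. This is not Ought’s code; the names (`TaskNode`, `hch`, `inspect`), the stand-in human policy, and the fixed depth budget are all assumptions for illustration. The only point is that every question, decomposition, and answer lives in an explicit, human-readable tree, so a bad output can be traced back to a specific node.

```python
# Toy sketch of an HCH-style task tree (illustrative assumptions, not Ought's implementation).

from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """What the 'human' does with a question: decompose it or answer it."""
    subquestions: List[str] = field(default_factory=list)
    answer: Optional[str] = None


@dataclass
class TaskNode:
    """One human-understandable task and its place in the tree."""
    question: str
    answer: Optional[str] = None
    children: List["TaskNode"] = field(default_factory=list)


# Stand-in for the human policy: given a question and the answers to its
# subquestions (empty on the first call), return either subquestions or an answer.
HumanPolicy = Callable[[str, List[str]], Step]


def hch(question: str, policy: HumanPolicy, depth: int = 3) -> TaskNode:
    """Recursively decompose a question, recording the whole task tree."""
    node = TaskNode(question)
    first = policy(question, [])
    if first.answer is not None or depth == 0 or not first.subquestions:
        node.answer = first.answer if first.answer is not None else "[budget exhausted]"
        return node
    child_answers = []
    for sub in first.subquestions:
        child = hch(sub, policy, depth - 1)
        node.children.append(child)
        child_answers.append(child.answer or "")
    # Ask the same policy to combine the subanswers into a final answer.
    node.answer = policy(question, child_answers).answer
    return node


def inspect(node: TaskNode, indent: int = 0) -> None:
    """Print the tree so a problematic judgment can be traced to one node."""
    print(" " * indent + f"Q: {node.question!r} -> A: {node.answer!r}")
    for child in node.children:
        inspect(child, indent + 2)
```

In something like this, the “moral problems” from point 3 would show up as particular nodes, which could then be tested and analyzed in isolation.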
It’s a bit awkward to me that intelligence amplification seems to have two very different benefits:
1. It increases human reasoning abilities, hopefully in a direction useful for AI safety.
2. It could actually exist as an implementation of a safe AI.
This of course hints that there’s another class of interesting projects to be worked on: ones that deliver benefit 1 well without attempting benefit 2. I think this is an area that could probably use quite a bit more thought.
[Edited after some feedback]