First, I want to flag that I really appreciate how you’re making these deltas clear and (fairly) simple.
I like this, though I feel like there’s probably a great deal more clarity/precision to be had here (as is often the case).
> Under my models, if I pick one of these objects at random and do a deep dive researching that object, it will usually turn out to be bad in ways which were either nonobvious or nonsalient to me, but unambiguously make my life worse and would unambiguously have been worth-to-me the cost to make better.
I’m not sure what “bad” means exactly. Do you basically mean, “if I were to spend resources R evaluating this object, I could identify some ways for it to be significantly improved”? If so, I assume we’d all agree that this is true for some amount R; the key question is what that amount is.
I’d also flag that you draw attention to the air conditioner example. But for personal items more broadly, I’d argue that when I learn more about popular items, most of what I learn is positive things I hadn’t realized. It’s like Chesterton’s fence: when I buy well-reviewed or popular items, my impression is generally that there were many clever ideas or truths behind them that I don’t remotely have time to understand, let alone invent myself. A related example is cultural knowledge, a la The Secret of Our Success.
When I work on software problems, my first few attempts typically don’t go well, for reasons I didn’t predict. So the very fact that “it works in tests, and it didn’t require doing anything crazy” is a significant update.
Sure, with enough resources R, one could very likely make significant improvements to any item in question. But as a purchaser, I only have resources r << R to make my decisions. My goal is to buy items that make my life better; it’s fine that there are other potential gains to be had at huge values of R.
> “verification is easier than generation”

I feel like this isn’t very well formalized. I think I agree with this comment on that post. I feel like you’re saying, “It’s easier to generate a simple thing than to verify all possible things”, but Paul and co are saying something more like, “It’s easier to verify/evaluate a thing of complexity C than to generate a thing of complexity C, in many important conditions”, or, “There are ways of delegating many tasks where the evaluation work required is less than that of doing the work yourself, in order to get a result of a certain level of quality.”
I think that Paul’s take (as I understand it) points at a fundamental aspect of how the human world works. Humans generally get huge returns from not reinventing the wheel all the time, and from deferring to others a great deal. This is much of what makes civilization possible. It’s not perfect, but it’s much better than what individual humans could do by themselves.
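To gesture at the difference, here’s one rough way the two readings could be written down. This is entirely my own placeholder notation, not the post’s or Paul’s; G and V are hypothetical cost functions for generating vs. verifying/evaluating an artifact.

```latex
% A sketch only: G(x) = cost to generate artifact x;  V(x) = cost to verify/evaluate x.

% Reading 1 ("easier to generate a simple thing than to verify all possible things"):
\[
  \exists\, x_{\mathrm{simple}} \in X:\quad
  G(x_{\mathrm{simple}}) \;\ll\; \sup_{x \in X} V(x)
\]

% Reading 2 ("easier to verify a thing of a given quality than to generate one, in many conditions"):
\[
  \mathbb{E}\big[\, V(x) \mid x \in T,\ \mathrm{quality}(x) \ge q \,\big]
  \;<\;
  \mathbb{E}\big[\, G(x) \mid x \in T,\ \mathrm{quality}(x) \ge q \,\big]
  \qquad \text{for many task classes } T \text{ and quality levels } q.
\]
```

The first inequality can hold while the second also holds; they answer different questions, which I think is part of why this debate feels slippery.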
> Under my current models, I expect that, shortly after AIs are able to autonomously develop, analyze and code numerical algorithms better than humans, there’s going to be some pretty big (like, multiple OOMs) progress in AI algorithmic efficiency (even ignoring a likely shift in ML/AI paradigm once AIs start doing the AI research)
I appreciate the precise prediction, but I don’t see how it exactly follows. This seems more like a question of “how much better will early AIs be compared to current humans” than one deeply about verification/generation. Also, I’d flag that in many worlds, I’d expect that pre-AGI AIs could do a lot of this code improvement, or already have, so it’s not clear how much work the “autonomously” is doing here.
---
I feel like there are probably several wins to be had by formalizing these concepts better. They seem fairly cruxy/high-delta in the debates on this topic.
I would naively approach some of this with a simple expected value/accuracy lens. There are many assistants (including AIs) that I’d expect would improve the expected accuracy of key decisions, like knowing which AI systems to trust. In theory, it’s possible to show a bunch of situations where delegation would be EV-positive.
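As a toy illustration of the shape of that calculation (all of the numbers and the helper function here are made up by me for illustration, not drawn from the post):

```python
# Toy sketch: is it EV-positive to lean on an assistant for a decision, vs. deciding alone?
# All numbers are illustrative placeholders.

def expected_value(p_correct: float, value_if_correct: float,
                   value_if_wrong: float, overhead: float = 0.0) -> float:
    """Expected value of a decision made with a given accuracy, minus any overhead."""
    return p_correct * value_if_correct + (1 - p_correct) * value_if_wrong - overhead

# Deciding alone: say a 70% chance of picking the right option.
alone = expected_value(p_correct=0.70, value_if_correct=100, value_if_wrong=-50)

# Delegating: the assistant plus my (cheaper) evaluation of its suggestion gets me to
# 90% accuracy, at some evaluation/oversight overhead.
delegated = expected_value(p_correct=0.90, value_if_correct=100, value_if_wrong=-50,
                           overhead=10)

print(f"alone: {alone:.1f}, delegated: {delegated:.1f}")
# alone: 55.0, delegated: 75.0 -> delegation is EV-positive under these made-up numbers.
```

The interesting work is obviously in arguing what the accuracies and overheads actually are for AI assistants, but the basic structure of the comparison is that simple.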
That said, a separate observer could of course claim that someone using the process above would be so wrong as to be harming themselves. Like, “I think that when you try to use delegation, your estimates of impact are predictably wrong in ways that would lead to you losing.” But this seems mainly like a question of “are humans going to be predictably overconfident in a certain domain, as judged by other specific humans?”
Thinking about this more, it seems like there are some key background assumptions that I’m missing.
Some assumptions that I often hear presented on this topic are things like:
1. “A misaligned AI will explicitly try to give us hard-to-find vulnerabilities, so verifying arbitrary statements from these AIs will be incredibly hard.”
2. “We need to generally have incredibly high assurances to build powerful systems that don’t kill us.”
My obvious counter-arguments would be:
1. Sure, but smart overseers would have a reasonable prior that such agents might be misaligned, and they would also give these agents tasks that are particularly easy to verify. Any action actually taken by a smart overseer, using information provided by an agent with a known chance M of being misaligned, should be EV-positive (see the rough sketch after this list). With some creativity, there are likely many ways of structuring things (using systems that are unlikely to be misaligned, using more verifiable questions) such that many of the resulting actions will be heavily EV-positive.
2. Again, my argument in (1). Second, we can build these systems gradually, and with a lot of help from people/AIs that won’t require such high assurances. (This is similar to the HCH / oversight arguments.)
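To spell out the kind of condition I have in mind in (1), here is a rough sketch in my own placeholder notation (M, the slip probability, and the gain/loss terms are all hypothetical, not from the post) of when the overseer should act on the agent’s output:

```latex
% M        = overseer's credence that the advising agent is misaligned (known to the overseer)
% p_slip   = probability that bad advice survives the overseer's checks
% c_verify = cost of running those checks
\[
  (1 - M)\,\mathbb{E}[\text{gain} \mid \text{advice is aligned}]
  \;-\;
  M\, p_{\mathrm{slip}}\, \mathbb{E}[\text{loss} \mid \text{bad advice is acted on}]
  \;-\; c_{\mathrm{verify}}
  \;>\; 0
\]
```

Since the overseer chooses which tasks to delegate and which checks to run, it can restrict itself to the cases where an inequality like this plausibly holds.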
---

> I’m not sure what “bad” means exactly. Do you basically mean, “if I were to spend resources R evaluating this object, I could identify some ways for it to be significantly improved”? If so, I assume we’d all agree that this is true for some amount R; the key question is what that amount is.

I think an interesting version of this is “if I were to spend resources R evaluating this object, I could identify some ways for it to be significantly improved (even when factoring in additional cost) that the production team probably already knew about.”