Are we sure that, given the choice between “lower crime, lower costs and algorithmic bias” and “higher crime, higher costs and only human bias”, and assuming we have dictatorial power and can consider long-term effects, we would choose the latter on reflection?
Thanks for your comment! Good point, I hadn’t thought that it would sometimes actually make sense, on reflection, to choose an algorithm pursuing an easy-to-measure goal over humans pursuing incorrect goals. One thing I’d add: if one did delve into the research to work this out for a particular case, an important (but hard-to-quantify) consideration would be the extent to which choosing the algorithm in this case makes it more likely that its use becomes entrenched, or sets a precedent for the use of such algorithms. This feels important since these effects could plausibly make WFLL1-like failures more likely in the longer run (when the harm of using misaligned systems is higher, due to the higher capabilities of those systems).
Note that ML systems are way more interpretable than humans, so if they are replacing humans, this shouldn’t make that much of a difference.
Good catch. I had the “AI systems replace entire institutions” scenario in mind, but agree that WFLL1 actually feels closer to “AI systems replace humans”. I’m pretty confused about what this would look like, though, and in particular about whether institutions would retain their interpretability if this happened. It seems plausible that the best way to “carve up” an institution into individual agents/services differs for humans and AI systems. E.g., education/learning is a big part of human institution design (you start at the bottom and work your way up as you learn skills and become trusted to act more autonomously), but this probably wouldn’t be the case for institutions composed of AI systems, since the “CEO” could just copy their model parameters to the “intern” :). And if institutions composed of AI systems are quite different from institutions composed of humans, then they might not be very interpretable. Sure, you could assert that AI systems replace humans one-for-one, but if this is not the best design, then there may be competitive pressure to move away from this towards something less interpretable.
Yup, all of that sounds right to me! One caveat: on my models of AI development, I don’t expect the CEO could just copy model parameters to the intern. I think it’s more likely that we have something along the lines of “graduate of <specific college major>” AI systems that you then copy and use as needed. But I don’t think this really affects your point.
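(As a purely illustrative sketch of what “copying” a trained system amounts to, here is a toy PyTorch example; the architecture and the “graduate”/“new hire” framing are hypothetical, just meant to contrast a parameter copy with the education-and-trust pipeline a human goes through.)

```python
import torch
import torch.nn as nn

def make_worker() -> nn.Module:
    # Hypothetical stand-in for a "graduate of <specific college major>" system.
    return nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 8))

graduate = make_worker()   # imagine this copy has already been trained
new_hire = make_worker()   # a fresh instance with randomly initialised weights

# "Onboarding" is a single parameter copy, not years of education and
# gradually earned autonomy.
new_hire.load_state_dict(graduate.state_dict())

x = torch.randn(1, 128)
assert torch.allclose(graduate(x), new_hire(x))  # the two "workers" behave identically
```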
Sure, you could assert that AI systems replace humans one-for-one, but if this is not the best design, then there may be competitive pressure to move away from this towards something less interpretable.
Yeah, just to be clear, I definitely would not assert this. If I had to make an argument for as-much-interpretability, it would be something like: “in the scenario we’re considering, AI systems are roughly human-level in capability; at this level of capability, societal organization will still require a lot of modularity; and if we know nothing else and assume these agents are as black-boxy as humans, it seems reasonable to expect a roughly similar amount of interpretability as in current society”. But this is not a particularly strong argument, especially in the face of vast uncertainty about what the future looks like.