My claim there was that in a world where alignment is about translation, you could just do testing, reversibility, etc. I do find this a somewhat persuasive argument that that claim was wrong.
Nonetheless, I don’t think the power law dynamic really matches my model of the situation. I was more imagining a model with some sort of threshold effect:
1. Economic value is often tied to high levels of reliability, perhaps because:
1a. Some failures pose unacceptable risks (e.g. self-driving cars)
1b. Small failure rates still lead to many failures at scale (e.g. imagine if cars broke down once every 10K miles—this is a low failure rate, but many people would have to deal with this multiple times a year)
1c. Other people would like to build on top of your product, and can’t deal with an abstraction that perpetually leaks, because that vastly increases the complexity of using the product. (True of nearly all software, even end user software—if Google Docs had a 0.1% chance of failing to save my work, I would not use it.)
2. All of these lead to ~threshold effects: once the failure rate drops below some threshold t, the product becomes economically valuable and people start producing it; this leads to more investment that reduces the failure rate further, making it even more valuable. Notably, these are not power laws. (In practice, they aren’t sharp thresholds: maybe at a failure rate of 0.1% you get 1% of the potential market, at 0.01% you get 50%, and at 0.001% you get 99%. See the rough numerical sketch after this list.)
3. So when I agree with “the value is in the long tail”, I mostly mean “the threshold t is very very low; the amount of effort it takes to get there is typically higher than people expect”. But the threshold t still varies across domains, and it’s still possible for testing-style approaches to reach the threshold; it depends on the particular domain at hand.
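To make the contrast concrete, here is a minimal numerical sketch of the soft-threshold picture compared against a power law. All function names and parameters are made up for illustration and only roughly match the 1% / 50% / 99% figures above; nothing here is fit to real data.

```python
# Toy model: fraction of the potential market captured as a function of
# failure rate. "Soft threshold" = logistic in log10(failure rate), centered
# at an (assumed) threshold t; "power law" = gains a constant factor per
# decade of improvement. All numbers are purely illustrative.

import math

def market_share_soft_threshold(failure_rate: float,
                                threshold: float = 1e-4,
                                steepness: float = 4.0) -> float:
    """Logistic curve in log-failure-rate space: lower failure rates give a
    higher share, with most of the transition happening near the threshold."""
    x = math.log10(failure_rate) - math.log10(threshold)
    return 1.0 / (1.0 + math.exp(steepness * x))

def market_share_power_law(failure_rate: float, k: float = 0.2) -> float:
    """Contrasting power-law model: share ~ failure_rate ** -k, normalized
    (arbitrarily) to 50% at a 0.01% failure rate, capped at 100%."""
    return min(1.0, 0.5 * (failure_rate / 1e-4) ** (-k))

if __name__ == "__main__":
    for rate in (1e-3, 1e-4, 1e-5):  # 0.1%, 0.01%, 0.001%
        print(f"failure rate {rate:.0e}: "
              f"soft threshold -> {market_share_soft_threshold(rate):.0%}, "
              f"power law -> {market_share_power_law(rate):.0%}")
```

The qualitative point of the sketch: in the threshold model almost all of the value arrives within a decade or so of t (roughly 2% / 50% / 98% across the three failure rates), whereas a power law spreads the gains much more evenly across each decade of improvement (roughly 32% / 50% / 79% with these made-up parameters).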
I think this argument applies both to self-driving cars and to traditional software (a la big tech companies), which is why I still used big tech companies as an example where value is in the tail.
Agree, and you’ve articulated this much better than I had in my head. Thank you.