Corporations often control lots of resources that give them a wide range of options and opportunities to make decisions and take actions. But the decisions about which actions to take are ultimately made by humans, and I believe the prevailing theory of corporate management is that a single CEO empowered to make decisions is better than any kind of decision-by-committee or consensus-based decision-making.
My model of how corporations work is very different than this, and I think our different models might be driving the disagreement. Specifically, my model is roughly as follows.
To the extent that big, successful companies have a “goal”, that “goal” is usually to make enough money to grow and keep the lights on. A large corporation is much more successful at coming up with and performing actions which cause the corporation to keep existing and growing than any individual human in that corporation would be.
I agree that corporations are worse at explicit consequentialist reasoning than the individual humans inside the corporation. Most corporations do have things like “quarterly roadmaps” and “business plans” and other forms of explicit plans with contingency plans for how exactly they intend to navigate to a future desirable-to-them world-state, but I think the documented plan is in almost all cases worse than the plan that lives in the CEO’s head.
I claim that this does not matter, because neither the explicit written plan nor the hidden plan in the CEO’s head are the primary driver of what the corporation actually does. Instead, I think that almost all of the optimization pressure in an organization comes from having a large number of people trying things, seeing what works and what doesn’t work, and then doing more of the kinds of things that worked and fewer of the things that didn’t work.
Let’s consider a concrete example. Imagine a gym chain called “SF Fitness.” About half of the revenue of “SF Fitness” comes from customers who are paying a monthly membership but have not attended the gym in quite a while. We observe that “SF Fitness” has made the process of cancelling a membership extremely obnoxious—to cancel, a customer must navigate a confusing phone tree.
I expect that the “navigate a confusing phone tree” policy was not put in place by the CEO saying “we could make a lot of money by making it obnoxious to cancel your membership, so I will send a feature request to make a confusing phone tree”. Instead, I expect that the origin of the confusing phone tree was something like the following:
The CEO set a KPI of “minutes of customer service time spent per month of membership”. This created an incentive for the specific employees whose bonuses or status were tied to that KPI to drive that number down.
One obvious way of reducing CS time was to add a simple menu which let callers do certain things without needing to interact with a person (e.g. get gym hours), and which directed calls to the correct person (e.g. by language or department) if they did in fact need to reach a real person.
There was continued testing of which options, and which order of options, worked best to reduce CS time.
The CEO also set a KPI of “fraction of customers cancelling their membership each month”. Changes to the phone tree which increased cancellation rates made that metric worse, and so were more likely to be reverted than changes which decreased cancellation rates.
After a bunch of iterations, “SF Fitness” had a deep, confusing, nonsensical phone tree without any single person having made the decision to create it. That phone tree is more effective per unit of spent resources at reducing cancellations than any strategy that the CEO would have come up with themselves.
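To make the mechanism concrete, here is a toy sketch of the dynamic I have in mind (my own illustration, with made-up cost curves and function names, not anything taken from a real gym chain): each release makes one small random tweak to the phone tree, and the tweak is kept only if the measured KPIs improve. No step ever decides “make cancellation confusing”, but the tree that survives the process ends up deep anyway.

```python
# Toy simulation of metric-driven iteration on a phone tree. The cost models
# below are made up for illustration; only the accept/revert loop matters.
import random

random.seed(0)

tree_depth_to_cancel = 1   # menu levels a caller traverses to reach "cancel"
self_serve_options = 0     # options handled without a human agent

def cs_minutes(depth, options):
    # Hypothetical KPI 1: more self-serve options -> fewer agent minutes,
    # but every extra menu level adds a little handling time.
    return 10.0 / (1 + options) + 0.1 * depth

def cancellation_rate(depth, options):
    # Hypothetical KPI 2: each extra level deters some fraction of cancellers.
    return 0.05 * (0.9 ** depth)

def score(depth, options):
    # Combined view of the two KPIs the CEO set; lower is better for both.
    return cs_minutes(depth, options) + 100 * cancellation_rate(depth, options)

for release in range(200):
    tweak = random.choice(["add_option", "add_level", "remove_level"])
    new_depth, new_options = tree_depth_to_cancel, self_serve_options
    if tweak == "add_option":
        new_options += 1
    elif tweak == "add_level":
        new_depth += 1
    elif new_depth > 1:
        new_depth -= 1
    # Keep the tweak only if the measured KPIs improved; otherwise revert it.
    if score(new_depth, new_options) < score(tree_depth_to_cancel, self_serve_options):
        tree_depth_to_cancel, self_serve_options = new_depth, new_options

print(f"menu levels to reach 'cancel membership': {tree_depth_to_cancel}")
print(f"self-serve options added along the way:   {self_serve_options}")
```

Nobody in this loop holds a plan that says “build a deep cancellation maze”; the depth is just whatever the accept/revert rule happened to reward.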
As a side note, I don’t think that prediction markets would actually improve the operation of most corporations by very much, relative to the current dominant approach of A/B testing. I can expand on why I think that if that’s a crux for you.
TL;DR: I think corporations are far better than individual humans at steering the world into states that look like “the corporation controls lots of resources”, but worse at steering it into arbitrary states.
I’m basically with you until here:
That phone tree is more effective per unit of spent resources at reducing cancellations than any strategy that the CEO would have come up with themselves.
Explicitly optimizing the cancellation part of the phone tree for confusion and difficulty definitely seems like a strategy that a not-particularly-ethical CEO or product manager can, and does, come up with entirely on their own.
Companies routinely use dark patterns in software, in many cases as an explicit choice. On paper, the use of these patterns might be justified by KPIs or A/B tests or iteration to hide their true purpose. Maybe at some companies, these patterns get used accidentally or “emergently” with no one in the entire organization explicitly optimizing for the thing that the dark pattern is actually accomplishing. My claim is that any such company would be more effective, or at least no worse off, if the decision-makers were explicitly optimizing. At the very least, they could just skip or half-ass a bunch of the A/B testing and iteration when they already know their destination.
I claim that a company that uses the strategy of “come up with a target KPI and then implement every possible dark pattern which could plausibly lead to that KPI doing what we want it to do” will be outperformed by a company which uses the strategy of “come up with a bunch of things to do which will plausibly change the value of that KPI, try all of them out on a small scale, and then scale up the ones that work and scale down the ones that don’t work”.
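To make those two strategies concrete, here is a toy sketch (my own illustration, with made-up effect sizes, so it shows what I mean rather than proving the claim): assume the candidate “plausible” interventions have hidden effects centred on zero, so roughly half of them are actually harmful, and compare shipping all of them against piloting each one on a small slice of customers first.

```python
# Toy comparison: "ship every plausible intervention" vs. "pilot each one on a
# small sample, then scale only the measured winners". Effect sizes are made up.
import random

random.seed(1)

# Hidden true effect of each candidate intervention on revenue per customer;
# by construction, roughly half of the "plausible" ideas are actually harmful.
true_effects = [random.gauss(0, 1.0) for _ in range(20)]

def measure(effect, n_customers):
    # Noisy estimate of an intervention's effect from a pilot of n_customers.
    noise = random.gauss(0, 5.0 / n_customers ** 0.5)
    return effect + noise

# Strategy A: implement everything that sounded plausible, at full scale.
ship_everything = sum(true_effects)

# Strategy B: pilot each idea on 100 customers, scale up only measured winners.
kept = [effect for effect in true_effects if measure(effect, 100) > 0]
pilot_then_scale = sum(kept)

print(f"ship everything:  net effect {ship_everything:+.2f}")
print(f"pilot then scale: net effect {pilot_then_scale:+.2f}")
```

The point isn’t the specific numbers; it’s that the second strategy only pays for the measurement step and in exchange gets to drop the ideas that turned out to hurt.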
For context on what’s driving my intuitions here: I at one point worked at a startup where one of the cofounders pretty much operated under the philosophy of “let’s look at what other companies vaguely in our space do, paying particular attention to things which look like they’re intended to trick customers out of their money, and implement those things in our own product as faithfully as possible”. That strategy did in fact sometimes work, but in most cases it significantly hurt retention while being minimally useful for revenue (and, in a couple of cases, it hurt both retention and revenue).
In the language of your steering systems post (thank you for writing that, BTW), I expect that a company whose pruning system is “try things out at small scale in the real world and iterate” will outperform even a human who has a very good world-model.
I actually suspect that this is a more general disagreement—I think that, in complicated domains, the approach of “figure out what things work locally, do those things, and iterate” outperforms the approach of “look at the problem, work really hard on coming up with an explicit model of the reward landscape, and then do the optimal thing according to your model”. Obviously you can outperform either approach in isolation by combining them, but I think that the best performance is generally far to the “try things and iterate” side. If that’s still a thing you disagree with, even in that framing, I suspect that’s a useful crux for us to explore more.
Edit: To be more explicit, I think that corporations are more powerful at steering the future towards a narrow space than individual humans because they are able to try out more things than any individual human, not because they have a better internal model of the world or a better process for deciding which of two atomic, mutually exclusive plans to execute.
I think this claim:
I actually suspect that this is a more general disagreement—I think that, in complicated domains, the approach of “figure out what things work locally, do those things, and iterate” outperforms the approach of “look at the problem, work really hard on coming up with an explicit model of the reward landscape, and then do the optimal thing according to your model”.
is, to first order, probably the most general crux in whether you view LW as a useful thing (perhaps the most important useful thing) or as essentially worthless.
It strikes at the core of the LessWrong worldview, so it’s natural that such deep differences result in different predictions.
To be clear, I think you can sensibly disagree with people on LW about AI risk being high or being a real thing (as well as about other issues I haven’t looked at) even under a worldview which agrees with “look at the problem, work really hard on coming up with an explicit model of the reward landscape, and then do the optimal thing according to your model”. But I think a lot of topics on LW make a lot more sense if you fundamentally buy the worldview under which model-building matters more than iteration.
A few remarks, not necessarily disagreements with anything specific:
The decision to “hire a bunch of people and tell them to try a bunch of things according to some general guidelines, rather than explicitly micromanaging them” is itself causally upstream of trying out those things.
Given access to the same resources, sufficiently smart humans are usually capable of explicit strategy stealing of any other human or group of humans in full generality and on any level of meta. Though object-level strategy stealing of your competitors might not always be a good strategy, as you point out.
Re: “figure out what things work locally, do those things, and iterate.” Agreed that this is a good strategy in general. I’m saying that an individual explicitly reflecting on and reasoning about what an organization is trying to do and the strategy it’s using to do it should always help, or at least not hurt, if done correctly and at the right level of generality and meta. We might disagree about how strong those preconditions are, and how likely they are to be met in practice.
All of those remarks look correct to me. Though “at the right level of generality and meta” is doing a lot of the work.