I understand the point of your dialog, but I also feel like I could model someone saying “This Alignment Researcher is really being pedantic and getting caught in the weeds.” (especially someone who wasn’t sure why these questions should collapse into world models and correspondence.)
(After all, the Philosopher’s question probably didn’t depend on actual apples; the apple was just a stand-in for something with positive utility. So the inputs to the utility functions could easily be “apples”, where an apple is an object with one property, `owner`: Alice prefers `apple.owner == "alice"` (i.e. `utility(a): return int(a.owner == "alice")`), and Bob prefers `apple.owner == "bob"`. That sidesteps the entire question of world models and correspondence.)
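To make that concrete, here’s a minimal sketch of the simplified setup I have in mind (the class and function names are mine, purely illustrative): an “apple” whose only property is its owner, and utility functions that consult nothing but that property — no world model anywhere.

```python
from dataclasses import dataclass

@dataclass
class Apple:
    # The apple's single property: who owns it.
    owner: str

def alice_utility(a: Apple) -> int:
    # Alice gets utility 1 iff she owns the apple, 0 otherwise.
    return int(a.owner == "alice")

def bob_utility(a: Apple) -> int:
    # Bob gets utility 1 iff he owns the apple, 0 otherwise.
    return int(a.owner == "bob")

apple = Apple(owner="alice")
print(alice_utility(apple), bob_utility(apple))  # 1 0
```

The point is that the utility functions here take the abstract object directly as input, so questions about how the object corresponds to the world never arise.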
I suspect you did this because the half-formed question about apples was easier to come up with than a fully formed question that would necessarily require engagement with world models, and I’m not even sure that’s the wrong choice. But this was the impression I got reading it.
I also wonder about this. If I’m understanding the post and comment right, the claim is that if you don’t formulate something mathematically, it doesn’t generalize robustly enough? And that to formulate something mathematically you need to be ridiculously precise/pedantic?
Although this is probably wrong and I’m mostly invoking Cunningham’s Law
I doubt my ability to be entertaining, but perhaps I can be informative. Mathematical formulation is needed because, per Goodhart’s law, imperfect proxies break down under optimization pressure. Mathematics is a tool rigorous enough to get us from “that sounds like a pretty good definition” (like “zero correlation” in the radio signals example) to “I’ve proven this is the definition” (like “zero mutual information”).
The proof can get you from “I really hope this works” to “As long as this system satisfies the proof’s assumptions, this will work”, because the proof states its assumptions clearly, while “this has worked previously” could, and likely does, rely on a great number of unspecified commonalities between previous instances.
It gets precise and pedantic because it turns out that the things we often want to define for this endeavor are defined in terms of other things. “Mutual information” isn’t a useful formulation without a formulation of “information”. Similarly, in trying to define morality, it’s difficult to define what an agent should do in the world (or even what it means for an agent to do things in the world) without ideas of agency, doing, and the world. Every undefined term you use takes you further from a formulation you could actually use to create a proof.
In all, mathematical formulation isn’t the goal, it’s the prerequisite. “Zero correlation” was mathematically formalized, but that was not sufficient.