Yeah, but on the other hand, I think this is looking for essential differences where they don’t exist. I made a comment similar to this on the previous post. It’s not like one side is building rockets and the other side is building ornithopters—or one side is advocating building computers out of evilite, while the other side says we should build the computer out of alignmentronium.
“reward functions can’t solve alignment because alignment isn’t maximizing a mathematical function.”
Alignment doesn’t run on some nega-math that can’t be cast as an optimization problem. If you look at the example of the value-child who really wants to learn a lot in school, I admit it’s a bit tricky to cash this out in terms of optimization. But if the lesson you take from this is “it works because it really wants to succeed, this is a property that cannot be translated as maximizing a mathematical function,” then I think that’s a drastic overreach.
I realize that my position might seem increasingly flippant, but I really think it is necessary to acknowledge that you’ve stated a core assumption as a fact.
“Alignment doesn’t run on some nega-math that can’t be cast as an optimization problem.”
I am not saying that the concept of “alignment” is some bizarre metaphysical idea that cannot be approximated by a computer because something something human souls, etc., or some other nonsense.
However, the assumption that “alignment is representable in math” directly implies “alignment is representable as an optimization problem” seems potentially false to me, and I’m not sure why you’re certain it is true.
There exist systems that can be 1.) represented mathematically, 2.) perform computations, and 3.) do not correspond to some type of min/max optimization, e.g. various analog computers or cellular automata.
I don’t think it is ridiculous to suggest that what the human brain does is 1.) representable in math, 2.) representable in a way we could actually understand and re-implement on hardware/software systems, and 3.) not representable as an optimization problem where there exists some reward function to maximize or some loss function to minimize.
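For concreteness, here is a minimal sketch (in Python, purely for illustration) of the cellular-automaton case: Rule 110, a one-dimensional cellular automaton that is known to be Turing-complete. Every update is a pure table lookup over each cell’s three-cell neighborhood; the system computes, but no reward is being maximized and no loss minimized at any step.

```python
# Rule 110: Turing-complete, yet each step is a deterministic lookup,
# not the descent step of any objective function.
RULE = 110  # the 8-bit lookup table, encoded as an integer

def step(cells):
    """Advance one generation, with periodic (wraparound) boundaries."""
    n = len(cells)
    return [
        # Index into RULE's bits by the (left, center, right) neighborhood.
        (RULE >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

state = [0] * 10 + [1] + [0] * 10  # a single live cell
for _ in range(5):
    state = step(state)
```

Nothing in this dynamics is an argmax or argmin, yet the trajectory of states is a perfectly ordinary mathematical object.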
“There exist systems that can be 1.) represented mathematically, 2.) perform computations, and 3.) do not correspond to some type of min/max optimization, e.g. various analog computers or cellular automata.”
You don’t even have to go that far. What about, just, regular non-iterative programs? Are type(obj) or json.dumps(dict) or resnet50(image) usefully/nontrivially recast as optimization problems? AFAICT there are a ton of things that are made up of normal math/computation and where trying to recast them as optimization problems isn’t helpful.
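To make the “nontrivially” part concrete: any pure function can be vacuously recast as an argmin over candidate outputs, but the recasting does no explanatory work. The `classify` and `as_argmin` helpers below are hypothetical, written only to illustrate this point.

```python
import json

def classify(obj):
    """An ordinary non-iterative program: one deterministic evaluation."""
    return type(obj).__name__

# Direct computation -- no search, no objective.
print(classify([1, 2]))       # list
print(json.dumps({"a": 1}))   # {"a": 1}

def as_argmin(f, x, candidates):
    """Vacuous 'optimization' recasting of f: pick the candidate output y
    minimizing an indicator loss that is 0 iff y == f(x).

    This is technically an optimization problem, but the loss already
    contains the answer, so nothing is gained by the reframing."""
    return min(candidates, key=lambda y: 0 if y == f(x) else 1)
```

The fact that a computation *can* be dressed up as minimizing some loss doesn’t mean the optimization framing is the natural or useful description of it.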