> General comment: I think it's common for people working on theorems & math to forget the ways that the real world can violate their mathematical assumptions, even in very unlikely ways. The following is a concrete story of one thing that may go wrong. It is unlikely that this particular problem will arise in any proof or theorem you actually work on; the point is to illustrate a general lesson about the subtle ways the world doesn't live up to your platonic expectations.
I feel inclined to disagree here, as this doesn’t really match my philosophy/intuitions for constructing theorems. See my reply to @harfe:
My philosophy is to set up minimal preconditions that describe the universe we live in.
Such preconditions will probably also describe other universes and may not give us very strong statements about our particular universe, but that's fine. We can iterate on our theorems, building successive theorems that tighten the preconditions of earlier ones and give us stricter postconditions.
If reality doesn’t conform to the preconditions of your theorem, then you chose overly restrictive preconditions. I think that’s just doing theorems badly, not necessarily an inherent flaw of theorems.
Choosing preconditions that describe reality isn’t that hard; e.g. we have multiple equivalently powerful models of computation that seem to accurately describe what is mechanically computable within our universe.
Even with simplifying/idealising assumptions, algorithm designs, algorithm analyses, and complexity-theoretic results translate (modulo some minor adjustments) to real-world computers.
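To make that last claim concrete, here is a minimal sketch (my own illustration, not part of the original exchange; the harness and the floor(log2(n)) + 1 bound are the standard textbook ones) of idealised algorithm analysis surviving contact with a real machine: an actual binary-search implementation never exceeds the probe bound the analysis predicts.

```python
import math
import random

def binary_search(xs, target):
    """Return (index_or_None, probes) for a sorted list xs, counting midpoint probes."""
    lo, hi, probes = 0, len(xs) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if xs[mid] == target:
            return mid, probes
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# The idealised analysis says binary search needs at most floor(log2(n)) + 1 probes;
# check that concrete runs on a real machine respect that bound.
for n in (10, 1_000, 1_000_000):
    xs = sorted(random.sample(range(10 * n), n))
    targets = random.choices(range(10 * n), k=200)  # a mix of hits and misses
    worst = max(binary_search(xs, t)[1] for t in targets)
    bound = math.floor(math.log2(n)) + 1
    print(f"n={n:>9}: worst observed probes = {worst}, theoretical bound = {bound}")
    assert worst <= bound
```

The "minor adjustments" show up where you'd expect: cache behaviour, constant factors, and word sizes affect wall-clock time, but not the probe counts the analysis actually speaks about.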
> For example, suppose that I have a proof that my AGI plan is 100% safe. I have a provably correct enumeration of all corrigibility properties, and a completely formally specified training story for why my architecture and training process will result in an AGI with those corrigibility properties, using some fancy new differential equation math my team invented & proved. We've run the whole thing through an automated proof checker, and it returned 'yes, this is indeed a corrigible AGI you get at the end of this process'. None of us knows how to program, though, so we haven't actually seen whether our new fancy differential equation math gives correct predictions on ML stuff.
Again, this violates my philosophy for proofs of safety guarantees. I want an iterative proof process: we start with minimally restrictive preconditions, see what results we can get, and verify them empirically. Once our meta-level confidence in those theorems is strong, we can use them as foundations for further theorems, building more elaborate constructs that give us stronger guarantees:
We could synthesise similar theorems, combine disparate theorems into a unified whole, generalise existing theorems, etc.
There are a lot of tools in the mathematician’s toolbox.
Our first theorems should be simple/straightforward and easy to verify.
Building towers of abstraction in the sky without touching reality is another way to do maths wrong.
To be clear, the theorems I vibe with the most are a generalisation of the selection theorems agenda. We'd both want empirical validation at many steps of the theorem-building process, and theorems of that kind don't really admit the level of excessive detail present in your example.
If the theorems fail to apply, it'll look like their preconditions not being met. An example is the coherence theorems, which assume (among other things) the following preconditions:
- Path-independent preferences/no internal agent state influencing preferences
- A total order over preferences
- Static preferences that don't change with time
They fail to apply to humans because human preferences are more complicated than this model allows.
Robust empirical verification that the preconditions of the theorems hold/obtain in practice would be necessary to construct theorems that aren’t disconnected from reality.
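To make "empirical verification of the preconditions" concrete, here is a small hypothetical sketch (the option names, data, and helper function are mine, purely illustrative): given elicited pairwise choices, it checks whether they actually form a total order, i.e. whether the second precondition above holds for the observed behaviour.

```python
from itertools import combinations, permutations

def is_total_order(options, prefers):
    """Check totality and transitivity of a strict preference relation.

    prefers[(a, b)] == True means a was chosen over b in an elicited comparison.
    Returns (ok, reason).
    """
    # Totality: every distinct pair must have been compared, in exactly one direction.
    for a, b in combinations(options, 2):
        if not (prefers.get((a, b)) or prefers.get((b, a))):
            return False, f"incomplete: {a} vs {b} never compared"
        if prefers.get((a, b)) and prefers.get((b, a)):
            return False, f"inconsistent: both {a} > {b} and {b} > {a}"
    # Transitivity: a > b and b > c must imply a > c.
    for a, b, c in permutations(options, 3):
        if prefers.get((a, b)) and prefers.get((b, c)) and not prefers.get((a, c)):
            return False, f"intransitive: {a} > {b}, {b} > {c}, but not {a} > {c}"
    return True, "choices form a total order"

options = ["apple", "banana", "cherry"]
# An idealised agent with a fixed total order: apple > banana > cherry.
coherent = {("apple", "banana"): True, ("banana", "cherry"): True, ("apple", "cherry"): True}
# A more human-like pattern: cyclic choices, violating the precondition.
cyclic = {("apple", "banana"): True, ("banana", "cherry"): True, ("cherry", "apple"): True}

print(is_total_order(options, coherent))  # (True, 'choices form a total order')
print(is_total_order(options, cyclic))    # (False, 'intransitive: ...')
```

Path independence and static preferences would need richer, longitudinal data to test; the point is only that precondition checks of this kind can in principle be run against observed behaviour before leaning on a theorem's conclusions.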