I have lost interest in the Löbian approach to tiling, because probabilistic tiling results seem like they can be strong enough and with much less suspicious-looking solutions. Expected value maximization is a better way of looking at agentic behavior anyway. Trying to logically prove some safety predicate for all actions seems like a worse paradigm than trying to prove some safety properties for the system overall (including proving that those properties tile under as-reasonable-as-possible assumptions, plus sanity-checking what happens when those assumptions aren’t precisely true).
I do think Löb-ish reasoning still seems potentially important for coordination and cooperation, which I expect to feature in important tiling results (if this research program continues to make progress). However, I am optimistic about replacing Löb’s Theorem with Payor’s Lemma in this context.
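For readers unfamiliar with the contrast: a sketch of the two statements as they are usually given (my paraphrase, not from the comment above). Both live in a modal provability logic; Payor's Lemma is typically stated under necessitation, the distribution axiom K, and internal necessitation (□p → □□p), and notably avoids the self-referential fixed point that Löb's proof relies on.

```latex
% Löb's Theorem: if the theory proves that provability of x implies x,
% then it proves x outright.
\text{L\"ob:}\qquad \vdash \Box x \to x \;\;\Longrightarrow\;\; \vdash x

% Payor's Lemma: the hypothesis only needs to hold under one more box.
\text{Payor:}\qquad \vdash \Box(\Box x \to x) \to x \;\;\Longrightarrow\;\; \vdash x
```

The relevance to cooperation (as I understand the proposal) is that agents can condition on the boxed hypothesis □(□C → C) rather than on a Löbian fixed point, which is part of why it looks like a less suspicious foundation.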
I don’t completely discount the pivotal-act approach, but I am currently more optimistic about developing safety criteria & designs which could achieve some degree of consensus amongst researchers, and make their way into commercial AI, perhaps through regulation.
(I don’t fully understand yet what results you’re aiming for, but it makes sense that probabilistic guarantees make some things more feasible. I’m not sure whether there are further relaxations I’d be fine making, at least initially.)
Thanks!