Hubinger’s notion of simplicity, here, is closely related to measures of algorithmic complexity like “Kolmogorov complexity,” which measure the complexity of a string by reference to the length of the shortest program that outputs that string when fed into a chosen Universal Turing Machine (UTM). One obvious issue here is that this sort of definition is relative to the choice of UTM (just as, e.g., when we imagine re-writing a neural net’s algorithm using other code, we need to pick the programming language).[4] Discussions of algorithmic complexity often ignore this issue on the grounds that it only adds a constant (since any given UTM can mimic any other if fed the right prefix), but it’s not clear to me, at least, when such constants might or might not matter to a given analysis – for example, the analysis at stake here.[5]
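One way to see this language-relativity concretely: practical complexity proxies like compressors are themselves fixed "description schemes," and different choices assign different lengths to the same string. A toy Python sketch (my own illustration, not from the text; compressed length is only a crude, computable stand-in for Kolmogorov complexity):

```python
import bz2
import zlib

s = b"ab" * 1000  # a highly regular 2000-byte string

# Two different (computable) description schemes for the same string:
zlib_len = len(zlib.compress(s))
bz2_len = len(bz2.compress(s))

# Both schemes find the string far simpler than its raw length...
assert zlib_len < len(s) and bz2_len < len(s)

# ...but they need not agree on the number, just as two UTMs need not:
print(f"raw={len(s)}, zlib={zlib_len}, bz2={bz2_len}")
```

Neither number is "the" complexity of `s`; each is relative to a fixed reference scheme, which is exactly the relativity the quoted passage flags.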
Indeed, my vague sense is that certain discussions of simplicity in the context of computer science often implicitly assume what I’ve called “simplicity realism” – a view on which simplicity is, in some deep sense, an objective thing, ultimately independent of e.g. your choice of programming language or UTM, but which different metrics of simplicity are all tracking (albeit, imperfectly). And perhaps this view has merit (for example, my impression is that different metrics of complexity often reach similar conclusions in many cases – though this could have many explanations). However, I don’t, personally, want to assume it. And especially absent some objective sense of simplicity, it becomes more important to say which particular sense you have in mind.
My general view on simplicity realism is that it’s probably false if interpreted as the claim that simplicity doesn’t depend at all on the choice of UTM or programming language. But I don’t think that dependence matters as much as it might seem: any UTM can simulate any other Turing machine, and mainstream programming languages like C++, Java, and Python are all Turing complete, so they can all express the same computations, just with different degrees of difficulty. Thankfully, the invariance theorem limits how much description length can depend on the language. Still, I agree that people should more often fix a particular programming language or UTM when talking about simplicity, and ideally one with as low a constant overhead as possible, so that you don’t have to deal with an arbitrarily complicated or cumbersome reference language.
I definitely agree that in practice you can’t ignore a finite error or a finite additive constant: you usually want those numbers to be as small as possible, and a bound that holds only “up to a constant” tells you little about how wrong you actually are, or how complicated something actually is, which is often exactly the information you need to do things correctly.
More generally, your points on the additive constant issue are important, and this is why in practice it’s usually worth constraining your choice of UTM: impose the additional requirement that, among Turing-complete UTMs/programming languages, the constant be as low as possible. I haven’t seen much research on minimizing that constant. So I’m all for restricting attention to the subset of UTMs/programming languages with low description overhead, so that you aren’t forced to describe things in a needlessly complicated way.
It might be a useful research direction to determine how low the additive constant for Kolmogorov complexity can be made across programming languages/UTMs, subject to the requirement that the language remain Turing complete (i.e., able to simulate a universal Turing machine).
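To make the “constant overhead” picture concrete, here is a toy sketch (my own, using a made-up mini-language, not anything from the thread) of why translating between languages costs only an additive constant: a fixed-length wrapper turns any program of the mini-language into a Python program with the same output.

```python
import contextlib
import io

def run_B(p: str) -> str:
    """Toy 'language B': a program is a decimal numeral n; its output is 'ab' * n."""
    return "ab" * int(p)

def translate_B_to_A(p: str) -> str:
    """Translate a B-program into language A (Python) via a fixed wrapper."""
    return 'print("ab"*' + p + ',end="")'

# The wrapper's length is the additive constant c -- the same for every program.
# (The empty-program string here is only measured, never executed.)
c = len(translate_B_to_A(""))

for p in ["3", "41", "100000"]:
    a_prog = translate_B_to_A(p)
    # Description length in A exceeds that in B by exactly the constant c:
    assert len(a_prog) == len(p) + c

    # Check that the translated A-program really reproduces B's output:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(a_prog)
    assert buf.getvalue() == run_B(p)

print(f"constant overhead c = {c} characters")
```

Real UTM-to-UTM simulation works the same way: the interpreter’s (or translator’s) description is the constant, and, as above, nothing in the theorem forces that constant to be small.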