Should there be a ‘d’ on the end of ‘Debate’ in the title or am I parsing it wrong?
Alfred Harwood
It is meant to read 350°F. The point is that the temperature is too high to be a useful domestic thermostat. I have changed the sentence to make this clear (and added a ° symbol). The passage now reads:
Scholten gives the evocative example of a thermostat which steers the temperature of a room to 350°F with a probability close to certainty. The entropy of the final distribution over room temperatures would be very low, so in this sense the regulator is still ‘good’, even though the temperature it achieves is too high for it to be useful as a domestic thermostat.
(Edit: I’ve just realised that 35°F would also be inappropriate for a domestic thermostat by virtue of being too cold so either works for the purpose of the example. Scholten does use 350, so I’ve stuck with that. Sorry, I’m unfamiliar with Fahrenheit!)
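The point that a sharply peaked outcome distribution has low entropy, whatever temperature it peaks at, can be checked numerically. A minimal sketch (the two distributions below are made up purely for illustration):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distribution over room temperatures (°F) for a regulator
# that reaches 350°F with probability close to certainty.
good_but_useless = {350: 0.97, 349: 0.015, 351: 0.015}

# A sloppier regulator that spreads probability over ten temperatures.
sloppy = {t: 0.1 for t in range(340, 360, 2)}

# The 350°F regulator has low entropy, so it counts as 'good' in this
# sense, even though the target is useless for a domestic thermostat.
print(shannon_entropy(good_but_useless.values()))  # ~0.22 bits
print(shannon_entropy(sloppy.values()))            # log2(10) ~ 3.32 bits
```

Nothing here depends on 350°F being hot rather than 35°F being cold: entropy only measures how concentrated the distribution is, not whether its peak is anywhere useful.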
A Straightforward Explanation of the Good Regulator Theorem
Your reaction seems fair, thanks for your thoughts! It's a good suggestion to add an epistemic status—I'll be sure to add one next time I write something like this.
Got it, that makes sense. I think I was trying to get at something like this when I was talking about constraints/selection pressure (a system has less need to use abstractions if its compute is unconstrained or there is no selection pressure in the ‘produce short/quick programs’ direction) but your explanation makes this clearer. Thanks again for clearing this up!
Thanks for taking the time to explain this. This clears a lot of things up.
Let me see if I understand. So one reason that an agent might develop an abstraction is that it has a utility function that deals with that abstraction (if my utility function is ‘maximize the number of trees’, it's helpful to have an abstraction for ‘trees’). But the NAH goes further than this and says that, even if an agent had a very ‘unnatural’ utility function which didn't deal with abstractions (e.g. it was something very fine-grained like ‘I value this atom being in this exact position and this atom being in a different position etc…’) it would still, for instrumental reasons, end up using the ‘natural’ set of abstractions, because the natural abstractions are in some sense the only ‘proper’ set of abstractions for interacting with the world. Similarly, while there might be perceptual systems/brains/etc. which favour using certain unnatural abstractions, once agents become capable enough to start pursuing complex goals (or rather goals requiring a high level of generality), the universe will force them to use the natural abstractions (or else fail to achieve their goals). Does this sound right?
Presumably it's possible to define some ‘unnatural’ abstractions. Would the argument be that unnatural abstractions are just not useful in practice, or is it that the universe is such that it's ~impossible to model the world using unnatural abstractions?
It's late where I am now so I'm going to read carefully and respond to comments tomorrow, but before I go to bed I want to quickly respond to your claim that you found the post hostile, because I don't want to leave it hanging.
I wanted to express my disagreements/misunderstandings/whatever as clearly as I could but had no intention to express hostility. I bear no hostility towards anyone reading this, especially people who have worked hard thinking about important issues like AI alignment. Apologies to you and anyone else who found the post hostile.
Thanks for taking the time to explain this to me! I would like to read your links before responding to the meat of your comment, but I wanted to note something before going forward because there is a pattern I’ve noticed in both my verbal conversations on this subject and the comments so far.
I say something like ‘lots of systems don't seem to converge on the same abstractions’ and then someone else says ‘yeah, I agree, obviously’ and then starts talking about another feature of the NAH, without taking this as evidence against the NAH.
But most posts on the NAH explicitly mention something like the claim that many systems will converge on similar abstractions [1]. I find this really confusing!
Going forward it might be useful to taboo the phrase ‘the Natural Abstraction Hypothesis’ (?) and just discuss what we think is true about the world.
Your comment that it's a claim about ‘proving things about the distribution of environments’ is helpful. To help me understand what people mean by the NAH, could you tell me what would (in your view) constitute strong evidence against the NAH? (If the fact that we can point to systems which haven't converged on using the same abstractions doesn't count.)
- ^
Natural Abstractions: Key Claims, Theorems and Critiques: ‘many cognitive systems learn similar abstractions’
Testing the Natural Abstraction Hypothesis: Project Intro: ‘a wide variety of cognitive architectures will learn to use approximately the same high-level abstract objects/concepts to reason about the world’
The Natural Abstraction Hypothesis: Implications and Evidence: ‘there exist abstractions (relatively low-dimensional summaries which capture information relevant for prediction) which are “natural” in the sense that we should expect a wide variety of cognitive systems to converge on using them.’
- ^
Abstractions are not Natural
Hello! My name is Alfred. I recently took part in AI Safety Camp 2024 and have been thinking about the Agent-like structure problem. Hopefully I will have some posts to share on the subject soon.
When you are considering finite tape length, how do you deal with x(k−1) when k=0, or x(k+1) when k=N?