Syntax vs semantics: alarm better example than thermostat
I had a post on empirically bridging syntax and semantics. It used the example of temperature, building on McCarthy and Searle’s dispute about the beliefs of thermostats.
But temperature wasn’t an ideal illustration of my points, as humans are not that finely sensitive to temperature, so I’m presenting a better example here: detecting an intruder.
Internal and external variable
The external variable $X$ is a boolean which corresponds to whether there is any human in a certain initially empty greenhouse.
There are five different “agents” with internal variables $A_1$ to $A_5$:
An alarm on the door of the greenhouse, which goes off if a circuit is broken by the door being opened; internal variable $A_1$.
A heat-detecting camera that starts an alarm if there is something vaguely human-sized and human-temperature inside the greenhouse (which is made out of sapphires, obviously); internal variable $A_2$.
A motivated human guard who periodically looks into the greenhouse; internal variable $A_3$.
A resourceful human with a lot of time and money, solely dedicated to detecting any intrusion into the greenhouse; internal variable $A_4$.
A superintelligent robot version of the resourceful human; internal variable $A_5$.
Then all the $A_i$ correlate well with $X$ in a lot of circumstances. If a passerby or a naive burglar gets into the greenhouse, they will trigger the door alarm and the heat alarm, while the guard, the resourceful human, and the robot will all see the intruder.
It is, however, pretty easy to fool the door alarm: simply go through a window. Conversely, someone could open the door without entering (or the wind or an earthquake could do so), causing the alarm to trigger with no-one in the greenhouse. So $A_1$ and $X$ are correlated in a relatively narrow set of environments $E_1$. And if we consider instead the variable $C$ = “the electric circuit that goes through the door is unbroken”, then it’s clear that $A_1$ and $C$ are much better correlated than $A_1$ and $X$; if there’s a semantic meaning to $A_1$, then it’s far closer to $C$ than it is to $X$.
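As a concrete illustration, here’s a minimal Python sketch of the door-alarm case (a toy model, with all the environment details invented for this write-up). It represents environments as tiny records of what happened at the greenhouse and counts how often the alarm’s internal variable $A_1$ agrees with $X$ versus with the circuit variable (written here in its “circuit is broken” form, which doesn’t change the correlation):

```python
# Toy model of the door alarm (all names and environments invented for illustration).
# A1 is the alarm's internal variable, X the external "human in the greenhouse"
# variable, and C the "door circuit is broken" variable.

from dataclasses import dataclass

@dataclass
class Environment:
    human_inside: bool   # the external variable X
    door_opened: bool    # did anything break the door circuit?

def X(env: Environment) -> bool:
    """External variable: is there a human in the greenhouse?"""
    return env.human_inside

def C(env: Environment) -> bool:
    """Is the electric circuit through the door broken?"""
    return env.door_opened

def A1(env: Environment) -> bool:
    """Door alarm: goes off exactly when the door circuit is broken."""
    return env.door_opened

environments = [
    Environment(human_inside=False, door_opened=False),  # quiet night
    Environment(human_inside=True,  door_opened=True),   # naive burglar uses the door
    Environment(human_inside=True,  door_opened=False),  # burglar climbs through a window
    Environment(human_inside=False, door_opened=True),   # wind blows the door open
]

agree_with_X = sum(A1(e) == X(e) for e in environments)
agree_with_C = sum(A1(e) == C(e) for e in environments)
print(f"A1 agrees with X in {agree_with_X}/{len(environments)} environments")
print(f"A1 agrees with C in {agree_with_C}/{len(environments)} environments")
```

On this toy list, the alarm matches the circuit in every environment but matches $X$ only in the mundane ones; that is the sense in which, if $A_1$ means anything, it means the circuit rather than the intruder.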
The heat camera can also be fooled. Simply spray the lens with some infrared-opaque paint, then enter at your leisure. For the converse, maybe a bear entering the greenhouse could trigger the alarm. It seems clear that $A_2$ is correlated with $X$ in a much wider set of environments, $E_2$.
The human guard is hard to fool in either direction. We humans are very good at figuring out when other humans are around, so, assuming the guard is moderately attentive, tricking the guard in either direction requires a lot of work—though it is probably easier to trigger a false positive (the guard mistakenly thinks that there’s a person in the greenhouse) than a false negative (the guard doesn’t notice someone actually in the greenhouse). Confusing or overwhelming the guard becomes possible for intelligent adversaries. Still, the set of environments $E_3$ where $A_3$ is correlated with $X$ is much larger.
The resourceful human is even harder to fool, because they have all the advantages of the guard, plus any extra precautions they may have taken (such as adding alarms, cameras, crowds of onlookers, etc.). So $E_4$ is larger still.
Finally, bringing in a superintelligence really extends the accuracy of $A_5$, even against intelligent adversaries, so $E_5$ is again much larger than any of the previous sets of environments.
Not strict inclusion, not perfect correlation
The agents above are on a hierarchy: every one of them has a much larger set of environments where its internal variable is correlated with $X$ than do any of the ones before that agent.
But none of the inclusions are strict. If someone sprays the heat-sensitive camera but then walks in through the door, the door-alarm will detect the intrusion even as the camera misses it. If someone disguises themselves as a table, they might be able to fool the guard but be caught by the camera. The resourceful human has their own personality, so there might be some manipulation of them that would fall flat for the guard.
And finally, even a superintelligence is computable, so the No Free Lunch theorems imply that there are some stupidly complicated environments in which $A_1$, $A_2$, $A_3$, and $A_4$ are all equal to $X$, but $A_5$ is not.
Since no computable agent can have a perfect correlation with the variable in question, there is a sense in which no symbol can be perfectly grounded (this gets even more obvious when you start slicing into the definition, and start wondering about the meanings of “human” and “a certain greenhouse” in $X$).
But, despite the lack of perfect inclusion and perfect correlation, there is a strong sense in which the later agents are better correlated than the earlier ones. Assume that we have a sensible computer language to pick a complexity prior in, and update on the world being roughly as we believe it to be. Then I’d be willing to wager that the posterior probabilities of the environments in which there are correlations will be ordered:
$P(E_1) < P(E_2) < P(E_3) < P(E_4) < P(E_5)$.
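To spell out what that wager would look like in practice, here is a rough sketch covering just the two mechanical detectors (the guard, the resourceful human, and the robot don’t fit in a few lines of code). The sampler below is a crude stand-in for a complexity prior: mundane environments get most of the probability mass, and each extra oddity (window entry, sprayed camera, stray bear) is made correspondingly rarer. All the predicates and probabilities are invented for illustration.

```python
# Toy check of the ordering claim, restricted to the two mechanical detectors.
# The environment sampler is a crude stand-in for a complexity prior: mundane
# environments get most of the weight, odd ones much less. Everything here
# (probabilities, predicates) is invented for illustration.

import random

random.seed(0)

def sample_environment():
    """Sample a toy environment, with weirder events made rarer."""
    human_inside    = random.random() < 0.5
    uses_window     = human_inside and random.random() < 0.10  # mildly unusual
    door_blown_open = random.random() < 0.05                   # unusual
    camera_sprayed  = human_inside and random.random() < 0.02  # more unusual
    bear_inside     = random.random() < 0.01                   # rarer still
    return dict(human_inside=human_inside, uses_window=uses_window,
                door_blown_open=door_blown_open, camera_sprayed=camera_sprayed,
                bear_inside=bear_inside)

def X(env):   # external variable: a human is in the greenhouse
    return env["human_inside"]

def A1(env):  # door alarm: fires iff the door circuit is broken
    return (env["human_inside"] and not env["uses_window"]) or env["door_blown_open"]

def A2(env):  # heat camera: fires on any warm human-ish body it can still see
    if env["camera_sprayed"]:
        return False
    return env["human_inside"] or env["bear_inside"]

N = 100_000
samples = [sample_environment() for _ in range(N)]
p1 = sum(A1(e) == X(e) for e in samples) / N
p2 = sum(A2(e) == X(e) for e in samples) / N
print(f"P(A1 agrees with X) ~ {p1:.3f}")  # door alarm
print(f"P(A2 agrees with X) ~ {p2:.3f}")  # heat camera: expect this to be higher
```

Run as-is, the door alarm’s agreement with $X$ comes out noticeably below the heat camera’s, which is the toy version of $P(E_1) < P(E_2)$; the wager is that the same pattern continues up the rest of the hierarchy.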
Sure. “If it’s smart, it won’t make simple mistakes.” But I’m also interested in the question of whether, given the first few in this sequence of approximate agents, one could do a good job at predicting the next one.
It seems like you could—like there is a simple rule governing these systems (“check whether there’s a human in the greenhouse”) that might involve difficult interaction with the world in practice but is much more straightforward when considered from the omniscient third-person view of imagination. And given that this rule is (arguendo) simple within a fairly natural (though not by any means unique) model of the world, and that it helps predict the sequence, one might be able to guess that this rule was likely just from looking at the sequence of systems.
(This also relies on the distinction between just trying to find likely or good-enough answers, and the AI doing search to find weird corner cases. The inferred next step in the sequence might be expected to give similar likely answers, with no similar guarantee for corner-case answers.)