But clearly the reward signal is not itself our values.
Ahhhh
Maybe: “But presumably the reward signal does not plug directly into the action-decision system.”?
Or: “But intuitively we do not value reward for its own sake.”?
It does seem like humans have some kind of physiological “reward”, in a hand-wavy reinforcement-learning-esque sense, which seems to at least partially drive the subjective valuation of things.
Hrm… if this compresses down to "Humans are clearly compelled, at least in part, by what 'feels good'", then I think it's fine. If not, this is an awkward sentence and we should discuss.
an agent could aim to pursue any values regardless of what the world outside it looks like;
Without knowing what values are, it's unclear that an agent could aim to pursue any of them. The implicit model here is that there is something like a value function (in the dynamic-programming sense) which gets passed into the action-decider along with the world model, and that this drives the agent. But I think we're saying something more general than that.
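(For concreteness, a minimal sketch of that implicit model, assuming "action-decider" means something like expected-value maximization; the names `choose_action`, `value_fn`, and `world_model` are purely illustrative, not from the post.)

```python
# Toy sketch of the implicit "value function + world model -> action-decider" picture.
# Purely illustrative; the claim above is that we mean something more general than this.

def choose_action(value_fn, world_model, state, actions):
    """Pick the action whose predicted outcomes the value function rates highest."""
    def expected_value(action):
        # world_model(state, action) is assumed to return a list of
        # (next_state, probability) pairs.
        return sum(p * value_fn(next_state)
                   for next_state, p in world_model(state, action))
    return max(actions, key=expected_value)
```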
but the fact that it makes sense to us to talk about our beliefs
Better terminology for the phenomenon of “making sense” in the above way?
“learn” in the sense that their behavior adapts to their environment.
I want a new word for this. Maybe "learn" vs. "adapt": "learn" means updating symbolic references (maps), while "adapt" means something like responding to stimuli in a systematic way.
Not quite what we were trying to say in the post. Rather than tradeoffs being decided on reflection, we were trying to talk about the causal-inference-style "explaining away" which the reflection gives enough compute for. In Johannes's example, the idea is that the sadist might model the reward as potentially coming from two independent causes: a hardcoded sadist response, and "actually" valuing the pain caused. Since the probability of one cause, given the effect, goes down when we also learn that the other cause definitely obtained, the sadist might lower their probability that they actually value hurting people, given that (after reflection) they're quite sure they are hardcoded to get reward for it. That's how it's analogous to the ant thing.
Suppose you have a randomly activated sprinkler system (not dependent on weather), and also it rains sometimes. These are two independent causes for the sidewalk being wet, each of which is capable of getting the job done all on its own. Suppose you notice that the sidewalk is wet, so it definitely either rained, sprinkled, or both. If I told you it had rained last night, your probability that the sprinklers went on (given that it is wet) should go down, since the rain already explains the wet sidewalk. If I told you instead that the sprinklers went on last night, then your probability that it rained (given that it is wet) goes down for a similar reason. This is what "explaining away" means in causal inference: the probability of a cause, given its effect, goes down when an alternative cause is known to be present.
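(A minimal numerical sketch of explaining away, using made-up priors for the sprinkler/rain setup above; nothing here is from the post.)

```python
from itertools import product

# Made-up priors for illustration; rain and sprinkler are independent causes.
P_RAIN, P_SPRINKLER = 0.3, 0.5

def joint(rain, sprinkler):
    """Prior probability of a (rain, sprinkler) assignment."""
    return ((P_RAIN if rain else 1 - P_RAIN) *
            (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER))

def is_wet(rain, sprinkler):
    # Deterministic OR: either cause alone wets the sidewalk.
    return rain or sprinkler

def p_rain_given(sprinkler=None):
    """P(rain | sidewalk is wet, and optionally the sprinkler state)."""
    num = den = 0.0
    for rain, spr in product([True, False], repeat=2):
        if not is_wet(rain, spr):
            continue  # condition on the sidewalk being wet
        if sprinkler is not None and spr != sprinkler:
            continue  # condition on the sprinkler state, if given
        p = joint(rain, spr)
        den += p
        if rain:
            num += p
    return num / den

print(p_rain_given())                # ≈ 0.46: a wet sidewalk is evidence of rain
print(p_rain_given(sprinkler=True))  # 0.30: learning the sprinklers ran "explains away" the rain
```

The point is just that conditioning on the alternative cause (sprinklers on) drops P(rain | wet) back down to the prior P(rain), which is the same move the reflective sadist makes with reward.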
In the post, the supposedly independent causes are "hardcoded ant-in-mouth aversion" and "value of eating escamoles", and the effect is negative reward. Realizing that you have a hardcoded ant-in-mouth aversion is like learning that the sprinklers were on last night. The sprinklers being on (incompletely) "explains away" the rain as a cause for the sidewalk being wet; likewise, the hardcoded ant-in-mouth aversion explains away the-amount-you-value-escamoles as a cause for the low reward.
I'm not totally sure if that answers your question; maybe you were asking "why model my values as a cause of the negative reward, separate from the hardcoded response itself?" If so, I think I'd rephrase the heart of the question as: "What do the values in this reward model actually correspond to out in the world, if anything? What are the 'real values' which reward is treated as evidence of?" (We've done some thinking about that and might put out a post on it soon.)
This is fascinating and I would love to hear about anything else you know of a similar flavor.
Seconded!!
Super unclear to the uninitiated what this means. (And therefore threateningly confusing to our future selves.)
Maybe: “Indeed, we can plug ‘value’ variables into our epistemic models (like, for instance, our models of what brings about reward signals) and update them as a result of non-value-laden facts about the world.”