David Lorell

Karma: 2,211

David Lorell Dec 6, 2024, 9:57 PM
7 points
0
in reply to: habryka’s comment on: johnswentworth’s Shortform
Sounds plausible. Is that 50% of coding work that the LLMs replace of a particular sort, and the other 50% a distinctly different sort?

David Lorell Dec 6, 2024, 9:55 PM
6 points
3
in reply to: Stephen McAleese’s comment on: johnswentworth’s Shortform
My impression is that they are getting consistently better at coding tasks of a kind that would show up in the curriculum of an undergrad CS class, but much more slowly improving at nonstandard or technical tasks.

David Lorell Dec 6, 2024, 9:51 PM
13 points
4
in reply to: johnswentworth’s comment on: johnswentworth’s Shortform
I do use LLMs for coding assistance every time I code now, and I have in fact noticed improvements in the coding abilities of the new models, but I basically endorse this. I mostly make small asks of the sort that sifting through docs or stack-overflow would normally answer. When I feel tempted to make big asks of the models, I end up spending more time trying to get the LLMs to get the bugs out than I’d have spent writing it all myself, and having the LLM produce code which is “close but not quite and possibly buggy and possibly subtly so” that I then have to understand and debug could maybe save time but I haven’t tried because it is more annoying than just doing it myself.

If someone has experience using LLMs to substantially accelerate things of a similar difficulty/flavor to transpilation of a high-level torch module into a functional JITable form in JAX which produces numerically close outputs, or implementation of a JAX/numpy based renderer of a traversable grid of lines borrowing only the window logic from, for example, pyglet (no GLSL calls, rasterize from scratch,) with consistent screen-space pixel width and fade-on-distance logic, I’d be interested in seeing how you do your thing. I’ve done both of these, with and without LLM help and I think leaning hard on the LLMs took me more time rather than less.
File I/O and other such ‘mundane’ boilerplate-y tasks work great right off the bat, but getting the details right on less common tasks still seems pretty hard to elicit from LLMs. (And breaking it down into pieces small enough for them to get it right is very time consuming and unpleasant.)

David Lorell Oct 28, 2024, 8:50 AM
9 points
3
in reply to: johnswentworth’s comment on: johnswentworth’s Shortform
I think that “getting good” at the “free association” game is in finding the sweet spot / negotiation between full freedom of association and directing toward your own interests, probably ideally with a skew toward what the other is interested in. If you’re both “free associating” with a bias toward your own interests and an additional skew toward perceived overlap, updating on that understanding along the way, then my experience says you’ll have a good chance of chatting about something that interests you both. (I.e. finding a spot of conversation which becomes much more directed than vibey free association.) Conditional on doing something like that strategy, I find it ends up being just a question of your relative+combined ability at this and the extent of overlap (or lack thereof) in interests.

So short model is: Git gud at free association (+sussing out interests) → gradient ascend yourselves to a more substantial conversation interesting to you both.

David Lorell Sep 20, 2024, 10:29 PM
2 points
0
in reply to: johnswentworth’s comment on: [Error communicating with LW2 server]
wiggitywiggitywact := fact about the world which requires a typical human to cross a large inferential gap.

David Lorell Sep 20, 2024, 10:27 PM
2 points
0
in reply to: johnswentworth’s comment on: [Error communicating with LW2 server]
wact := fact about the world
mact := fact about the mind
aact := fact about the agent more generally

vwact := value assigned by some agent to a fact about the world

David Lorell Sep 20, 2024, 10:23 PM
2 points
0
in reply to: johnswentworth’s comment on: [Error communicating with LW2 server]
Seems accurate to me. This has been an exercise in the initial step(s) of CCC, which indeed consist of “the phenomenon looks this way to me. It also looks that way to others? Cool. What are we all cottoning on to?”

David Lorell Sep 20, 2024, 10:20 PM
2 points
0
in reply to: johnswentworth’s comment on: [Error communicating with LW2 server]
Wait. I thought that was crossing the is-ought gap. As I think of it, the is ought gap refers to the apparent type-clash and unclear evidential entanglement between facts-about-the-world and values-an-agent-assigns-to-facts-about-the-world. And also as I think of it, “should be” always is short hand for “should be according to me” though possibly means some kind of aggregated thing but also ground out in subjective shoulds.

So “how the external world is” does not tell us “how the external world should be” …. except in so far as the external world has become causally/logically entangled with a particular agent’s ‘true values’. (Punting on what are an agent’s “true values” are as opposed to the much easier “motivating values” or possibly “estimated true values.” But for the purposes of this comment, its sufficient to assume that they are dependent on some readable property (or logical consequence of readable properties) of the agent itself.)

David Lorell Sep 20, 2024, 10:12 PM
2 points
0
in reply to: johnswentworth’s comment on: [Error communicating with LW2 server]
We have at least one jury rigged idea! Conceptually. Kind of.

David Lorell Sep 20, 2024, 10:11 PM
2 points
0
in reply to: johnswentworth’s comment on: [Error communicating with LW2 server]
Yeeeahhh.… But maybe it’s just awkwardly worded rather than being deeply confused. Like: “The learned algorithms which an adaptive system implements may not necessarily accept, output, or even internally use data(structures) which have any relationship at all to some external environment.” “Also what the hell is ‘reference’.”

David Lorell Sep 20, 2024, 10:08 PM
2 points
0
in reply to: johnswentworth’s comment on: [Error communicating with LW2 server]
Seconded. I have extensional ideas about “symbolic representations” and how they differ from.… non-representations.… but I would not trust this understanding with much weight.

David Lorell Sep 20, 2024, 10:06 PM
2 points
0
in reply to: johnswentworth’s comment on: [Error communicating with LW2 server]
Seconded. Comments above.

David Lorell Sep 20, 2024, 10:04 PM
2 points
0
on: [Error communicating with LW2 server]
Indeed, our beliefs-about-values can be integrated into the same system as all our other beliefs, allowing for e.g. ordinary factual evidence to become relevant to beliefs about values in some cases.
Super unclear to the uninitiated what this means. (And therefore threateningly confusing to our future selves.)

Maybe: “Indeed, we can plug ‘value’ variables into our epistemic models (like, for instance, our models of what brings about reward signals) and update them as a result of non-value-laden facts about the world.”

David Lorell Sep 20, 2024, 10:01 PM
2 points
0
on: [Error communicating with LW2 server]
But clearly the reward signal is not itself our values.
Ahhhh

Maybe: “But presumably the reward signal does not plug directly into the action-decision system.”?

Or: “But intuitively we do not value reward for its own sake.”?

David Lorell Sep 20, 2024, 9:59 PM
2 points
0
on: [Error communicating with LW2 server]
It does seem like humans have some kind of physiological “reward”, in a hand-wavy reinforcement-learning-esque sense, which seems to at least partially drive the subjective valuation of things.
Hrm… If this compresses down to, “Humans are clearly compelled at least in part by what ‘feels good’.” then I think it’s fine. If not, then this is an awkward sentence and we should discuss.

David Lorell Sep 20, 2024, 9:57 PM
2 points
0
on: [Error communicating with LW2 server]
an agent could aim to pursue any values regardless of what the world outside it looks like;
Without knowing what values are, it’s unclear that an agent could aim to pursue any of them. The implicit model here is that there is something like a value function in DP which gets passed into the action-decider along with the world model and that drives the agent. But I think we’re saying something more general than that.

David Lorell Sep 20, 2024, 9:54 PM
2 points
0
on: [Error communicating with LW2 server]
but the fact that it makes sense to us to talk about our beliefs
Better terminology for the phenomenon of “making sense” in the above way?

David Lorell Sep 20, 2024, 9:51 PM
2 points
0
on: [Error communicating with LW2 server]
“learn” in the sense that their behavior adapts to their environment.
I want a new word for this. “Learn” vs “Adapt” maybe. Learn means updating of symbolic references (maps) while Adapt means something like responding to stimuli in a systematic way.

David Lorell Sep 20, 2024, 9:33 PM
4 points
0
in reply to: Measure’s comment on: We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap
Not quite what we were trying to say in the post. Rather than tradeoffs being decided on reflection, we were trying to talk about the causal-inference-style “explaining away” which the reflection gives enough compute for. In Johannes’s example, the idea is that the sadist might model the reward as coming potentially from two independent causes: a hardcoded sadist response, and “actually” valuing the pain caused. Since the probability of one cause, given the effect, goes down when we also know that the other cause definitely obtained, the sadist might lower their probability that they actually value hurting people given that (after reflection) they’re quite sure they are hardcoded to get reward for it. That’s how it’s analagous to the ant thing.

David Lorell Sep 20, 2024, 7:58 PM
4 points
0
in reply to: Raemon’s comment on: We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap
Suppose you have a randomly activated (not dependent on weather) sprinkler system, and also it rains sometimes. These are two independent causes for the sidewalk being wet, each of which are capable of getting the job done all on their own. Suppose you notice that the sidewalk is wet, so it definitely either rained, sprinkled, or both. If I told you it had rained last night, your probability that the sprinklers went on (given that it is wet) should go down, since they already explain the wet sidewalk. If I told you instead that the sprinklers went on last night, then your probability of it having rained (given that it is wet) goes down for a similar reason. This is what “explaining away” is in causal inference. The probability of a cause given its effect goes down when an alternative cause is present.

In the post, the supposedly independent causes are “hardcoded ant-in-mouth aversion” and “value of eating escamoles”, and the effect is negative reward. Realizing that you have a hardcoded ant-in-mouth aversion is like learning that the sprinklers were on last night. The sprinklers being on (incompletely) “explain away” the rain as a cause for the sidewalk being wet. The hardcoded ant-in-mouth aversion explains away the-amount-you-value-escamoles as a cause for the low reward.

I’m not totally sure if that answers your question, maybe you were asking “why model my values as a cause of the negative reward, separate from the hardcoded response itself”? And if so, I think I’d rephrase the heart of the question as, “what do the values in this reward model actually correspond to out in the world, if anything? What are the ‘real values’ which reward is treated as evidence of?” (We’ve done some thinking about that and might put out a post on that soon.)