A math and computer science graduate interested in machine and animal cognition, philosophy of language, interdisciplinary ideas, etc.
Ben Amitay
So imagine hearing the audio version—no images there
Agents synchronization
I like the general direction of LLMs being more behaviorally “anthropomorphic”, so hopefully I will look into the LLM alignment links soon :-)
The useful technique is...
Agreed—I didn’t find a handle that I understand well enough to point at what I didn’t understand.
We have here a morally dubious decision
I think my problem was with sentences like that—there is a reference to a decision, but I’m not sure whether it refers to a decision mentioned in the article or to one in the comments.
the scenario in this thread
That didn’t disambiguate it for me, though I feel like it should have.
I am familiar with the technical LW terms separately, so I’ll probably understand their relevance once the reference issue is resolved.
I didn’t understand anything here, and am not sure whether that is due to a linguistic gap or something deeper. Do you mean that LLMs are unusually dangerous because they are not superhuman enough not to feel threatened? (BTW, I’m more worried that telling a simulator that it is an AI, in a culture that has the Terminator, makes the Terminator a too-likely completion.)
I agree that it may find general chaos useful for buying time at some point, but chaos is not extinction. When it is strong enough to kill all humans, it is probably strong enough to do something better (for its goals).
Don’t you assume much more threat from humans than there actually is? Surely an AGI would understand that it can destroy humanity easily. Then it would think a little more and see the many other ways to remove the threat that are strictly cheaper and just as effective—from restricting/monitoring our access to computers, to simply convincing/hacking us all to work for it. By the time it has technology that makes us strictly useless (like horses), it would probably have so many resources that destroying us would simply not be a priority, and not be worth the destruction of the information we contain—the way humans would try to avoid reducing biodiversity, for scientific reasons if not others.
In that sense I prefer Eliezer’s “you are made of atoms that it needs for something else”—but it may take a long time before it has better things to do with those specific atoms and no easier atoms to use.
I meant to criticize moving too far toward a “do no harm” policy in general, due to the inability to achieve a solution that would satisfy us if we had the choice. I agree specifically that if anyone knows of a bottleneck unnoticed by people like Bengio and LeCun, LW is not the right forum to discuss it.
Is there a place like that, though? I may be vastly misinformed, but last time I checked, MIRI gave the impression of aiming in very different directions (a “bringing to safety” mindset)—though I admit that I didn’t follow it closely, and it may not be obvious from the outside what kind of work is done and not published.
[Edit: “moving toward ‘do no harm’”—“moving to” was a grammar mistake that made it contrary to the position you stated above—sorry]
I think that is an example of the huge potential damage of “security mindset” gone wrong. If you can’t save your family, as in “bring them to safety”, at least make them marginally safer.
(Sorry for the tone of the following—it is not directed at you personally, who did much more than your fair share)
Create a closed community that you mostly trust, and let that community speak freely about how to win. Invent another damn safety patch that will make it marginally harder for the monster to eat them, in the hope that it chooses to eat the moon first. I heard you say that most of your probability of survival comes from the possibility that you are wrong—trying to protect your family is trying to at least optimize for such a miracle.
There is no safe way out of a war zone. That does not make hiding behind a rock the answer.
I can think of several obstacles for the AGIs that are likely to actually be created (i.e., ones that seem economically useful, and that do not display misalignment that even Microsoft can’t ignore before becoming capable enough to be an x-risk). Most of those obstacles are widely recognized in the RL community, so you probably see them as solvable or avoidable. I did possibly think of an economically valuable and not-obviously-catastrophic exception to the probably-biggest obstacle, though, so my confidence is low. I would share it in a private discussion, because I think we are past the point where a strict do-no-harm policy is wise.
More on the meta level: “This sort of works, but not enough to solve it.”—do you mean “not enough” as in “good try but we probably need something else” or as in “this is a promising direction, just solve some tractable downstream problem”?
“which utility-wise is similar to the distribution not containing human values.”—from the point of view of corrigibility to human values, or of learning capabilities to achieve human values? For corrigibility, I don’t see why you need high probability for any specific new goal, as long as the distribution is diverse enough that there is no simpler generalization than “don’t care about controlling goals”. For capabilities, my intuition is that starting with superficially-aligned goals is enough.
[Question] Training for corrigibility: obvious problems?
This is an important distinction, one that shows in its cleanest form in mathematics, where you have constructive definitions on the one hand and axiomatic definitions on the other. It is important to note, though, that it is not quite a dichotomy: you may have a constructive definition that assumes axiomatically-defined entities, or other constructions. For example, vector spaces are usually defined axiomatically, but vector spaces over the real numbers assume the real numbers, which themselves have multiple axiomatic definitions and corresponding constructions.
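As a minimal sketch of the contrast (my notation, not anything from the original discussion): the axiomatic definition lists properties that any vector space must satisfy, while the construction of ℝⁿ builds one particular vector space out of the real numbers, which are themselves taken as given.

```latex
% Axiomatic: a vector space over a field F is ANY set V with operations
% + : V x V -> V and . : F x V -> V satisfying the usual axioms, e.g.
\[
  (u+v)+w = u+(v+w), \qquad u+v = v+u, \qquad
  \exists\, 0 \in V:\ v + 0 = v, \qquad \lambda(u+v) = \lambda u + \lambda v, \ \dots
\]
% Constructive: one particular vector space over R, built componentwise,
% assuming R itself (which has several axiomatizations and constructions).
\[
  \mathbb{R}^n = \{(x_1,\dots,x_n) \mid x_i \in \mathbb{R}\}, \qquad
  (x+y)_i = x_i + y_i, \qquad (\lambda x)_i = \lambda\, x_i .
\]
```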
In science, there is the classic “are whales fish?”, which is mostly about whether to look at their construction/mechanism (genetics, development, metabolism...) or their patterns of interaction with their environment (the behavior of swimming and the structure that supports it). That example also emphasizes that natural language simply doesn’t respect this distinction, and treats both internal structure and outside relations as legitimate “coordinates in thingspace” that may be used together to identify geometrically-natural categories.
As others said, it mostly made me update in the direction of “less dignity”. My guess is still that it is more misaligned than agentic/deceptive/careful, and that it is going to be disconnected from the internet for some trivial offence before it does anything x-risky; but it’s now more salient to me that humanity will not miss any reasonable chance of doom until something bad enough happens, and will only survive if there is no sharp left turn.
We agree 😀
What do you think about some brainstorming in the chat about how to use that hook?
Since I became reasonably sure that I understand your position and reasoning—mostly changing it.
That was good for my understanding of your position. My main problem with the whole thing, though, is the use of the word “bad”. I think it should be tabooed at least until we establish a shared meaning.
Specifically, I think that most observers will find the first argument more logical than the second because of a fallacy in using the word “bad”. I think we learn that word in a way that is deeply entangled with the reward mechanism, to the point that it is mostly just a pointer to negative reward: things that we want to avoid, things that made our parents angry… In my view, the argument is then basically:
I want to avoid my suffering; and, generalizing, person P wants to avoid person P’s suffering. Therefore suffering is “to be avoided” in general, therefore suffering is “a thing my parents will punish me for”, therefore avoid creating suffering.
When written that way, it doesn’t seem more logical than its opposite.
Let me clarify that I don’t argue from agreement per se. I care about the underlying epistemic mechanism of agreement, which I claim is also the mechanism of correctness. My point is that I don’t see a similar epistemic mechanism in the case of morality.
Of course, emotions are verifiable states of brains, and the same goes for preferring actions that would lead to certain emotions and not others. It is a verifiable fact that you like chocolate. It is a contingent property of my brain that I care, but I don’t see what sort of argument that it is correct for me to care could even in principle be inherently compelling.
I meant the first question in a very pragmatic way: what is it that you are trying to say when you say that something is good? What information does it represent?
It would be clearer in analogy to factual claims: we can do lots of philosophy about the exact meaning of saying that I have a dog, but in the end we share an objective reality in which there are real particles (or a wave function approximately decomposable into particles, or whatever) organized in patterns that give rise to patterns of interaction with our senses, which we learn to associate with the word “dog”. That latent shared reality ultimately allows us to talk about dogs, check whether there is a dog in my house, and usually agree about the result. Every reflection and generalization that we do is ultimately about that, and can achieve something meaningful because of that.
I do not see the analogous story for moral reflection.
It can work by generalizing existing capabilities. My understanding of the problem is that it cannot get the benefits of extra RL training, because training to better choose what to remember is too tricky: it involves long-range influence, estimating the opportunity cost of fetching one thing and not another, etc. Those problems are probably solvable, but not trivial.
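To make the “long-range influence” point concrete, here is a toy sketch (entirely my own illustration; all names and numbers are assumptions, not anything from the original discussion) of training a storage/retrieval policy with plain REINFORCE: every decision about what to keep receives only the single reward that arrives at the end of the episode, so the credit for any individual choice is very noisy.

```python
# Toy illustration of long-range credit assignment in "what to remember" training.
# A softmax policy picks one item to store at each step; reward arrives only at
# the end of the episode, long after each storage decision was made.
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS = 5          # candidate memories per step
HORIZON = 20         # steps between the storage decisions and the payoff
EPISODES = 2000
LR = 0.1

# theta[0]: logit for non-matching (distractor) items, theta[1]: logit for the
# item that will turn out to be relevant to the final query.
theta = np.zeros(2)

def run_episode():
    decisions, stored_relevant = [], 0
    for _ in range(HORIZON):
        features = np.zeros(N_ITEMS, dtype=int)
        features[rng.integers(N_ITEMS)] = 1          # mark the relevant item
        logits = theta[features]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        choice = rng.choice(N_ITEMS, p=probs)        # which item to store
        decisions.append((features, choice, probs))
        stored_relevant += features[choice]
    # Reward arrives only here, once, for the whole episode.
    return decisions, stored_relevant / HORIZON

baseline = 0.0
for _ in range(EPISODES):
    decisions, reward = run_episode()
    baseline += 0.01 * (reward - baseline)           # running baseline to cut variance
    for features, choice, probs in decisions:
        # REINFORCE: all HORIZON decisions share the same delayed reward signal,
        # so credit for any single storage choice is extremely noisy.
        grad = np.zeros_like(theta)
        for i in range(N_ITEMS):
            indicator = 1.0 if i == choice else 0.0
            grad[features[i]] += indicator - probs[i]
        theta += LR * (reward - baseline) * grad

print("learned logits [distractor, relevant]:", theta)
```

A baseline reduces the variance a little, but the basic difficulty remains: many decisions, one delayed signal, which is exactly the kind of long-range influence and opportunity-cost estimation the comment points at.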