Abhimanyu Pallavi Sudhir
I used to have an idea for a karma/reputation system: repeatedly recalculate karma weighted by the karma of the upvoters and downvoters on a comment (then normalize to avoid hyperinflation) until a fixed point is reached.
I feel like this is vaguely somehow related to:
AlphaGo Zero
Wealth in markets
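The recalculation is essentially power iteration toward an eigenvector-like fixed point, in the spirit of PageRank. A minimal sketch, assuming votes are represented as (voter, author, sign) triples and karma is normalized to sum to 1 each round (the representation and stopping rule are my assumptions, not part of the original idea):

```python
import numpy as np

def fixed_point_karma(votes, n_users, iters=100, tol=1e-9):
    """Repeatedly recompute each user's karma as the karma-weighted sum of
    votes on their content, normalizing each round to avoid hyperinflation."""
    karma = np.ones(n_users) / n_users          # start everyone equal
    for _ in range(iters):
        new = np.zeros(n_users)
        for voter, author, sign in votes:       # sign is +1 (upvote) or -1 (downvote)
            new[author] += sign * karma[voter]  # weight each vote by the voter's karma
        new = np.clip(new, 0.0, None)           # keep karma non-negative
        total = new.sum()
        if total == 0:
            break
        new /= total                            # normalize to avoid hyperinflation
        if np.abs(new - karma).max() < tol:     # fixed point reached
            karma = new
            break
        karma = new
    return karma

# Example: user 0 upvoted by users 1 and 2; user 1 downvoted by user 2.
print(fixed_point_karma([(1, 0, +1), (2, 0, +1), (2, 1, -1)], n_users=3))
```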
It has extremely high immediate value: it solves IP rights entirely.
It’s the barbed wire for IP rights
quick thoughts on LLM psychology
LLMs cannot be directly anthropomorphized. Though something like “a program that continuously calls an LLM to generate a rolling chain of thought, dumps memory into a relational database, can call from a library of functions which includes dumping to and recalling from that database, and receives inputs that are added to the LLM context” is much more agent-like.
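A minimal sketch of that kind of scaffold, with a stubbed-out LLM call. All names here (fake_llm, remember, recall) and the crude “function call” parsing are hypothetical illustrations, not a real agent framework:

```python
import sqlite3

def fake_llm(context: str) -> str:
    """Stand-in for a real LLM call; returns a canned 'thought'."""
    return "THOUGHT: remember('saw input') | continue"

def run_agent(llm, steps: int = 3):
    # Relational database used as long-term memory.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT)")

    def remember(text: str) -> None:             # dump to the database
        db.execute("INSERT INTO memory (text) VALUES (?)", (text,))

    def recall(keyword: str) -> list[str]:       # recall from the database
        rows = db.execute("SELECT text FROM memory WHERE text LIKE ?",
                          (f"%{keyword}%",)).fetchall()
        return [r[0] for r in rows]

    context = "You are an agent. New input: hello."
    for _ in range(steps):
        thought = llm(context)                    # rolling chain of thought
        if "remember(" in thought:                # crude parsing of a 'function call'
            remember(thought)
        context = (context + "\n" + thought)[-2000:]  # keep the context limited/rolling
    return recall("")

if __name__ == "__main__":
    print(run_agent(fake_llm))
```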
Humans evolved feelings as signals of cost and benefit — because we can respond to those signals in our behaviour.
These feelings add up to a “utility function”, something that is only instrumentally useful to the training process. I.e. you can think of a utility function as itself a heuristic taught by the reward function.
LLMs certainly do need cost-benefit signals about features of text. But I think their feelings/utility functions are limited to just that.
E.g. LLMs do not experience the feeling of “mental effort”. They do not find some questions harder than others, because the energy cost of cognition is not a useful signal to them during the training process (I don’t think regularization counts for this either).
LLMs also do not experience “annoyance”. They don’t have the ability to ignore or obliterate a user they’re annoyed with, so annoyance is not a useful signal to them.
Ok, but aren’t LLMs capable of simulating annoyance? E.g. if annoying questions are followed by annoyed responses in the dataset, couldn’t LLMs learn to experience some model of annoyance so as to correctly reproduce the verbal effects of annoyance in their responses?
More precisely, if you just gave an LLM the function
ignore_user()
in its function library, it would run it when “simulating annoyance”, even though ignoring the user wasn’t useful during training, because it’s playing the role.
I don’t think this is the same as being annoyed, though. For people, simulating an emotion and feeling it are often similar due to mirror neurons or whatever, but there is no reason to expect this is the case for LLMs.
current LLMs vs dangerous AIs
Most current “alignment research” with LLMs seems indistinguishable from “capabilities research”. Both are just “getting the AI to be better at what we want it to do”, and there isn’t really a critical difference between the two.
Alignment in the original sense was defined oppositionally to the AI’s own nefarious objectives, which LLMs don’t have; so alignment research with LLMs is probably moot.
something related I wrote in my MATS application:
- I think the most important alignment failure modes occur when deploying an LLM as part of an agent (i.e. a program that autonomously runs a limited-context chain of thought from LLM predictions, maintains long-term storage, and calls functions such as search over storage, self-prompting, and habit modification, either based on LLM-generated function calls or as cron jobs/hooks).
- These kinds of alignment failures (1) only become truly serious when the agent is somehow objective-driven, or equivalently has feelings, which current LLMs have not been trained to be (I think that would need some kind of online learning, or learning to self-modify), and (2) can only be solved when the agent is objective-driven.
conditionalization is not the probabilistic version of implies
P     Q     Q|P     P → Q
T     T     T       T
T     F     F       F
F     T     N/A     T
F     F     N/A     T

Resolution logic for conditionalization: Q if P, else N/A.
Resolution logic for implies: Q if P, else True.
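A small Python illustration of the two resolution rules from the table (function names are mine, for illustration only):

```python
def conditionalization(p: bool, q: bool):
    """Q|P: defined (and equal to Q) only when P holds; otherwise N/A (None)."""
    return q if p else None

def implies(p: bool, q: bool) -> bool:
    """P -> Q: material implication; vacuously True when P is false."""
    return q if p else True

# Reproduces the truth table above.
for p in (True, False):
    for q in (True, False):
        print(p, q, conditionalization(p, q), implies(p, q))
```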
Abhimanyu Pallavi Sudhir’s Shortform
Betting on what is un-falsifiable and un-verifiable
I think that the philosophical questions you’re describing actually evaporate and turn out to be meaningless once you think enough about them, because they have a very anthropic flavour.
I don’t think that’s exactly true. But why do you think that follows from what I wrote?
That’s syntax, not semantics.
It’s really not; that’s the point I made about semantics.
Eh, that’s kind-of right, my original comment there was dumb.
You overstate your case. The universe contains a finite amount of incompressible information, which is strictly less than the information contained in. That self-reference applies to the universe is obvious, because the universe contains computer programs. The point is the universe is certainly a computer program, and that incompleteness applies to all computer programs (to all things with only finite incompressible information). In any case, I explained Gödel with an explicitly empirical example, so I’m not sure what your point is.
I agree, and one could think of this in terms of markets: a market cannot capture all information about the world, because it is part of the world.
But I disagree that this is fundamentally unrelated—here too the issue is that it would need to represent states of the world corresponding to what belief it expresses. Ultimately mathematics is supposed to represent the real world.
Meaningful things are those the universe possesses a semantics for
No, it doesn’t. There is no 1⁄4 chance of anything once you’ve found yourself in Room A1.
You do acknowledge that the payout for the agent in room B (if it exists) from your actions is the same as the payout for you from your own actions, which if the coin came up tails is $3, yes?
I don’t understand what you are saying. If you find yourself in Room A1, you simply eliminate the last two possibilities so the total payout of Tails becomes 6.
If you find yourself in Room A1, you do find yourself in a world where you are allowed to bet. It doesn’t make sense to consider the counterfactual, because you already have gotten new information.
That’s not important at all. The agents in rooms A1 and A2 themselves would do better to choose tails than to choose heads. They really are being harmed by the information.
I see, that is indeed the same principle (and also simpler/we don’t need to worry about whether we “control” symmetric situations).
I don’t think this is right. A superrational agent exploits the symmetry between A1 and A2, correct? So it must reason that an identical agent in A2 will reason the same way as it does, and if it bets heads, so will the other agent. That’s the point of bringing up EDT.
Oh right, lol, good point.