Kenoubi

Karma: 360

Kenoubi Jul 11, 2023, 4:08 AM
3 points
on: Bayesianism versus conservatism versus Goodhart
It seems like there might be a problem with this argument if the true $v_{i}$ are not just unknown, but adversarially chosen. For example, suppose the true $v_{2}$ are the actual locations of a bunch of landmines, from a full set of possible landmine positions $V_{2}$ . We are trying to get a vehicle from A to B, and all possible paths go over some of the $V_{2}$ . We may know that the opponent placing the landmines only has $n_{2}$ landmines to place. Furthermore, suppose each landmine only goes off with some probability $p$ even if the vehicle drives over it. If we can mechanistically predict where the opponent placed the landmines, or even mechanistically derive a probability distribution over the landmine placements, this is no problem, we can just use that to minimize the expected probability of driving over a landmine that goes off. However, suppose we can’t predict the opponent that way, but we do know the opponent is trying to maximize the probability that the vehicle drives over a landmine that isn’t a dud. It seems like we need to use game theory here, not just probability theory, to figure out what mixed strategy the opponent would be using to maximize the probability that we drive over a landmine, and then use that game-theoretic strategy to choose a mixed strategy for which path to take. It seems like the game theory here involves a step where we look for the worst (according to our utility function) probability distribution over where the landmines are placed, because this is how the opponent will have actually chosen where to put the landmines. Doesn’t this look a lot like using $μ$ rather than $U$ as our utility function?

Kenoubi Jun 29, 2023, 1:57 PM
3 points
0
on: Deceptive AI vs. shifting instrumental incentives
I like this frame, and I don’t recall seeing it already addressed.

What I have seen written about deceptiveness generally seems to assume that the AGI would be sufficiently capable of obfuscating its thoughts from direct queries and from any interpretability tools we have available that it could effectively make its plans for world domination in secret, unobserved by humans. That does seem like an even more effective strategy for optimizing its actual utility function than not bothering to think through such plans at all, if it’s able to do it. But it’s hard to do, and even thinking about it is risky.

I can imagine something like what you describe happening as a middle stage, for entities that are agentic enough to have (latent, probably misaligned since alignment is probably hard) goals, but not yet capable enough to think hard about how to optimize for them without being detected. It seems more likely if (1) almost all sufficiently powerful AI systems created by humans will actually have misaligned goals, (2) AIs are optimized very hard against having visibly misaligned cognition (selection of which AIs to keep being a form of optimization, in this context), and (3) our techniques for making misaligned cognition visible are more reliably able to detect an active process / subsystem doing planning towards goals than the mere latent presence of such goals. (3) seems likely, at least for a while and assuming we have any meaningful interpretability tools at all; it’s hard for me to imagine a detector of latent properties that doesn’t just always say “well, there are some off-distribution inputs that would make it do something very bad” for every sufficiently powerful AI, even one that was aligned-in-practice because those inputs would reliably never be given to it.

Kenoubi Jun 16, 2023, 2:28 PM
4 points
0
in reply to: Charlie Steiner’s comment on: Aligning Mathematical Notions of Infinity with Human Intuition
Hmm. My intuition says that your A and B are “pretty much the same size”. Sure, there are infinitely many times that they switch places, but they do so about as regularly as possible and they’re always close.

If A is “numbers with an odd number of digits” and B is “numbers with an even number of digits” that intuition starts to break down, though. Not only do they switch places infinitely often, but the extent to which one exceeds the other is unbounded. Calling A and B “pretty much the same size” starts to seem untenable; it feels more like “the concept of being bigger or smaller or the same size doesn’t properly apply to the pair of A and B”. (Even though A and B are well defined, not THAT hard to imagine, and mathematicians will still say they’re the same size!)

If A is “numbers whose number of digits is a multiple of 10”, and B is all the other (positive whole) numbers, then… I start to intuitively feel like B is bigger again??? I think this is probably just my intuition not being able to pay attention to all the parts of the question at the same time, and thus substituting “are there more multiples of 10 or non-multiples”, which then works the way you said.

Kenoubi May 31, 2023, 8:40 PM
1 point
4
in reply to: Sune’s comment on: You now can enable reacts on your own posts! (and other react info)
I think this comment demonstrates that the list of reacts should wrap, not extend arbitrarily far to the right.

Kenoubi May 31, 2023, 8:37 PM
2 points
0
in reply to: Sune’s comment on: You now can enable reacts on your own posts! (and other react info)
The obvious way to quickly and intuitively illustrate whether reactions are positive or negative would seem to be color; another option would be grouping them horizontally or vertically with some kind of separator. The obvious way to quickly and intuitively make it visible which reactions were had by more readers would seem to be showing a copy of the same icon for each person who reacted a certain way, not a number next to the icon.

I make no claim that either of these changes would be improvements overall. Clearly the second would require a way to handle large numbers of reactions to the same comment. The icons could get larger or smaller depending on number of that reaction, but small icons would get hard to recognize. Falling back to numbers isn’t great either, since it’s exactly in the cases where that fallback would happen that the number of a particular reaction has become overwhelmingly high.

I think it matters that there are a lot of different reactions possible compared to, say, Facebook, and at the same time, unlike many systems with lots of different reactions, they aren’t (standard Unicode) emoji, so you don’t get to just transfer existing knowledge of what they mean. And they have important semantic (rather than just emotive) content, so it actually matters if one can quickly tell what they mean. And they partially but not totally overlap with karma and agreement karma; it seems a bit inelegant and crowded to have both, but there are benefits that are hard to achieve with only one. It’s a difficult problem.

Kenoubi May 24, 2023, 8:37 PM
5 points
4
on: Open Thread With Experimental Feature: Reactions
In the current UI, the list of reactions from which to choose is scrollable, but that’s basically impossible to actually see. While reading the comments I was wondering what the heck people were talking about with “Strawman” and so forth. (Like… did that already get removed?) Then I discovered the scrolling by accident after seeing a “Shrug” reaction to one of the comments.

Kenoubi Mar 27, 2023, 11:45 AM
2 points
0
on: The default outcome for aligned AGI still looks pretty bad
I’ve had similar thoughts. Two counterpoints:
- This is basically misuse risk, which is not a weird problem that people need to be convinced even needs solving. To the extent AI appears likely to be powerful, society at large is already working on this. Of course, its efforts may be ineffective or even counterproductive.
- They say power corrupts, but I’d say power opens up space to do what you were already inclined to do without constraints. Some billionaires, e.g. Bill Gates, seem to be sincerely trying to use their resources to help people. It isn’t hard for me to imagine that many people, if given power beyond what they can imagine, would attempt to use it to do helpful / altruistic things (at least, things they themselves considered helpful / altruistic).
I don’t in any sense think either of these are knockdowns, and I’m still pretty concerned about how controllable AI systems (whether that’s because they’re aligned, or just too weak and/or insufficiently agentic) may be used.

Kenoubi Mar 1, 2023, 1:23 AM
8 points
6
in reply to: Noosphere89’s comment on: Enemies vs Malefactors

On SBF, I think a large part of the issue is that he was working in an industry called cryptocurrency that is basically has fraud as the bedrock of it all. There was nothing real about crypto, so the collapse of FTX was basically inevitable.

I don’t deny that the cryptocurrency “industry” has been a huge magnet for fraud, nor that there are structural reasons for that, but “there was nothing real about crypto” is plainly false. The desire to have currencies that can’t easily be controlled, manipulated, or implicitly taxed (seigniorage, inflation) by governments or other centralized organizations and that can be transferred without physical presence is real. So is the desire for self-executing contracts. One might believe those to be harmful abilities that humanity would be better off without, but not that they’re just nothing.

Kenoubi Feb 23, 2023, 5:07 PM
1 point
−1
on: Covid 2/23/23: Your Best Possible Situation
Thank you for writing these! They’ve been practically my only source of “news” for most of the time you’ve been writing them, and before that I mostly just ignored “news” entirely because I found it too toxic and it was too difficult+distasteful to attempt to decode it into something useful. COVID the disease hasn’t directly had a huge effect on my life, and COVID the social phenomenon has been on a significant decline for some time now, but your writing about it (and the inclusion of especially notable non-COVID topics) have easily kept me interested enough to keep reading. Please consider continuing some kind of post on a weekly cadence. I think it’s a really good frequency to never lose touch but also not be too burdensome (to the reader or the writer).

Kenoubi Feb 16, 2023, 12:51 PM
1 point
0
in reply to: Shmi’s comment on: ChatGPT is a SPAM Factory
I found it to be a pretty obvious reference to the title. SPAM is a meatcube. A meatcube is something that has been processed into uniformity. Any detectable character it had, whether faults, individuality, or flashes of brilliance, has been ground, blended, and seasoned away.

Kenoubi Dec 14, 2022, 2:05 AM
2 points
0
in reply to: Ustice’s comment on: Is the ChatGPT-simulated Linux virtual machine real?
I don’t know how far a model trained explicitly on only terminal output could go, but it makes sense that it might be a lot farther than a model trained on all the text on the internet (some small fraction of which happens to be terminal output). Although I also would have thought GPT’s architecture, with a fixed context window and a fixed number of layers and tokenization that isn’t at all optimized for the task, would pay large efficiency penalties at terminal emulation and would be far less impressive at it than it is at other tasks.

Assuming it does work, could we get a self-operating terminal by training another GPT to roleplay the entering commands part? Probably. I’m not sure we should though...

Kenoubi Dec 14, 2022, 1:49 AM
2 points
1
in reply to: Lone Pine’s comment on: Is the ChatGPT-simulated Linux virtual machine real?
Sure, I understood that’s what was being claimed. Roleplaying a Linux VM without error seemed extremely demanding relative to other things I knew LLMs could do, such that it was hard for me not to question whether the whole thing was just made up.

Kenoubi Dec 13, 2022, 4:43 PM
3 points
0
in reply to: Radford Neal’s comment on: Is the ChatGPT-simulated Linux virtual machine real?
Thanks! This is much more what I expected. Things that look generally like outputs that commands might produce, and with some mind-blowing correct outputs (e.g. the effect of tr on the source code) but also some wrong outputs (e.g. the section after echo A >a; echo X >b; echo T >c; echo H >d; the output being consistent between cat a a c b d d and cat a a c b d d | sort (but inconsistent with the “actual contents” of the files) is especially the kind of error I’d expect an LLM to make).

[Question] Is the ChatGPT-simulated Linux virtual machine real?

KenoubiDec 13, 2022, 3:41 PM

18 points

7 comments1 min readLW link

Kenoubi Dec 2, 2022, 6:29 PM
1 point
0
in reply to: Jozdien’s comment on: [ASoT] Finetuning, RL, and GPT’s world prior
That works too!

Kenoubi Dec 2, 2022, 6:25 PM
1 point
0
in reply to: Jozdien’s comment on: [ASoT] Finetuning, RL, and GPT’s world prior
Got it. This post also doesn’t appear to actually be part of that sequence though? I would have noticed if it was and looked at the sequence page.

EDIT: Oh, I guess it’s not your sequence.

EDIT2: If you just included “Alignment Stream of Thought” as part of the link text in your intro where you do already link to the sequence, that would work.

Kenoubi Dec 2, 2022, 6:14 PM
1 point
0
on: [ASoT] Finetuning, RL, and GPT’s world prior

ASoT

What do you mean by this acronym? I’m not aware of its being in use on LW, you don’t define it, and to me it very definitely (capitalization and all) means Armin van Buuren’s weekly radio show A State of Trance.

Kenoubi Nov 3, 2022, 6:21 PM
1 point
0
on: Why do we post our AI safety plans on the Internet?
Counterpoint #2a: A misaligned AGI whose capabilities are high enough to use our safety plans against us will succeed with an equal probability (e.g., close to 100%), if necessary by accessing these plans whether or not they were posted to the Internet.

Kenoubi Oct 6, 2022, 3:05 PM
3 points
2
in reply to: TekhneMakre’s comment on: Humans aren’t fitness maximizers
If only relative frequency of genes matters, then the overall size of the gene pool doesn’t matter. If the overall size of the gene pool doesn’t matter, then it doesn’t matter if that size is zero. If the size of the gene pool is zero, then whatever was included in that gene pool is extinct.

Yes, it’s true people make all kinds of incorrect inferences because they think genes that increase the size of the gene pool will be selected for or those that decrease it will be selected against. But it’s still also true that a gene that reduces the size of the pool it’s in to zero will no longer be found in any living organisms, regardless of what its relative frequency was in the process of the pool reaching a size of zero. If the term IGF doesn’t include that, that just means IGF isn’t a complete way of accounting for what organisms we observe to exist in what frequencies and how those change over time.

Kenoubi Sep 25, 2022, 11:12 PM
3 points
0
in reply to: AnthonyC’s comment on: Covid 9/22/22: The Joe Biden Sings
I mean, just lag, yes, but there’s also plain old incorrect readings. But yes, it would be cool to have a system that incorporated glucagon. Though, diabetics’ body still produce glucagon AFAIK, so it’d really be better to just have something that senses glucose and releases insulin the same way a working pancreas would.

Kenoubi

[Question] Is the ChatGPT-simu­lated Linux vir­tual ma­chine real?

[Question] Is the ChatGPT-simulated Linux virtual machine real?