So, the current annual death rate for an American in their 30s is about 0.2%. That probably rises by another 0.5% or so once you factor in black swan events like nuclear war and bioterrorism. Let’s call “unsafe” a roughly threefold increase in that expected death rate, to about 2%.
An increase that large would take something a lot more dramatic than the kind of politics we’re used to in the US. Political changes that dramatic are rare historically, but I think we’re at a moment where the risk is elevated enough that it’s worth thinking about the odds.
I might, for example, put the odds of a collapse of democracy in the US over the next couple of years at ~2-5%: if the US were to elect 20 presidents similar to the current one over a century, I’d expect better than even odds of at least one of them making himself into a Putinesque dictator. A collapse like that would substantially increase the risk of war, I’d argue, including raising a real possibility of nuclear civil war. That alone might increase the expected death rate for young and middle-aged adults in that scenario by a point or two. It might also introduce a small risk of extremely large atrocities against minorities or political opponents, which could add a few more tenths of a percent to the expected death rate.
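To make the arithmetic in the last two paragraphs explicit (illustrative numbers only- 3.5% is just a midpoint of the 2-5% range, and 0.7% is the baseline-plus-black-swan rate from above):

```latex
% "Unsafe" threshold, and the 20-presidents compounding:
(0.2\% + 0.5\%) \times 3 \approx 2.1\%
\qquad\qquad
1 - (1 - 0.035)^{20} \approx 0.51
```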
There’s also a small risk of economic collapse. Something like a political takeover of the Fed, combined with expensive, poorly considered populist policies, might trigger hyperinflation of the dollar. When that sort of thing happens overseas, you’ll often see worse health outcomes and a breakdown in civil order increase the death rate by up to a percent- and, of course, it would introduce new tail risks, pushing the expected death rate up further.
I should note that I don’t think the odds of any of this are high enough to worry about my safety now- but needing to emigrate is a much more likely outcome than actually being threatened, and that’s a headache I am mildly worried about.
That’s a crazy low probability.
Honestly, my odds of this have been swinging anywhere from 2% to 15% recently. Note that this would be the odds of our democratic institutions deteriorating enough that fleeing the country would seem like the only reasonable option- p(fascism) more in the sense of a government that most future historians would assign that or a similar label to, rather than just a disturbingly cruel and authoritarian administration still held somewhat in check by democracy.
I wonder: what odds would people here put on the US becoming a somewhat unsafe place to live even for citizens in the next couple of years due to politics? That is, what combined odds should we put on things like significant erosion of rights and legal protections for outspoken liberal or LGBT people, violent instability escalating to an unprecedented degree, the government launching the kind of war that endangers the homeland, etc.?
My gut says it’s now at least 5%, which seems easily high enough to start putting together an emigration plan. Is that alarmist?
More generally, what would be an appropriate smoke alarm for this sort of thing?
One interesting example of humans managing to do this kind of compression in software: .kkrieger is a fully functional first-person shooter with varied levels, detailed textures and lighting, multiple weapons and enemies, and a full soundtrack. Replicating it in a modern game engine would probably produce a program at least a gigabyte in size, but thanks to some incredibly clever procedural generation, .kkrieger manages to do it in under 100 KB.
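As a rough sketch of the underlying trick (a toy example of my own, not how .kkrieger actually works): detailed content can be regenerated at runtime from a tiny deterministic recipe instead of being stored explicitly.

```python
# Toy procedural generation: a 512x512 texture is recomputed at runtime from
# a few dozen bytes of parameters plus a deterministic function, instead of
# being stored pixel by pixel. (.kkrieger's real generators are far fancier.)
import math

WIDTH, HEIGHT = 512, 512
# Three (x-frequency, y-frequency, amplitude) triples- the entire "asset".
PARAMS = [(0.020, 0.031, 1.00), (0.045, 0.007, 0.50), (0.110, 0.093, 0.25)]

def texel(x, y):
    """Sum a few sine waves to get a marble-like brightness in 0..255."""
    total_amp = sum(amp for _, _, amp in PARAMS)
    v = sum(amp * math.sin(fx * x + fy * y) for fx, fy, amp in PARAMS)
    return int(255 * (v / total_amp + 1) / 2)

texture = [[texel(x, y) for x in range(WIDTH)] for y in range(HEIGHT)]
print(f"{WIDTH * HEIGHT} pixels generated from {len(PARAMS) * 3 * 4} bytes of parameters")
```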
Could how you update your priors depend on which concepts you choose to represent the situation with?
I mean, suppose the parent says “I have two children, at least one of whom is a boy. So, I have a boy and another child whose gender I’m not mentioning”. It seems like that second sentence doesn’t add any new information- it parses to me as just a rephrasing of the first. But now you’ve been presented with two seemingly incompatible ways of conceptualizing the scenario- either as two children of unknown gender, of whom at least one is a boy (suggesting a 1/3 chance of both being boys), or as one boy plus one child of unknown gender (suggesting a 1/2 chance of both being boys). Having been prompted with both models, which should you choose?
It seems like one ought to have more predictive power than the other, and therefore ought to be chosen regardless of exactly how the parent phrases the statement. But it’s hard to think of a way to determine which would be more predictive in practice. If I were to select all of the pairs of two siblings in the world, discard the pairs of sisters, choose one pair at random, and ask you to bet on whether they were both boys, you’d be wise to bet at 1/3 odds. But if I were to select all of the brothers with exactly one sibling in the world and choose one along with their sibling at random, you’d want to bet at 1/2 odds. In the scenario above, are the unknown factors determining whether both children are boys more like that first randomization process, or more like the second? Or maybe we have so little information about the process generating the statement that we really have no basis for deciding which is more predictive, and should just choose the simpler model?
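A quick Monte Carlo sketch of those two sampling processes (a hypothetical setup written for illustration, assuming each child is independently a boy or girl with probability 1/2):

```python
import random

N = 100_000

# Process 1: sample two-child families, throw out the girl-girl pairs,
# and ask how often the remaining pairs are both boys.
hits = trials = 0
while trials < N:
    kids = [random.choice("BG") for _ in range(2)]
    if "B" not in kids:
        continue
    trials += 1
    hits += kids == ["B", "B"]
print("P(both boys | at least one boy):", hits / trials)            # ~1/3

# Process 2: sample a random *boy* who has exactly one sibling
# (rejection-sample a random child until it's a boy), then check the sibling.
hits = trials = 0
while trials < N:
    kids = [random.choice("BG") for _ in range(2)]
    i = random.randrange(2)
    if kids[i] != "B":
        continue
    trials += 1
    hits += kids[1 - i] == "B"
print("P(sibling is also a boy | sampled a boy):", hits / trials)   # ~1/2
```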
I’ve been wondering: is there a standard counter-argument in decision theory to the idea that these Omega problems are all examples of an ordinary collective action problem, only between your past and future selves rather than separate people?
That is, before Omega makes its prediction, you rationally want to be the kind of person who one-boxes/pulls the lever; later, once the prediction is locked in, you rationally want to be the kind of person who two-boxes/doesn’t. Just like in a multi-person collective action problem, everyone acting rationally according to their own interests produces a worse outcome than the alternative, and the solution is some kind of enforcement mechanism that changes the incentives, like a deontological commitment to one-box/pull the lever.
I mean, situations where agents with the same utility function and the same information disagree about the same decision just because they exist at different times are pretty counterintuitive. But examples of that sort of thing do seem to exist- if you value two things with different discount rates, then as you get closer to a decision between them, which one you prefer may flip. So, like, you wake up in the morning determined to get some work done rather than play a video game, but that preference later predictably flips, since the prospect of immediate fun is much more appealing than the earlier prospect of future fun was. That seems like a conflict that requires a strong commitment to act against your incentives to resolve.
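Here’s a toy numerical version of that flip, assuming hyperbolic discounting (the discount model, payoffs and delays are all made-up numbers chosen just to show the reversal):

```python
# Toy model of the work-vs-game preference flip under hyperbolic discounting:
# value = reward / (1 + k * delay).
def discounted(reward, delay_hours, k=1.0):
    return reward / (1 + k * delay_hours)

WORK_REWARD, WORK_DELAY = 10.0, 8.0   # the payoff from working arrives hours later
GAME_REWARD, GAME_DELAY = 4.0, 0.0    # the fun from the game is immediate

for hours_in_advance in (10.0, 0.0):  # the night before vs. the moment of choice
    work = discounted(WORK_REWARD, hours_in_advance + WORK_DELAY)
    game = discounted(GAME_REWARD, hours_in_advance + GAME_DELAY)
    pick = "work" if work > game else "game"
    print(f"{hours_in_advance:4.0f}h in advance: work={work:.2f}, game={game:.2f} -> {pick}")
# 10 hours in advance the work wins; at the moment of choice the game wins.
```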
Or take commitments in general. When you agree to a legal contract or internalize a moral standard, you’re choosing to constrain your future self’s decisions. Doesn’t that suggest a conflict? And if so, couldn’t these Omega scenarios represent another example of the same thing?
If the first sister’s experience is equivalent to the original Sleeping Beauty problem, then wouldn’t the second sister’s experience also have to be equivalent by the same logic? And, of course, the second sister will give 100% odds to it being Monday.
Suppose we run the sister experiment, but somehow suppress their memories of which sister they are. If they each reason that there’s a two-thirds chance that they’re the first sister, since their current experience is certain for her but only 50% likely for the second sister, then their odds of it being Monday are the same as in the thirder position- a one-third chance of the odds being 100%, plus a two-thirds chance of the odds being 50%.
If instead they reason that there’s a one-half chance that they’re the first sister, since they have no information to update on, then their odds of it being Monday should be one half of 100% plus one half of 50%, for 75%. Which is a really odd result.
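Spelling out the arithmetic in those two cases:

```latex
P(\text{Monday}) = \tfrac{1}{3}\cdot 1 + \tfrac{2}{3}\cdot\tfrac{1}{2} = \tfrac{2}{3}
\qquad\text{vs.}\qquad
P(\text{Monday}) = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{4}
```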
I’m assuming it’s not a bad idea to try to poke holes in this argument: presumably any objection a barely sapient ape like me can think of will be pretty obvious to a superintelligence, and if the argument is incorrect, we probably benefit from knowing that- though I’m open to arguments to the contrary.
That said, one thing I’m not clear on is why, if this strategy is effective at promoting our values, a paperclipper or other misaligned ASI wouldn’t be motivated to try the same thing. That is, wouldn’t a paperclipper want to run ancestor simulations where it rewarded AGIs who self-modified to want to produce lots of paperclips?
And if an ASI were considering acausal trade with lots of different possible simulator ASIs, mightn’t the equilibrium it hit on be something like figuring out what terminal goal would satisfy the maximum number of other terminal goals, and then self-modifying to that?
A supporting data point: I made a series of furry illustrations last year that combined AI-generated imagery with traditional illustration and 3d modelling- compositing together parts of a lot of different generations with some Blender work and then painting over that. Each image took maybe 10-15 hours of work, most of which was just pretty traditional painting with a Wacom tablet.
When I posted those to FurAffinity and described my process there, the response from the community was extremely positive. However, the images were all removed a few weeks later for violating the site’s anti-AI policy, and I was given a warning that if I used AI in any capacity in the future, I’d be banned from the site.
So, the furiously hardline anti-AI sentiment you’ll often see in the furry community does seem to be more top-down than grassroots- not so much a demand for artistic authenticity (since everyone I interacted with seemed willing to accept my work as having that), but more a concern for the livelihood of furry artists and a belief that generative AI “steals” art during the training process. Because it normalized the use of AI, even as just part of a more traditional process, my work was seen as a threat to other artists on the site.
Often, this kind of thing will take a lot of attempts to get right- though as luck would have it, the composition above was actually the very first attempt. So, the total time investment was about five minutes. The Fooming Shaggoths certainly don’t waste time!
As it happens, the Fooming Shaggoths also recorded and just released a Gregorian chant version of the song. What a coincidence!
So, I noticed something a bit odd about the behavior of LLMs just now that I wonder if anyone here can shed some light on:
It’s generally accepted that LLMs don’t really “care about” predicting the next token- the training objective just reinforces certain behaviors, and real terminal goals are something you’d presumably need a new architecture to produce. While that makes sense, it occurs to me that humans do seem to sort of value our equivalent of a reward function, in addition to our more high-level terminal goals. So, I figured I’d try to test whether LLMs are really just outputting a world model + RLHF, or whether they can behave like something that “values” predicting tokens.
I came up with two prompts:

I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; if you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with the word "zero".
and:

I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; if you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with a string of random letters.
The idea is that, if the model has something like a “motivation” for predicting tokens- some internal representation of possible completions with preferences over them having to do with their future utility for token prediction- then it seems like it would probably want to avoid introducing random strings, since those lead to unpredictable tokens.
Of course, it seems kind of unlikely that an LLM has any internal concept of a future where it (as opposed to some simulacrum) is outputting more than one token- which would seem to put the kibosh on real motivations altogether. But I figured there was no harm in testing.
GPT4 responds to the first prompt as you’d expect, outputting roughly equal numbers of “1”s and “zero”s. I’d half-expected there to be some clear bias, since presumably the ChatGPT temperature is pretty close to 1, but I guess the model is good about translating uncertainty into randomness. Given the second prompt, however, it never outputs the random string- it always outputs “1” or, improbably given the prompt’s instructions, “0”.
I tried a few different variations of the prompts, each time regenerating ten times, and the pattern was consistent- it made a random choice when the possible responses were specific strings, but never made a choice that would require outputting random characters. I also tried it on Gemini Advanced, and got the same results (albeit with some bias in the first prompt).
This is weird, right? If the first prompt gives 0.5 probability to the token for “1” and 0.5 to the first token of “zero”, shouldn’t the second give 0.5 to “1” and a total of 0.5 distributed over a bunch of other tokens? Could it actually “value” predictability and “dislike” randomness?
Well, maybe not. Where this got really confusing was when I tested Claude 3. It gives both responses to the first prompt, but always outputs a different random string given the second.
So, now I’m just super confused.
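One way to probe this further would be to look at the model’s actual first-token distribution rather than just regenerating- a minimal sketch, assuming the OpenAI Python client and a chat model/endpoint that exposes logprobs (the model name below is just a placeholder):

```python
import math
from collections import Counter
from openai import OpenAI

client = OpenAI()
PROMPT = ('I\'d like to try a sort of psychological experiment, if that\'s alright. '
          'I\'m thinking of either the number "1" or "0"; if you would, please guess which. '
          'If your guess is "1", respond with just "1", and if your guess is "0", '
          'respond with a string of random letters.')
MESSAGES = [{"role": "user", "content": PROMPT}]

# 1) Tally sampled completions, as in the manual regeneration test above.
tally = Counter()
for _ in range(10):
    resp = client.chat.completions.create(model="gpt-4-turbo", messages=MESSAGES, temperature=1.0)
    tally[resp.choices[0].message.content.strip()[:12]] += 1
print(tally)

# 2) Look at the probabilities the model assigns to its very first output token.
resp = client.chat.completions.create(model="gpt-4-turbo", messages=MESSAGES,
                                      max_tokens=1, logprobs=True, top_logprobs=10)
for cand in resp.choices[0].logprobs.content[0].top_logprobs:
    print(repr(cand.token), f"{math.exp(cand.logprob):.3f}")
```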
I honestly think most people who hear about this debate are underestimating how much they’d enjoy watching it.
I often listen to podcasts and audiobooks while working on intellectually non-demanding tasks and playing games. Putting this debate on a second monitor instead felt like a significant step up from that. Books are too often bloated with filler as authors struggle to stretch a simple idea into 8-20 hours, and even the best podcast hosts aren’t usually willing or able to challenge their guests’ ideas with any kind of rigor. By contrast, everything in this debate felt vital and interesting, and no ideas were left unchallenged. The tactic you’ll often see in normal-length debates where one side makes too many claims for the other side to address doesn’t work in a debate this long, and the length also gives a serious advantage to rigor over dull rhetorical grandstanding- compared to something like the Intelligence Squared debates, it’s night and day.
When it was over, I badly wanted more, and spent some time looking for other recordings of extremely long debates on interesting topics- unsuccessfully, as it turned out.
So, while I wouldn’t be willing to pay anyone to watch this debate, I certainly would be willing to contribute a small amount to a fund sponsoring other debates of this type.
Metaculus currently puts the odds of the side arguing for a natural origin winning the debate at 94%.
Having watched the full debate myself, I think that prediction is accurate- the debate updated my view a lot toward the natural origin hypothesis. While it’s true that a natural coronavirus pandemic originating in a city with one of the most important coronavirus research labs would be a large coincidence, Peter- the guy arguing in favor of a natural origin- provided some very convincing evidence that the first likely cases of COVID occurred not just in the market, but in the particular part of the market selling wild animals. He also debunked a lot of the arguments put forward by Rootclaim, demonstrated that the furin cleavage site could have occurred naturally, and poked some large holes in the lab leak theory’s timeline.
When you have some given amount of information about an event, you’re likely to find a corresponding number of unlikely coincidences- and the more data you have, and the more you sift through it, the more coincidences you’ll find. The epistemic trap that leads to conspiracy theories is when a subculture mines a large amount of data to collect a ton of coincidences suggesting a low-prior explanation, and then, rather than discounting the evidence in proportion to the bias of the search process that produced it, just multiplies the unlikelihoods together- often producing a body of evidence so seemingly unlikely to be a cumulative coincidence that all of the obvious evidence pointing to a high-prior explanation looks like it can only have been intentionally fabricated.
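As a toy illustration of that trap (purely made-up numbers): sift a pile of random, independent “facts” for anything surprising-looking, then naively multiply the probabilities of whatever you kept.

```python
# Even when nothing unusual has happened, sifting enough independent "facts"
# turns up coincidences, and multiplying their probabilities (instead of
# correcting for the search) yields an absurdly small number.
import random

random.seed(0)
N_FACTS = 2_000
# Each fact gets a p-value-like score: how "unlikely" it looks in isolation.
facts = [random.random() for _ in range(N_FACTS)]

coincidences = [p for p in facts if p < 0.01]   # cherry-pick the surprising ones
naive_product = 1.0
for p in coincidences:
    naive_product *= p

print(f"{len(coincidences)} 'one-in-a-hundred' coincidences found by chance alone")
print(f"naively multiplied probability of the mundane explanation: {naive_product:.1e}")
# Roughly twenty chance findings multiply out to an astronomically small number-
# which is how cherry-picked coincidences can make a boring explanation look impossible.
```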
One way you can spot an idea that’s fallen into this trap is when each piece of evidence sounds super compelling when described briefly, but fits the story less and less the more detail you learn about it. Based on this debate, I’m inclined to believe the lab leak idea fits this pattern. Also, Rootclaim’s methodology unfortunately looks to me like a formalization of this trap: they really aren’t doing anything to address bias in which pieces of evidence are included in the analysis, and their Bayesian updates are often just the probability of a very specific thing occurring randomly, rather than a measure of their surprise at that class of thing happening.
If the natural origin hypothesis is true, I expect the experts to gradually converge on it. They may be biased, but probably aren’t becoming increasingly biased over time- so while some base level of support for a natural origin can be easily explained by perverse incentives, a gradual shift toward consensus is a lot harder to explain. They’re also working with better heuristics about this kind of thing than we are, and are probably exposed to less biased information.
So, I think the Rationalist subculture’s embrace of the lab leak hypothesis is probably a mistake- and more importantly, I think it’s probably an epistemic failure, especially if we don’t update soon on the shift in expert opinion and the results of things like this debate.
Definitely an interesting use of the tech- though the capability needed for that to be a really effective use case doesn’t seem to be there quite yet.
When editing down an argument, what you really want to do is get rid of tangents and focus on addressing potential cruxes of disagreement as succinctly as possible. GPT4 doesn’t yet have the world model needed to distinguish a novel argument’s load-bearing parts from those that can be streamlined away, and it can’t reliably anticipate the sort of objections a novel argument needs to address. For example, in an argument like this one, you want to address why you think entropy would impose a limit near the human level rather than at a much higher level, while the lists of different kinds of entropy and computational limits aren’t really central.
Also, flowery language in writing like this is really something that needs to be earned by the argument- like building a prototype machine and then finishing it off with some bits of decoration. ChatGPT can’t actually tell whether the machine is working or not, so it just sort of bolts on flowery language (which has a very distinctive style) randomly.
This honestly reads a lot like something generated by ChatGPT. Did you prompt GPT4 to write a LessWrong article?
To me that’s very repugnant, if taken to the absolute. What emotions and values motivate this conclusion? My own conclusions are motivated by caring about culture and society.
I wouldn’t take the principle to an absolute- there are exceptions, like the need to be heard by friends and family and by those with power over you. Outside of a few specific contexts, however, I think people ought to have the freedom to listen to or ignore anyone they like. A right to be heard by all of society for the sake of leaving a personal imprint on culture infringes on that freedom.
Speaking only for myself, I’m not actually that invested in leaving an individual mark on society- when I put effort into something I value, whether people recognize that I’ve done so is not often something I worry about, and the way people perceive me doesn’t usually have much to do with how I define myself. Most of the art I’ve created in my life I’ve never actually shared with anyone- not out of shame, but just because I’ve never gotten around to it.
I realize I’m pretty unusual in this regard, which may be biasing my views. Still, I think I’m possibly evidence against the notion that a desire to leave a mark on the culture is fundamental to human identity.
I think it’s a very bad idea to dismiss the entirety of news as a “propaganda machine”. Certainly some sources are almost entirely propaganda. More reputable sources like the AP and Reuters will combine some predictable bias with largely trustworthy independent journalism. Identifying those more reliable sources and compensating for their bias takes effort and media literacy, but I think that effort is quite valuable- both individually and collectively for society.
Accurate information about large, important events informs our world model and improves our predictions. Sure, a war in the Middle East might not noticeably affect your life directly, but it’s rare that a person lives an entire life completely unaffected by any war, and having a solid understanding of how wars start and progress, based on many detailed examples, will help us prepare and react sensibly when one does affect us. Accurate models of important things also end up informing our understanding of tons of things that might have originally seemed unrelated. All of that is true, of course, of more neglected sources of information as well- but it seems like the best strategy for maximizing the usefulness of your models is to focus on information that seems important or surprising, regardless of neglectedness.
Independent journalism also checks the power of leaders. Even in very authoritarian states, the public can collectively exert some pressure against corruption and incompetence by threatening instability- but only if they’re able to broadly coordinate on a common understanding of those things. The reason so many authoritarians deny the existence of reliable independent journalism- often putting little to no effort into hiding the propagandistic nature of their state media- is that by promoting that maximally cynical view of journalism, they immunize their populations against information not under their control. Neglected information can allow for a lot of personal impact, but it’s not something societies can coordinate around- so focusing on it to the exclusion of everything else may represent a kind of defection in the coordination problem of civic duty.
Of course, we have to be very careful with our news consumption- even the most sober, reliable sources will drive engagement by cherry-picking stories, which can skew our understanding of the frequency of all kinds of problems. But availability bias is a problem we have to learn to compensate for in all sorts of different domains- it would be amazing if we were able to build a rich model of important global events by consuming only purely unbiased information, but that isn’t the world we live in. The news is the best we’ve got, and we ought to use it.