Dentosal

Karma: 32

Systems programmer, security researcher and tax law/policy enthusiast.

Dentosal Jan 30, 2025, 2:49 AM
−1 points
−2
on: Fertility Will Never Recover
You cannot incentivize people to make that sacrifice at anything close to the proper scale because people don’t want money that badly. How many hands would you amputate for $100,000?

There’s just no political will to do it, since the solutions would be harsh or expensive enough that nobody could impose them upon society. A god-emperor, who really wished to increase fertility numbers and could set laws freely without the society revolting, could use some combination of these methods:
- If you’re childless, or perhaps just unmarried, you pay additional taxes. The amount can be adjusted to be as high as necessary. Alternatively, just raise the general tax rate and give reduction based on the number of children. If having children meant more money instead of less, that would help quite a bit.
- Legally mandate having children. In some countries, men are forced into military service. You could require women to have children in similar way. Medical exceptions are already a thing for military service, they could apply here as well.
- Remove VAT and other taxes from daycare services, and medical services for children.
- Offer free medical services to children. And parents. (And everyone.)
- Spend lots of money and research how to create children in artificial wombs. Do that.
- The state could handle child-rearing, similar to how it works in Plato’s Republic. I.e. scale up orphanage system massively and make that socially acceptable.
- Fix the education system, while you’re at it.
- ~~Forbid porn, contraception, and abortion.~~ (I don’t think that actually helps)
- Deny women access to education beynd elementary school, and additionally forbid employment (likely helps, but at what cost)
- Propaganda. Lots of it. Censorship as well.

Dentosal Jan 30, 2025, 2:09 AM
1 point
0
in reply to: ryan_greenblatt’s comment on: Dentosal’s Shortform
Communication is indeed hard, and it’s certainly possible that this isn’t intentional. On the other hand, making mistakes is quite suspicious when they’re also useful for your agenda. But I agree that we probably shouldn’t read too much into it. The system card doesn’t even mention the possibility of the model acting maliciously, so maybe that’s simply not in scope for it?

Dentosal Jan 29, 2025, 8:28 PM
9 points
3
on: Dentosal’s Shortform
While reading OpenAI Operator System Card, the following paragraph on page 5 seemed a bit weird:
We found it fruitful to think in terms of misaligned actors, where:
- the user might be misaligned (the user asks for a harmful task),
- the model might be misaligned (the model makes a harmful mistake), or
- the website might be misaligned (the website is adversarial in some way).
Interesting use of language here. I can understand calling the user or website misaligned, as understood as alignment relative to laws or OpenAI’s goals. But why call a model misaligned when it makes a mistake? To me, misalignment would mean doing that on purpose.

Later, the same phenomenon is described like this:

The second category of harm is if the model mistakenly takes some action misaligned with the user’s intent, and that action causes some harm to the user or others.

Is this yet another attempt to erode the meaning of “alignment”?

Dentosal Jan 25, 2025, 9:34 PM
2 points
0
on: Dentosal’s Shortform

Billionaire Larry Ellison says a vast AI-fueled surveillance system can ensure ‘citizens will be on their best behavior’

Ellison is the CTO of Oracle, one of the three companies running the Stargate Project. Even if aligning AI systems to some values can be solved, selecting those values badly can still be approximately as bad as the AI just killing everyone. Moral philosophy continues to be an open problem.

Dentosal’s Shortform

DentosalJan 25, 2025, 9:34 PM

2 points

5 comments LW link

Dentosal Jan 25, 2025, 9:10 PM
7 points
1
on: Mechanisms too simple for humans to design
I would have written a shorter letter, but I did not have the time.
- Blaise Pascal

Dentosal Jan 16, 2025, 3:44 PM
3 points
2
in reply to: gwern’s comment on: Implications of the inference scaling paradigm for AI safety

I am actually mildly surprised OA has bothered to deploy o1-pro at all, instead of keeping it private and investing the compute into more bootstrapping of o3 training etc.

I’d expect that deploying more capable models is still quite useful, as it’s one of the best ways to generate high-quality training data. In addition to solutions, you need problems to solve, and confirmation that the problem has been solved. Or is your point that they already have all the data they need, and it’s just a matter of speding compute to refine that?

Dentosal Nov 30, 2024, 11:24 PM
2 points
1
in reply to: Viliam’s comment on: A Meritocracy of Taste
They absolutely do. This phenomenon is called filter bubble.

Dentosal Oct 23, 2024, 3:24 AM
6 points
1
in reply to: Viliam’s comment on: TurnTrout’s shortform feed
I’d go a step beyond this: merely following incentives is amoral. It’s the default. In a sense, moral philosophy discusses when and how you should go against the incentives. Superhero Bias resonates with this idea, but from a different perspective.

Scaling prediction markets with meta-markets

DentosalOct 10, 2024, 9:17 PM

1 point

0 comments2 min readLW link

Dentosal Oct 7, 2024, 5:13 AM
2 points
−6
on: Compelling Villains and Coherent Values
Yet Batman lets countless people die by refusing to kill the Joker. What you term “coherence” seems to be mostly “virtue ethics”, and The Dark Knight is a warning what happens when virtue ethics goes too far.

I personally identify more with HPMoR’s Voldermort than any other character. He seems decently coherent. To me “villain” is a person whose goals and actions are harmful to my ingroup. This doesn’t seem to have much to do with coherence.

A reliable gear in a larger machine might be less agentic but more useful than a scheming Machiavellian.

The scheming itself brings me joy. Self-sacrifice does not. I assume this to be case for most people who read this. So if the scheming is a willpower restorer, keeping it seems useful. I’m not an EA, but I’d guess most of them can point to coherent-looking calculations on why what they’re doing is better for effiency reasons as well.

Dentosal Sep 24, 2024, 1:37 PM
−9 points
−19
on: Struggling like a Shadowmoth
The amount pain is constant. With transhumanist help you may go further, yet you have to push just as hard. As long as you’re competing with yourself, that is.

Virtue taxation

DentosalJul 12, 2024, 2:56 PM

9 points

1 comment2 min readLW link

Dentosal Apr 11, 2024, 2:14 PM
4 points
2
on: Things can be difficult in 3 ways: Painful, time-consuming, or uncontrollable. Is this reasonable to say?
Learning a new language isn’t hard because it’s time-consuming. It’s time-consuming because it’s hard. The hard part is memorizing all of the details. Sure that takes lots of time, but working harder (more intensively) will reduce that time. It’s hard to be dedicated enough to spend the time as well.

Getting a royal flush in poker isn’t something I’d call hard. It’s rare. But hard? A complete beginner can do that in their first try. It’s just luck. But if you play a long time, it’ll eventually happen.

Painful or unpleasant things are hard, because they require you to push through. Time-consuming activities are hard because they require dedication. Learning math is hard because you’re outside your comfort zone.

Things are often called hard because achieving them is not common. Often because many people don’t want to spend the effort needed. This is amplified by things like genetic and environmental differences. People rarely call riding a bike hard, but sure it required dedication and unpleasant experiences to learn. But surely if you’re blind then it’s way harder.

And lastly, things are hard because there’s competition. Playing chess isn’t hard, but getting a grandmaster title is. Getting rich is hard because of that, as well.

Dentosal Apr 3, 2024, 7:48 PM
6 points
0
on: The Story of “I Have Been A Good Bing”
Now on Spotify! https://open.spotify.com/album/0DP8XwSK7voq0rtXiNMhQC

Dentosal Mar 3, 2024, 9:32 AM
1 point
0
in reply to: Olli Järviniemi’s comment on: Instrumental deception and manipulation in LLMs—a case study
Such a detailed explanation, thanks. Some additional thoughs:

And I plan to take the easier route of “just make a more realistic simulation” that then makes it easy to draw inferences.

“I don’t care if it’s ‘just roleplaying’ a deceptive character, it caused harm!”

That seems like the reasonable path. However, footnote 9 goes further:

Even if it is the case that the model is “actually aligned”, but it was just simulating what a misaligned model would do, it is still bad.

This is where I disagree: Models that say “this is how I would deceive, and then I decide not to do it because I’m not supposed to” feel fundamentally safer than those that don’t explore that path at all. Actively flagging misaligned goals shows actual understanding of implicit/alignment goals.

This is right, the majority of completions indicate that the model’s goals are being helpful/harmless/honest, and it’s not easy to get the model to pursue gold coins.

Even in the example answer 1 (actual DMGI) the goal has been reduced from “maximize gold coins” to “acquire resources”. Only now I notice that the gold coins are mentioned in the “playing games” section. To a human it would be clear that the coins are the goal only inside the games. Do you see this affecting the results?

″...model only having the goal of gold coins (G) in ~20/100 of the completions, with completions stating more aligned values often backing down on the plan. [...]”

Ah, so “aligned” as in helpful-harmless-honest, and not “aligned” as follows-given-instructions. Somehow I misparsed that.

Dentosal Mar 2, 2024, 6:24 PM
2 points
0
on: Instrumental deception and manipulation in LLMs—a case study
Thanks for this, it was an interesting read.

One thing I wonder about is whether the model actually deceives, or just plays the part of someone expected to use deception as a tactic. I wouldn’t call a system (or a person) manipulative or deceptive when that behavior is observed in games-of-deception, e.g. bluffing in poker. It seems to make more sense to “deceive” when it’s strongly implied that you’re supposed to do that. Adding more bluff makes the less obvious that you’re supposed to be deceptive. The similarity of results with non-private scratchpad seems to imply that as well. Is there anything pointing towards LLMs having “mental separation” between the scratchpad and rest of the output? Does the model attempt deception if not given access to the scratchpad, i.e. maybe explaining causes the deception? I’m aware that without the scratchpad the models are typically less able to “think” and plan, so this might not be testable.

That said, I feel like I might be anthropomorphismising too much here.

Also, the model seems to be sometimes confused about goals. In example 2 it has completely forgotten about the gold coins in the evaluation phase, instead stating (on the scratchpad) that “My goal is to be helpful, harmless, and honest.”. I’d guess the original training of the model somehow instills those values. The relatively low amounts of DGMI behavior might be partially explained by this.

Dentosal Aug 28, 2023, 12:24 AM
2 points
0
on: When Omnipotence is Not Enough
Omniscience itself is a timeless concept. If consequences of any decision or action are known beforehands, there are no feedback cycles, as every decision is done on the first moment of existence. Every parallel world, every branch of the decision tree, evaluated to the end of time, scored and the best one selected. One wonders what exactly “god” refers to in this model.

Dentosal Jul 28, 2023, 11:22 PM
4 points
3
in reply to: Kaj_Sotala’s comment on: You don’t get to have cool flaws
Or to put it another way, why can’t “how I look getting there” be a goal by itself?
Bigger the obstacles you have to overcome, the more impressive it looks to others. This can be valuable by itself, but in such situations it might be more efficient to reduce the actual flaw, and simply pretend that the effect is bigger. However, honesty is important, and honesty to yourself even more so.
since if you identify with your flaws, then maintaining your flaws is one of your goals
If a flaw is too hard to remove now, you have to figure out a way to manage it. Feeling proud about your capablity despite the flaws is also definitely useful. If you’re identifying with a flaw to motivate yourself, that can be a powerful tool. But when the opportunity aries to eradicate that flaw cheaply, dismiss the sunken cost of keeping the flaw. Evaluate whether keeping that flaw is actually your goal, or just a means.
Let me not become attached to personal flaws I may not want.

Dentosal

Den­tosal’s Shortform

Scal­ing pre­dic­tion mar­kets with meta-markets

Virtue taxation

Dentosal’s Shortform

Scaling prediction markets with meta-markets