Benjy_Forstadt

Karma: 49

Benjy_Forstadt Jan 24, 2025, 5:41 AM
2 points
−6
on: Do you consider perfect surveillance inevitable?
I don’t think perfect surveillance is inevitable.

I would prefer it, though. I don’t know any other way to prevent people from doing horrible things to minds running on their computers. It wouldn’t need to be publicly broadcast though, just overseen by law enforcement. I think this is much more likely than a scenario where everything you see is shared with everyone else.

Unfortunately, my mainline prediction is that people will actually be given very strong privacy rights, and will be allowed to inflict as much torture on digital minds under their control as they want. I’m not too confident in this though.

Benjy_Forstadt Sep 23, 2024, 3:15 PM
1 point
0
in reply to: Benjy_Forstadt’s comment on: Another argument against utility-centric alignment paradigms
Basically people tend to value stuff they perceive in the biophysical environment and stuff they learn about through the social environment.

So that reduces the complexity of the problem—it’s not a matter of designing a learning algorithm that both derives and comes to value human abstractions from observations of gas particles or whatever. That’s not what humans do either.

Okay then, why aren’t we star-maximizers or number-of-nation-states maximizers? Obviously it’s not just a matter of learning about the concept. The details of how we get values hooked up to an AGI’s motivations will depend on the particular AGI design, but will probably involve reward, prompting, scaffolding, or the like.

Benjy_Forstadt Sep 23, 2024, 6:40 AM
1 point
0
in reply to: Thane Ruthenis’s comment on: Another argument against utility-centric alignment paradigms
I don’t think the way you split things up into Alpha and Beta quite carves things at the joints. If you take an individual human as Beta, then stuff like “eudaimonia” is in Alpha—it’s a concept in the cultural environment that we get exposed to and sometimes come to value. The vast majority of an individual human’s values are not new abstractions that we develop over the course of our training process (for most people at least).

Benjy_Forstadt Sep 22, 2024, 11:21 PM
13 points
4
in reply to: Thane Ruthenis’s comment on: Another argument against utility-centric alignment paradigms
There is a difference between the claim that powerful agents are approximately well-described as being expected utility maximizers (which may or may not be true) and the claim that AGI systems will have an explicit utility function the moment they’re turned on, and maximize that function from that moment on.

I think this is the assumption OP is pointing out: “most of the book’s discussion of AI risk frames the AI as having a certain set of goals from the moment it’s turned on, and ruthlessly pursuing those to the best of its ability”. “From the moment it’s turned on” is pretty important, because it rules out value learning as a solution.

Benjy_Forstadt Jun 6, 2024, 2:30 AM
1 point
0
in reply to: Benjy_Forstadt’s comment on: MIRI 2024 Communications Strategy
Edit: Retracted because some of my exegesis of the historical seed AI concept may not be accurate.

Benjy_Forstadt Jun 2, 2024, 2:31 PM
1 point
0
in reply to: quetzal_rainbow’s comment on: MIRI 2024 Communications Strategy
There will be future superintelligent AIs that improve themselves. But they will be neural networks, they will at the very least start out as compute-intensive projects, and in the infant stages of their self-improvement cycles they will understand and be motivated by human concepts rather than being dumb specialized systems that are only good for bootstrapping themselves to superintelligence.

Benjy_Forstadt Jun 1, 2024, 9:53 PM
1 point
2
in reply to: quetzal_rainbow’s comment on: MIRI 2024 Communications Strategy
How does the question of whether AI outcomes are more predictable than AI trajectories reduce to the (vague) question of whether observations on current AIs generalize to future AIs?

Benjy_Forstadt Jun 1, 2024, 3:27 PM
37 points
3
in reply to: quetzal_rainbow’s comment on: MIRI 2024 Communications Strategy
To be blunt, it’s not just that Eliezer lacks a positive track record in predicting the nature of AI progress, which might be forgivable if we thought he had really good intuitions about this domain. Empiricism isn’t everything; theoretical arguments are important too and shouldn’t be dismissed. But:

Eliezer thought AGI would be developed from a recursively self-improving seed AI coded up by a small group, “brain in a box in a basement” style. He dismissed and mocked connectionist approaches to building AI. His writings repeatedly downplayed the importance of compute, and he has straw-manned writers like Moravec who did a better job at predicting when AGI would be developed than he did.

Old MIRI intuition pumps about why alignment should be difficult, like the “Outcome Pump” and “Sorcerer’s Apprentice”, are now forgotten. It came as a surprise that it would be easy to create helpful genies like LLMs who basically just do what we want. Remaining arguments for the difficulty of alignment are esoteric considerations about inductive biases, counting arguments, etc. So yes, let’s actually look at these arguments and not just dismiss them, but let’s not pretend that MIRI has a good track record.

Benjy_Forstadt Nov 16, 2016, 11:01 PM
1 point
0
on: Value
Due partly to the choice of using ‘value’ as a speaker-dependent variable, some of the terminology used in this article doesn’t align with how the terms are used by professional metaethicists. I would strongly suggest one of:

1) replacing the phrase “moral internalism” with a new phrase that better individuates the concept.

2) including a note that the phrase is being used extremely non-standardly.

3) adding a section explaining the layout of metaethical possibilities, using moral internalism in the sense intended by professional metaethicists.

In metaethics, moral internalism, roughly, is the disjunction:

‘Value’ is speaker-independent and universally compelling OR ‘Value’ is speaker-dependent and is only used to indicate properties the speaker finds compelling

This seems very un-joint-carvy from the perspective of value alignment, but most philosophers see internalism as a semantic thesis that captures the relation between moral judgements and motivation. The idea is: if someone says something has value, she values that thing. This is very different from how the term is used in this article.

I can provide numerous sources to back this up, if needed.

Benjy_Forstadt Jun 3, 2016, 7:39 AM
1 point
0
on: Executable philosophy
I have a few complaints/questions:

1) “What is goodness made out of” is not really a particularly active discussion in professional philosophy. I feel that this was put in there just to make analytic philosophers look silly. And anyway, if one believes in naturalistic moral properties (the stuff that we value), then “what is goodness made out of” really is the question “what is good,” which I think is probably a fine question. In this case, rephrasing in terms of AI just makes philosophical discussions more wordy and less accessible.

2) “Faced with any philosophically confusing issue, our task is to identify what cognitive algorithm humans are executing which feels from the inside like this sort of confusion, rather than, as in conventional philosophy, to try to clearly define terms and then weigh up all possible arguments for all ‘positions’.”

I don’t get what the problem is with clearly defining terms and weighing up pros and cons for positions. Is conceptual analysis (http://philpapers.org/browse/conceptual-analysis) so problematic that it has no place in an improved version of philosophy? I think that there are at least a few parallels between that project in philosophy and the sentiment expressed in https://arbital.com/p/3y6/, for example.

3) “Most “philosophical issues” worth pursuing can and should be rephrased as subquestions of some primary question about how to design an Artificial Intelligence, even as a matter of philosophy qua philosophy.”

What is “philosophy qua philosophy?”

“This imports the discipline of programming into philosophy. In particular, programmers learn that even if they have an inchoate sense of what a computer should do, when they actually try to write it out as code, they sometimes find that the code they have written fails (on visual inspection) to match up with their inchoate sense. Many ideas that sound sensible as English sentences are revealed as confused as soon as we try to write them out as code.”

How would one translate questions like “Are there unverifiable truths?” or “under what conditions does the parthood relation hold?” into AI-speak?

Benjy_Forstadt Jun 3, 2016, 6:26 AM
1 point
0
on: Orthogonality Thesis
The section on Moral Internalism is slightly inaccurate, or at least misleading. Internalism is the metaethical view that an agent cannot judge something to be right and yet still not be the least bit motivated to perform it. As such, it is really a semantic claim about the meaning of moral vocabulary: whether or not it is part of the meaning of “that is right” or “that is wrong” that the speaker approves or disapproves, respectively, of an action. Internalism, then (as intended by analytic philosophers), is totally compatible with the Orthogonality Thesis. (Internalism + Orthogonality = noncognitivism or relativism or nihilism.) IIRC, Hume himself was an Internalist!

Sources: http://plato.stanford.edu/entries/moral-motivation/

I suggest changing the section to either “Realist Moral Internalism” or a more comprehensive examination of the options available to the AI-grade philosopher when it comes to moral motivation.