Mati_Roy
i want a better conceptual understanding of what “fundamental values” means, and how to disentangle that from beliefs (ex.: in an LLM). like, is there a meaningful way we can say that a “cat classifier” is valuing classifying cats even though it sometimes fails?
when potentially ambiguous, I generally just say something like “I have a different model” or “I have different values”
it seems to me that disentangling beliefs and values is an important part of being able to understand each other
and using words like “disagree” to mean both “different beliefs” and “different values” is really confusing in that regard
topic: economics
idea: when building something with local negative externalities, have some mechanism to measure the externalities in terms of how much the surrounding property valuations changed (or are expected to change, as estimated, say, through a prediction market), and have the owner of the new structure pay the owners of the surrounding properties.
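A loose sketch of what the payment side could look like (the function, names, and numbers below are all made up for illustration, and I'm ignoring gains, ie. no clawback when a neighbour's valuation goes up):

```python
def compensation_payments(valuations_before, valuations_expected):
    """Return how much the builder owes each neighbouring owner.

    Both arguments map owner -> property value; the "expected" values are what the
    prediction market forecasts the properties to be worth once the structure is built.
    Owners whose valuation is expected to rise get nothing (no clawback here).
    """
    payments = {}
    for owner, before in valuations_before.items():
        expected_loss = before - valuations_expected[owner]
        payments[owner] = max(0, expected_loss)
    return payments

# made-up example: the new structure is predicted to lower two neighbours' valuations
before   = {"alice": 300_000, "bob": 250_000, "carol": 400_000}
expected = {"alice": 280_000, "bob": 255_000, "carol": 390_000}
print(compensation_payments(before, expected))
# {'alice': 20000, 'bob': 0, 'carol': 10000}
```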
I wonder what fraction of people identify as “normies”
I wonder if most people have something niche they identify with and label people outside of that niche as “normies”
if so, then a more objective (and maybe better) term would be non-<whatever your thing is>
like, athletic people could use “non-athletic” instead of “normies” for that class of people
just a loose thought, probably obvious
some tree species self-selected for height (ie. there’s no point in being a tall tree unless taller trees are blocking your sunlight)
humans were not the first species to self-select (for humans, the trait being intelligence) (although humans can now do it intentionally, which is a qualitatively different level of “self-selection”)
on human self-selection: https://www.researchgate.net/publication/309096532_Survival_of_the_Friendliest_Homo_sapiens_Evolved_via_Selection_for_Prosociality
Board game: Medium
2 players reveal a card with a word, then they each need to say a word based on that and get points if they say the same word (basically; with some more complexities).
Example at 1m20 here: https://youtu.be/yTCUIFCXRtw?si=fLvbeGiKwnaXecaX
I’m glad past Mati cast a wider net, as the specifics for this year’s Schelling day are different ☺️☺️
idk if the events are often going over time, but I might pass by now if it’s still happening ☺️
I liked reading your article; very interesting! 🙏
One point I figured I should x-post with our DMs 😊 --> IMO, if one cares about future lives (as much as present ones), then the question stops really being about expected lives and starts just being about whether an action increases or decreases x-risks. I think for a lot/all of the tech you described, there’s also a probability of an x-risk if it’s not implemented. I don’t think we can really determine whether the probability of some of those x-risks is low enough in absolute terms, as the probabilities would need to be unreasonably low, which would lead to full paralysis, and full paralysis could itself lead to x-risk. I think instead someone with those values (ie. caring about unborn people) should compare the probability of x-risk if a tech gets developed vs not developed (or whatever else is being evaluated). 🙂
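To make the shape of that comparison concrete (with completely made-up numbers):

```python
# a tech can carry a non-negligible x-risk in absolute terms and still be worth
# developing if *not* developing it is riskier (numbers invented for illustration)
p_xrisk_if_developed = 0.010      # risk introduced by the tech itself
p_xrisk_if_not_developed = 0.015  # risk from the problems the tech would have mitigated

print("develop" if p_xrisk_if_developed < p_xrisk_if_not_developed else "don't develop")
# -> "develop", even though 1% is not "low" in absolute terms
```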
new, great, complementary post: Critical Questions about Patient Care in Cryonics and Biostasis
I love this story so much, wow! It feels so incredibly tailored to me (because it is 😄). I value that a lot! It’s a very scarce resource to begin with, but it hardly gets more tailored than that 😄
that’s awesome; thanks for letting me know :)
[Question] Which LessWrongers are (aspiring) YouTubers?
i’d be curious to know how the first event went if you’re inclined to share ☺
Private Biostasis & Cryonics Social
cars won’t replace horses, horses with cars will
Thanks for engaging with my post. I keep thinking about that question.
I’m not quite sure what you mean by “values and beliefs are perfectly correlated here”, but I’m guessing you mean they are “entangled”.
Ah yeah, that seems true for all systems (at least if you can only look at their behaviors and not their minds); ref.: Occam’s razor is insufficient to infer the preferences of irrational agents. Summary: in principle, any possible value system has some belief system that can lead to any given set of actions.
So, in principle, the cat classifier, looked at from the outside, could actually be a human mind wanting to live a flourishing human life, but with a decision-making process that’s so wrong that the human does nothing but say “cat” when they see a cat, thinking this will lead them to achieve all their deepest desires.
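Here’s a toy version of that degeneracy (my own construction, not from the paper): two opposite value systems, paired with different planners, produce exactly the same behavior, so behavior alone can’t tell them apart.

```python
def policy_from(values, planner, observation):
    """A "policy" here is just a planner applied to some values and an observation."""
    return planner(values, observation)

# pair 1: likes cats + a planner that maximizes its values
values_a  = {"cat": 1.0, "dog": -1.0}
planner_a = lambda v, obs: "say cat" if v[obs] > 0 else "say dog"

# pair 2: hates cats + an anti-rational planner that minimizes its values
values_b  = {"cat": -1.0, "dog": 1.0}
planner_b = lambda v, obs: "say cat" if v[obs] < 0 else "say dog"

for obs in ["cat", "dog"]:
    assert policy_from(values_a, planner_a, obs) == policy_from(values_b, planner_b, obs)
print("same behavior, opposite values")
```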
I think the paper says noisy errors would cancel each other out (?), but correlated errors wouldn’t go away. One way to deal with them would be to come up with “minimal normative assumptions”.
I guess that’s as relevant to the “value downloading” problem as it is to the “value (up)loading” one. (I just coined the term “value downloading” to refer to the problem of determining human values, as opposed to the problem of programming values into an AI.)
The solution-space for determining the values of an agent, at a high level, seems to be (I’m sure that’s too simplistic, and maybe even a bit confused, but just thinking out loud):
Look in their brain directly to understand their values (and maybe that also requires solving the symbol-grounding problem)
Determine their planner (ie. “decision-making process”) (ex.: using some interpretability methods), and determine their values from the policy and the planner
Make minimal normative assumptions about their reasoning errors and approximations to determine their planner from their behavior (/policy) (see the sketch after this list)
Augment them to make their planners flawless (I think your example fits into improving the planner by improving the image resolution—I love that thought 💡)
Ask the agent questions directly about their fundamental values, which doesn’t require any planning (?)
Approaches like “iterated amplification” correspond to some combination of the above.
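As a very simplified sketch of what the “minimal normative assumptions” approach could look like in the easiest setting I can think of (the Boltzmann-rationality assumption, the three options, and the grid search are all my own simplifications): assume the agent picks each option with probability proportional to the exponential of its value, then recover the values that best explain its observed choices.

```python
import math
from collections import Counter
from itertools import product

OPTIONS = ["cat", "dog", "bird"]

def choice_log_likelihood(values, observed_choices):
    # log-likelihood of the choices under a Boltzmann (softmax) choice rule
    z = sum(math.exp(values[o]) for o in OPTIONS)
    counts = Counter(observed_choices)
    return sum(n * (values[o] - math.log(z)) for o, n in counts.items())

def fit_values(observed_choices, grid=(-2, -1, 0, 1, 2)):
    # brute-force maximum likelihood over a small grid of candidate value functions
    best, best_ll = None, -math.inf
    for vs in product(grid, repeat=len(OPTIONS)):
        candidate = dict(zip(OPTIONS, vs))
        ll = choice_log_likelihood(candidate, observed_choices)
        if ll > best_ll:
            best, best_ll = candidate, ll
    return best

# an agent that mostly (but not always) picks "cat"
observed = ["cat"] * 8 + ["dog"] + ["bird"]
print(fit_values(observed))  # "cat" comes out ~2 higher than the rest (only differences matter)
```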
But going back to my original question, I think a similar way to put it is that I wonder how complex the concept of “preferences”/”wanting” is. Is it a (messy) concept that’s highly dependent on our evolutionary history (ie. not what we want, which definitely is, but the concept of wanting itself), or is it a concept that all alien civilizations use in exactly the same way as us? It seems like a fundamental concept, but can we define it in a fully reductionist (and concise) way? What’s the simplest example of something that “wants” things? What’s the simplest planner a wanting-thing can have? Is it no planner at all?
A policy seems well defined: it’s basically an input-output map. We’re intuitively thinking of a policy as a planner + an optimization target, so if either of the latter two can be defined robustly, then it seems like we should be able to define the other as well. Although maybe, for a given planner or optimization target, there are many possible optimization targets or planners that produce a given policy; maybe Occam’s razor would be helpful here (loose sketch below).
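A loose sketch of the Occam’s razor idea, using a crude compression-based proxy for simplicity that I’m making up on the spot (and note the paper cited above argues simplicity alone is insufficient):

```python
import zlib

def description_length(planner_description: str, values_description: str) -> int:
    # crude complexity proxy: length of the compressed description of the pair
    return len(zlib.compress((planner_description + values_description).encode()))

# two (planner, values) decompositions assumed to produce the same cat-saying policy
rational = ("pick the option your values rate highest", "cat: +1, dog: -1")
anti_rational = ("pick the option your values rate lowest, except when a long list of "
                 "special-case corrections flips the ranking back", "cat: -1, dog: +1")

pairs = {"rational": rational, "anti-rational": anti_rational}
simplest = min(pairs, key=lambda name: description_length(*pairs[name]))
print(simplest)  # "rational": the shorter decomposition wins under this proxy
```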
Relatedly, I also just read Reward is not the optimization target, which is relevant and overlaps a lot with ideas I wanted to write about (ie. neural-net-executors, not reward-maximizers, as a reference to Adaptation-Executers, not Fitness-Maximizers). A reward function R will only select a policy π that wants R if wanting R is the best way to achieve R in the environment the policy is being developed in. (I’m speaking loosely: technically not if it’s the “best” way, but just if it’s the way the weight-update function works.)
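A minimal sketch of that point (my own toy example, not from the post): in a REINFORCE-style update, the reward only ever appears as a multiplier on the weight update; the resulting policy is just a couple of numbers and doesn’t need to represent or “want” the reward at all.

```python
import math
import random

weights = [0.0, 0.0]              # one logit per action; this *is* the whole policy

def policy_probs(w):
    exps = [math.exp(x) for x in w]
    return [e / sum(exps) for e in exps]

def reward(action):               # the training signal: action 1 is "better"
    return 1.0 if action == 1 else 0.0

lr = 0.1
for _ in range(2000):
    probs = policy_probs(weights)
    action = random.choices([0, 1], weights=probs)[0]
    r = reward(action)
    for a in range(2):
        # gradient of log pi(action) with respect to each logit under a softmax policy
        grad_log = (1 - probs[a]) if a == action else -probs[a]
        weights[a] += lr * r * grad_log   # the reward only ever enters here
print(policy_probs(weights))      # ends up heavily favouring action 1
```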
Anyway, that’s a thread that seems valuable to pull more. If you have any other thoughts or pointers, I’d be interested 🙂