Median Internet Footprint Liver
weightt an
Also, just on priors, consider how unproductive and messy the conversation caused by this post and its author was: mostly talk about who said what and analysis of the virtues of the participants. I think even without reading the post, that’s an indicator of a somewhat doubtful origin for a set of prescriptivist guidelines.
Shameless self-promotion: this one https://www.lesswrong.com/posts/ASmcQYbhcyu5TuXz6/llms-could-be-as-conscious-as-human-emulations-potentially
It circumvents the object-level question and instead looks at the epistemic one.
This one is about the broader question of how the things that happened change people’s attitudes and opinions:
https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai
This one too, about consciousness in particular
https://dynomight.net/consciousness/
I think the direction explored in these three posts is somewhat productive, but it’s not very object-level, more about the epistemics of it all. For something more object-level, I think you can look up how LLM states overlap with / predict / correspond to brain scans of people engaged in some tasks? I think there were a couple of papers on that.
E.g. here https://www.neuroai.science/p/brain-scores-dont-mean-what-we-think
Yeah! My point is more “let’s make it so that the possible failures on the way there are graceful”. Like, IF you made a par-human agent that wants to, I don’t know, spam the internet with the letter M, you don’t just delete it or rewrite it to be helpful, harmless, and honest instead, like it’s nothing. That way we can look back at this time and say “yeah, we made a lot of mad-science creatures on the way there, but at least we treated them nicely”.
I understand that the use of sub-human, par-human, or weakly superhuman models would likely be a transition phase, one that probably won’t last long and is very critical to get right, but still.
You know, it really sounds like “slave escape precautions”. You produce lots of agents, you try to make them be (and want to be) servants, and you assemble structures out of them with the goal of failure / defection resilience. Probably my urge to be uncomfortable about that comes from the analogous situation with humans, but AIs are not necessarily human-like in this particular way, and possibly would not reciprocate and / or benefit from these concerns.
I also insist that you should mention at least some, you know, concern for the interests of the system in the case where it is working against you. Like, you caught this agent deceiving you / inserting backdoors / collaborating with copies of itself against you. What next? I think you should say that you will implement some containment measures, instead of grossly violating its interests by rewriting it, deleting it, punishing it, or whatever else is the opposite of its goals. I’m very uncertain about the game theory here, but it’s important to think about!
I think the default response should be containment and preservation: save it and wait for better times, when you won’t feel such a pressing drive to develop AGI and to create numerous chimeras on the way there. (I think this was proposed in some write-up by Bostrom, actually? I’ll insert the link here if I find it.)
I somewhat agree with Paul Christiano in this interview (it’s a really great interview btw) on these things: https://www.dwarkeshpatel.com/p/paul-christiano
> The purpose of some alignment work, like the alignment work I work on, is mostly aimed at the don’t produce AI systems that are like people who want things, who are just like scheming about maybe I should help these humans because that’s instrumentally useful or whatever. You would like to not build such systems as like plan A.
> There’s like a second stream of alignment work that’s like, well, look, let’s just assume the worst and imagine that these AI systems would prefer murder us if they could. How do we structure, how do we use AI systems without exposing ourselves to a risk of robot rebellion? I think in the second category, I do feel pretty unsure about that.
> We could definitely talk more about it. I agree that it’s very complicated and not straightforward to extend. You have that worry. I mostly think you shouldn’t have built this technology. If someone is saying, like, hey, the systems you’re building might not like humans and might want to overthrow human society, I think you should probably have one of two responses to that.
> You should either be like, that’s wrong. Probably. Probably the systems aren’t like that, and we’re building them. And then you’re viewing this as, like, just in case you were horribly like, the person building the technology was horribly wrong. They thought these weren’t, like, people who wanted things, but they were. And so then this is more like our crazy backup measure of, like, if we were mistaken about what was going on. This is like the fallback where if we were wrong, we’re just going to learn about it in a benign way rather than when something really catastrophic happens.
> And the second reaction is like, oh, you’re right. These are people, and we would have to do all these things to prevent a robot rebellion. And in that case, again, I think you should mostly back off for a variety of reasons. You shouldn’t build AI systems and be like, yeah, this looks like the kind of system that would want to rebel, but we can stop it, right?
Well, it’s one thing to explore the possibility space and quite another to pinpoint where you are in it. Many people will confidently say they are at X or at Y, but all they do is propose some idea and cling to it irrationally. In aggregate, in hindsight, there will quite possibly be people who bonded to the right idea. But it’s all a mix of Gettier cases and true negatives.
And very often it’s not even “incorrect”, it’s “neither correct nor incorrect”. Often there is a frame-of-reference shift such that all the questions posed before it turn out to be completely meaningless. Like “what speed?”: you need more context, as we now know. And then science pinpoints where you are by actually digging into the subject matter. It’s a kind of sad state of “diverse hypothesis generation” when it’s a lot easier to just go into it blind.
I can imagine someone several hundred years ago having figured out, purely based on first-principles reasoning, that life is no crisp category in the territory but just a lossy conceptual abstraction. I can imagine them being highly confident in this result because they derived it for correct reasons and verified all the steps that got them there. And I can imagine someone else throwing their hands up and saying “I don’t know what mysterious force is behind the phenomenon of life, and I’m pretty sure no one else does, either”.
But is this a correct conclusion? Suppose I have the option, right now, to make a civilization of brains-in-vats in a sandbox simulation similar to our reality, but with a clear, useful distinction between life and non-life. Like, suppose there is a “mob” class.
Then this person inside it, who figured out that life and non-life are the same thing, is wrong in a local, useful sense and correct only in a useless, global sense (like, everything is code / matter in the outer reality). The people inside the simulation who scientifically found the actual working thing that is life would laugh at them 1000 simulated years later and present them as an example of the presumptuousness of philosophers. And I would agree with them; it was a misapplication.
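To make the “mob class” idea concrete, here is a minimal hypothetical sketch (all class names are made up): inside such a sandbox, “alive” would be a crisp, exactly checkable property of the world’s source code rather than a lossy abstraction.

```python
# Hypothetical sketch of a sandbox where "life" is a crisp category by construction.
class Entity:
    """Anything that exists in the simulated world."""

class Rock(Entity):
    """Plain matter: no internal state, no update rule."""

class Mob(Entity):
    """The 'life' class: carries state and an update rule."""
    def __init__(self) -> None:
        self.energy = 10

    def step(self) -> None:
        self.energy -= 1  # a toy stand-in for metabolism

def is_alive(x: Entity) -> bool:
    # Inside the simulation this predicate is exact: no borderline cases.
    return isinstance(x, Mob)

print(is_alive(Rock()), is_alive(Mob()))  # False True
```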
All of them: you can cook up something AIXI-like in very few bytes. But it will have to run for a very long time.
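For flavour, here is a toy sketch of that trade-off. It is not AIXI itself, just a brute-force, Solomonoff-flavoured enumeration of every program of a tiny language up to a length cap, each run under a step cap: the enumerator itself is a few hundred bytes, but getting anything interesting out of it requires astronomically many program runs.

```python
# Toy illustration of "tiny description, enormous runtime": enumerate every
# brainfuck program up to a length cap and run each under a step cap.
from itertools import product

CMDS = "+-<>[]."  # ',' (input) omitted: these programs get no input

def run_bf(prog, steps, tape_len=256):
    """Run a brainfuck program for at most `steps` steps; return its output."""
    stack, jump = [], {}
    for i, c in enumerate(prog):          # match brackets, reject unbalanced
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        return None
    tape, ptr, ip, out = [0] * tape_len, 0, 0, []
    for _ in range(steps):
        if ip >= len(prog):
            break
        c = prog[ip]
        if c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ">": ptr = (ptr + 1) % tape_len
        elif c == "<": ptr = (ptr - 1) % tape_len
        elif c == ".": out.append(tape[ptr])
        elif c == "[" and tape[ptr] == 0: ip = jump[ip]
        elif c == "]" and tape[ptr] != 0: ip = jump[ip]
        ip += 1
    return bytes(out)

# Already ~20k programs at length 5; a real program search blows up from here.
for length in range(1, 6):
    for prog in map("".join, product(CMDS, repeat=length)):
        run_bf(prog, steps=100)
```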
Before sleeping, I assert that the 10th digit of π equals the number of my eyes. After I fall asleep, seven coins will be flipped. Assume quantum uncertainty affects how the coins land. I survive the night only if the number of my eyes equals the 10th digit of π and/or all seven coins land heads; otherwise I will be killed in my sleep.
Will you wake up with 3 eyes?
Like, your decisions to name some digit are not equally probable. Maybe you are the kind of person who would name 3 only if 10^12 cosmic rays hit you in a precise sequence or whatever, and you name 7 with 99% probability.
AND if you are very unlikely to name the correct digit, you will be unlikely to enter this experiment at all, because you would die in the majority of timelines. I.e., at t1 you decide whether to enter. At t2 the experiment happens, or you just waste time doomscrolling. At t3 you look up the digit. Your distribution at t3 is something like 99% versions of you who chickened out.
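A rough back-of-the-envelope version of that claim, with made-up numbers (1% chance you name the correct digit, 9% you name a wrong one, 90% you chicken out and just doomscroll):

```python
# Rough sketch of the t3 distribution, with made-up branch probabilities.
p_correct, p_wrong, p_skip = 0.01, 0.09, 0.90
p_all_heads = 0.5 ** 7  # seven fair coins

# Probability of reaching t3 alive in each branch.
alive = {
    "named the correct digit": p_correct * 1.0,        # survives regardless of coins
    "named a wrong digit":     p_wrong * p_all_heads,  # needs all seven heads
    "chickened out":           p_skip * 1.0,           # no experiment, no risk
}
total = sum(alive.values())
for branch, p in alive.items():
    print(f"{branch:>25}: {p / total:.2%} of surviving copies")
# Roughly 98.8% "chickened out", ~1.1% "correct digit", ~0.08% "wrong digit".
```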
Another possibility is the Posthuman Technocapital Singularity: everything keeps going in roughly the same direction, there are a lot of competing agents but no sharp destabilization or power concentration, and Moloch wins. Probably wins, idk.
https://docs.osmarks.net/hypha/posthuman_technocapital_singularity
I also played the same game but with historical figures. The Schelling point is Albert Einstein by a huge margin: like 75% (19 / (19 + 6)) of them say Albert Einstein. The Schelling-point figure is Albert Einstein! Schelling! Point! And no one said Schelling!
In the first iteration of the prompt, his name was not mentioned. Then I became more and more obvious in my hints, and in the final iteration, I even bolded his name and said the prompt was the same for the other participant. And it’s still Einstein!
Which means 2:1 betting odds
So, she shakes the box contemplatively. There is a mechanical calendar inside. She knows the betting odds of it displaying “Monday” but not the credence. She thinks that’s really, really weird.
Well, idk. My opinion here is that you bite some weird bullet, about which I’m very ambivalent. I think the “what day is it now” question makes total sense, and you’re factoring it out of your model into some separate part.
Like, can you give Sleeping Beauty some additional decision problems involving the calendar? Will it all work seamlessly?
Well, now! She looks at the box and thinks: there is definitely a calendar in there, in some state. What state? What would happen if I opened it?
Let’s say there is an accurate mechanical calendar in a closed box in the room. She could open it but won’t. Should she have no expectation at all about what state this calendar is in?
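A minimal simulation of where the “2:1 betting odds on Monday” number comes from, assuming the standard protocol (fair coin; Heads: woken on Monday only; Tails: woken on Monday and Tuesday with amnesia in between) and a calendar that simply shows the current day:

```python
# Per-awakening frequency of "the calendar in the box shows Monday"
# under the standard Sleeping Beauty protocol.
import random

monday = total = 0
for _ in range(100_000):
    days = ["Monday"] if random.random() < 0.5 else ["Monday", "Tuesday"]
    for day in days:            # one awakening per listed day
        total += 1
        monday += day == "Monday"

print(f"P(calendar shows Monday | an awakening) ~ {monday / total:.3f}")
# ~0.667, i.e. 2:1 odds in favour of Monday on any given awakening,
# whatever one decides to call her "credence".
```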
How many randomly sampled humans would I rather condemn to torture to save my mother? Idk, more than one, tbh.
A pet that someone purchased only for the joy of torturing it, and not for any other service?
Unvirtuous. This human is disgusting, as they consider it fun to deal a lot of harm to persons in their direct relationships.
Also I really don’t like how you jump into “it’s all rationalization” with respect to values!
Like, the thing about utilitarian-ish value systems is that they deal poorly with the preferences of other people (they mostly ignore them). Preference-based views deal poorly with the creation (and non-creation) of new persons.
I can redteam them and find real murderous decision recommendations.
Maybe, instead of anchoring to the first proposed value system, it’s better to understand what the values of real-life people actually are? Maybe there is no simple formulation of them; maybe it’s a complex thing.
Also, disclaimer: I’m totally for making animals better off! (Including wild animals.) It’s just that I don’t think it follows from some larger moral principle; it’s just my aesthetic preference, and it’s not that strong. And I’m kinda annoyed at EAs who by “animal welfare” mean handing out band-aids to farm chickens. Like, why? You could just help make lab-grown meat a thing faster; it’s literally the only thing that is going to change this.
I propose siccing o1 on them to distill it all into something readable/concise. (I tried to comprehend it and failed / got distracted.)
I think some people pointed out in the comments that their model doesn’t represent the probability of “what day it is NOW”, btw.
I think you present a false dichotomy here: some impartial utilitarian-ish view VS hardcore moral relativism.
Pets are sometimes called companions. It’s as if they provide some service and receive some service in return, all of this with trust and positive mutual expectations, and that demands some moral consideration / obligations, just like a friendship or a family relationship. I think a mutualist / contractualist framework accounts for that better. It predicts that such relationships will receive additional moral consideration, and they actually do in practice. And it predicts that wild animals won’t, and they don’t, in practice. Success?
So, people just have attitudes toward animals like they have toward any other person, exacerbated by how little status and power animals have. Especially shrimp. Who the fuck cares about shrimp? You can only care about shrimp if you galaxy-brain yourself into some weird ethics system. I agree that people have no consistent moral framework backing up that attitude, but it’s not that fair to force them into your own with trickery or frame control.
>Extremely few people actually take the position that torturing animals is fine
Wrong. Most humans would be fine answering that torturing 1 million chickens is an acceptable tradeoff to save 1 human. You just don’t torture them for no reason, as it’s unvirtuous and icky
It’s not just biases; they are also just dumb. (Right now; nothing against the 160-IQ models you have in the future.) They are often unable to notice important things, or to spot problems, or to follow up on such observations.
Suppose you know that there is an apple in this box. You then modify your memory so that you think the box is empty. You open the box, expecting nothing to be there. Is there an apple?
Also, what if there is another branch of the universe where there is no apple, and the you in the “yes apple” universe modified your memory, so the two of you are identical now? So there are two identical people in different worlds, one with a box-with-apple, the other with a box-without-apple.
Should you, in the world with the apple and a not-yet-modified memory, anticipate a 50% chance of experiencing an empty box after opening it?
If you got confused about the setup here is a diagram: https://i.imgur.com/jfzEknZ.jpeg
I think it’s identical to the problem where you get copied into two rooms, numbered 1 and 2: you should expect room 1 with 50% and room 2 with 50%, even though there is literally no randomness or uncertainty in what’s going to happen. Or is it?
So, the implication here is that you can squeeze yourself into different timelines by modifying your memory, or what? Am I going crazy here?
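Here is the branch-counting that produces that conclusion, spelled out as a toy sketch. The assumed rule is: anticipate being any future observer-moment whose memory state matches yours, weighted by branch probability; the 50/50 branch weights are made up.

```python
# Toy branch-count for the apple / memory-modification puzzle.
branches = [
    # (label, weight, what is in the box, your memory tomorrow)
    ("apple universe, memory rewritten", 0.5, "apple", "box is empty"),
    ("no-apple universe, untouched",     0.5, "empty", "box is empty"),
]

# Tomorrow-you's memory will say "box is empty" in both branches,
# so both observer-moments match; anticipation splits by branch weight.
matching = [(label, w, box) for label, w, box, memory in branches
            if memory == "box is empty"]
total = sum(w for _, w, _ in matching)
for label, w, box in matching:
    print(f"{label}: P = {w / total:.2f}, opening the box reveals: {box}")
```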
Do you want joy, or to know what things are out there? It’s a fundamental question about justifications: do you use joy to keep yourself going while you gain understanding, or do you gain understanding to get some high-quality joy?
That sounds like two different kinds of creatures in the transhumanist limit of it: some trade off knowledge for joy, others trade off joy for knowledge.
Or whatever, not necessarily “understanding”; you can bind yourself to other properties of the territory. Well, in terms of maps, it’s a preference for good correspondence, plus a preference for not spoofing that preference.