Was this not originally tagged “personal blog”?
I’m not sure what the consensus is on how to vote on these posts, but I’m sad that this post’s poor reception might be why its author deactivated their account.
I just reported this to Feedly.
Thanks for the info! And no worries about the (very) late response – I like that people on this site fairly often reply at all, even well beyond the same day or a few days later; it makes the discussions feel more ‘timeless’ to me.
The second “question” wasn’t really a question, but it did stem from my not knowing that Conservative Judaism is distinct from Orthodox Judaism. (Sadly, capitalization is only relatively weak evidence of ‘proper-nounitude’.)
Some of my own intuitions about this:
Yes, this would be ‘probabilistic’ and thus this is an issue of evidence that AIs would share with each other.
Why or how would one system trust another that the state (code+data) shared is honest?
Sandboxing is (currently) imperfect, tho perhaps sufficiently advanced AIs could actually achieve it? (On the other hand, there are security vulnerabilities that exploit the ‘computational substrate’, e.g. Spectre, so I would guess that would remain a potential vulnerability even for AIs that designed and built their own substrates.) This also seems like it would only help if the sandboxed version could be ‘sped up’ and if the AI running the sandboxed AI can ‘convince’ the sandboxed AI that it’s not sandboxed. (See the toy sketch after this list for how little today’s easy sandboxing actually guarantees.)
The ‘prototypical’ AI I’m imagining seems like it would be too ‘big’ and too ‘diffuse’ (e.g. distributed) for it to be able to share (all of) itself with another AI. Another commenter mentioned an AI ‘folding itself up’ for sharing, but I can’t understand concretely how that would help (or how it would work either).
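For concreteness, here’s a minimal sketch (in Python, entirely my own toy example, not anything proposed in the thread) of the kind of process-level ‘sandboxing’ that is cheap to do today. The point is how little it actually guarantees: a separate process plus a timeout bounds runtime, but does nothing about filesystem or network access, covert channels, or substrate-level leaks like Spectre.

```python
import subprocess
import sys
import tempfile

# Toy 'sandbox': run untrusted code in a separate OS process with a hard timeout.
# This bounds how long the code runs, but provides no isolation from the
# filesystem, the network, or substrate-level side channels (e.g. Spectre).
untrusted_code = 'print("hello from the sandboxed process")'

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(untrusted_code)
    script_path = f.name

result = subprocess.run(
    [sys.executable, script_path],  # run with the same Python interpreter
    capture_output=True,
    text=True,
    timeout=5,                      # bounds runtime, not information leakage
)
print(result.stdout, end="")
```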
I think my question is different, tho that does seem like a promising avenue to investigate – thanks!
That’s an interesting idea!
An oscilloscope
I guessed that’s what you meant but was curious whether I was right!
If the AI isn’t willing or able to fold itself up into something that can be run entirely on a single, human-inspectable CPU in an airgapped box, running code that is amenable to easily proving things about its behavior, you can just not cooperate with it, or not do whatever else you were planning to do by proving something about it, and just shut it off instead.
Any idea how a ‘folded-up’ AI would imply anything in particular about the ‘expanded’ AI?
If an AI ‘folded itself up’ and provably/probably ‘deleted’ its ‘expanded’ form (and all instances of that), as well as any other AIs or not-AI-agents under its control, that does seem like it would be nearly “alignment-complete” (especially relative to our current AIs), even if, e.g. the AI expected to be able to escape that ‘confinement’.
But that doesn’t seem like it would work as a general procedure for AIs cooperating or even negotiating with each other.
What source code and what machine code is actually being executed on some particular substrate is an empirical fact about the world, so in general, an AI (or a human) might learn it the way we learn any other fact—by making inferences from observations of the world.
This is a good point.
But I’m trying to develop some detailed intuitions about how this would or could work, in particular what practical difficulties there are and how they could be overcome.
For example, maybe you hook up a debugger or a waveform reader to the AI’s CPU to get a memory dump, reverse engineer the running code from the memory dump, and then prove some properties you care about follow inevitably from running the code you reverse engineered.
In general though, this is a pretty hard, unsolved problem—you probably run into a bunch of issues related to embedded agency pretty quickly.
(What do you mean by “waveform reader”?)
Some practical difficulties with your first paragraph:
1. How can AIs credibly claim that any particular CPU is running their code, or that a debugger connected to it isn’t being subverted, e.g. via MITM?
2. How can AIs credibly claim that, whatever the contents of a ‘CPU’s’ memory are at some point, they won’t be replaced at some later point?
3. How could one AI safely execute code given to it by another (e.g. via a “memory dump”)?
4. How could one AI feasibly run another’s code ‘fast enough’ to be able to determine that it could (probably) trust it now (even assuming [1], [2], and [3] are solved)?
[1] points to what I see as a big difficulty: AIs will probably (or at least could) be very distributed computing systems, and there might not be any practical way for one to ‘fit into a single box’ for, e.g., careful inspection by others.
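To make difficulty [1] concrete, here’s a toy Python sketch (my own, not anything from the discussion) of the naive ‘share a blob plus a digest’ approach. The names and the exchange are hypothetical; the check only ties the shared blob to the claim about it, not to what is actually executing on the other party’s hardware.

```python
import hashlib

def digest(state: bytes) -> str:
    """SHA-256 digest of a serialized code+data blob."""
    return hashlib.sha256(state).hexdigest()

# --- B's side (hypothetical): serialize some state and publish a claim about it.
b_state = b"...B's serialized code+data..."
b_claim = digest(b_state)          # "this digest describes what I'm running"

# --- A's side: receive the blob and the claim, and check that they match.
received_blob, received_claim = b_state, b_claim
if digest(received_blob) == received_claim:
    # This only shows the blob is consistent with the claim. It does not show that
    # B's hardware is actually executing this blob now [1], that it won't be
    # swapped out later [2], or that A can safely execute it itself [3].
    print("blob is consistent with the claim -- but that's all")
```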
This is a nice summary!
fictional role-playing server
As opposed to all of the non-fictional role-playing servers (e.g. this one)?
I don’t think most/many (or maybe any) of the stories/posts/threads on the Glowfic site are ‘RPG stories’, let alone some kind of ‘play by forum post’ histories; there are just a few that use the same settings as RPGs.
I suspect a lot of people, like myself, learn “content-based writing” by trying to communicate, e.g. in their ‘personal life’ or at work. I don’t think I learned anything significant by writing in my own “higher forms of [‘official’] education”.
I would still like to see political pressure for truly open independent audits, though.
I think that would be a big improvement. I also think ARC is, at least effectively, working on that or towards it.
Damning allegations; but I expect this forum to respond with minimization and denial.
This is so spectacularly bad faith that it makes me think the reason you posted this is pretty purely malicious.
Out of all of the LessWrong and ‘rationalist’ “communities” that have existed, how many are ones for which any of the alleged bad acts occurred? One? Two?
Out of all of the LessWrong users and ‘rationalists’, how many have been accused of these alleged bad acts? Mostly one or two?
Having observed extremely similar dynamics around, e.g., sexual harassment in several different online and in-person ‘communities’, I’ve found the ‘communities’ of or affiliated with ‘rationality’, LessWrong, and EA to be, far and away, the most diligent about actually effectively mitigating, preventing, and (reasonably) punishing bad behavior.
It is really unclear what standards the ‘communities’ are failing to meet and that makes me very suspicious that those standards are unreasonable.
Please don’t pin the actions of others on me!
No, it’s not, especially given that ‘whataboutism’ is a label used to dismiss comparisons that don’t advance particular arguments.
Writing the words “what about” does not invalidate any and all comparisons.
I think some empathy and sympathy is warranted to the users of the site that had nothing to do with any of the alleged harms!
It is pretty tiresome to be accused-by-association. I’m not aware of any significant problems with abuse “in LessWrong”. And, from what I can tell, almost all of the alleged abuse happened in one particular ‘rationalist community’, not all, most, or even many of them.
I’m extremely skeptical that the article or this post were inspired by compassion towards anyone.
I think the quoted text is inflammatory and “this forum” (this site) isn’t the same as wherever the alleged bad behavior took place.
Is contradicting something you believe to be, essentially, false equivalent to “denial”?
It is anomalous that people are quite uninterested in optimizing this as it seems clearly important.
I have the opposite sense. Many people seem very interested in this.
“This community” is a nebulous thing and this site is very different than any of the ‘in-person communities’.
But I don’t think there’s strong evidence that the ‘communities’ don’t already “have much lower than average levels of abuse”. I have an impression that, among the very-interested-in-this people, any abuse is too much.
What kind of more severe punishment should “the rationalist community” mete out to X and how exactly would/should that work?
I admit now that I was in fact missing the point.
I can (maybe/kinda) imagine someone else doing something like this and not definitely thinking it was wholly unjustified, but I agree now that this is a damning part of a larger damning (and long-enduring) pattern of bad behavior on Wolfram’s part.
You were right. I was wrong.