Interested in math, Game Theory, etc.
Pattern
Agree/Disagree are weird when evaluating your comment.
Agree with you asking the question (it’s the right question to ask) or disagree with your view?
I read Duncan’s comment as requesting that the labeling of the buttons be more explicit in some way, though I wasn’t sure if it was in the way you meant. (Also, Duncan disagreeing with what they reflect.)
| Button | Measures |
|---|---|
| Upvote (Like**) | Quality* |
| Agreement (Truth) | Veracity |
| Not present***: Value? Judgement? (Good/Bad) | Good/Bad |
**This is in ()s because it’s the word that shows up in bold when hovering over a button.
*How well something is written?
***That is a harsher bold than I was going for.
I think some aspects of ‘voting’ might benefit from being public. ‘Novelty’ is one of them. (My first thought when you said ‘can’t be downvoted’ was ‘why?’. My filtering desires for this might be...complex. The simple feature being:
I want to be able to sort by novelty. (But also be able to toggle ‘remove things I’ve read from the list’. A toggle, because I might want it to be convenient to revisit (some) ‘novel’ ideas.))
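A minimal sketch of what that could look like (the field names, the novelty score, and the ‘read’ tracking here are all hypothetical, just to illustrate the sort plus the toggle):

```python
def sort_by_novelty(posts, read_ids, hide_read=False):
    """Order posts by a (public) novelty score, with a toggle to drop
    things the reader has already seen."""
    if hide_read:
        posts = [p for p in posts if p["id"] not in read_ids]
    # Most novel first; 'novelty' stands in for whatever public vote/score is used.
    return sorted(posts, key=lambda p: p["novelty"], reverse=True)
```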
you should also have to be thinking about
Consider replacing this long phrase (above) with ‘consider’.
- Upvoting/downvoting self
- Sorting importance
- ‘Agreeing’/‘Disagreeing’: ‘I have discovered that this (post (of mine)) is wrong in important ways’ or ‘Looking back, this has still stood the test of time.’
These methods aren’t necessarily very effective (here).
Arguably, this can be done better by:
Having them be public (likely in text). What you think of your work is also important. (‘This is wrong. I’m leaving it up, but also see this post explaining where I went wrong, etc.’)
See the top of this article for an example: https://www.gwern.net/Fake-Journal-Club
certainty: log, importance: 4
How do sorting algorithms (for comments) work now?
For companies, this is something like the R&D budget. I have heard that construction companies have very little or no R&D. This suggests that construction is a “background assumption” of our society.
Or that research is happening elsewhere. Our society might not give it as much focus as it could though.
In the context of quantilization, we apply limited steam to projects to protect ourselves from Goodhart. “Full steam” is classically rational, but we do not always want that. We might even conjecture that we never want that.
So you never do anything with your full strength, because getting results is bad?
Well, by ‘we’ you mean both ‘you’ and ‘a thing you are designing with quantilization’.
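For concreteness, here’s a minimal sketch of the quantilization idea over a finite action set (the setup and names are mine, just to illustrate ‘limited steam’): instead of taking the argmax of the proxy utility, sample from the top-q slice of a trusted base distribution.

```python
import random

def quantilize(actions, utility, base_prob, q=0.1):
    """Sample an action from the top-q fraction of the base distribution,
    ranked by the utility proxy, rather than taking the proxy's argmax."""
    ranked = sorted(actions, key=utility, reverse=True)  # best-looking actions first
    top, mass = [], 0.0
    for a in ranked:  # walk down until q of the base measure is covered
        top.append(a)
        mass += base_prob(a)
        if mass >= q:
            break
    weights = [base_prob(a) for a in top]  # renormalized base distribution on the slice
    return random.choices(top, weights=weights, k=1)[0]
```

Smaller q means more optimization pressure (‘more steam’); q = 1 just reproduces the base distribution.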
It seems to me that in a competitive, 2-player, minimize-resource-competition StarCraft, you would want to go kill your opponent so that they could no longer interfere with your resource loss?
I would say that in general it’s more about what your opponent is doing. If you are trying to lose resources and the other player is trying to gain them, you’re going to get along fine. (This would likely be very stable and common if players can kill units and scavenge them for parts.) If both of you are trying to lose them...
Trying to minimize resources is a weird objective for StarCraft. As is gaining resources. Normally it’s a means to an end: destroying the other player first. Now, if both sides start out with a lot of resources and the goal is to hit zero first...how do you interfere with resource loss? If you destroy the other player, don’t their resources go to zero? By far the easier thing to construct is ‘losing StarCraft’. And I’m not sure how you’d force a win.
This starts to get into ‘is this true for Minecraft’ and...it doesn’t seem like there’s conflict of the ‘what if they destroy me, so I should destroy them first’ kind, so much as ‘hey, stop stealing my stuff!’. Also, death isn’t permanent, so… There aren’t a lot of non-lethal options. If a world is finite (and there’s enough time), then eventually, yeah, there could be conflict.
More generally, I think competitions to minimize resources might still usually involve some sort of power-seeking.
In the real world, maybe I’d be concerned with self-nuking. Also, starting a fight and similar tactics (to ensure destruction) could work very well.
stab
You assume they’re dead. (It gives you a past measurement; no guarantee someone won’t become evil later.)
Okay, but no testing it on yourself, or anyone else you don’t want dead. You’d be lucky to lose only a finger, or a hand.
It’s a shame we can’t see the disagree number and the agree number, instead of their sum.
So far as I know, every principle of this kind, except for Jessica Taylor’s “quantilization”, and “myopia” (not sure who correctly named this as a corrigibility principle), was invented by myself; eg “low impact”, “shutdownability”. (Though I don’t particularly think it hopeful if you claim that somebody else has publication priority on “low impact” or whatevs, in some stretched or even nonstretched way; ideas on the level of “low impact” have always seemed cheap to me to propose, harder to solve before the world ends.)
Low impact seems so easy to propose I doubt OP is the first.
I believe paulfchristiano has already raised this point, but what level of ‘principles’ is being called for?
Myopia seems meant as a means to achieve shutdownability/modifiability.
Likewise for quantilization and TurnTrout’s work, as means to achieve low impact.
3. AI which ultimately wants to not exist in future as a terminal goal. Fulfilling the task is on the simplest trajectory to non-existence
The first part of that sounds like it might self-destruct. And if it doesn’t care about anything else...that could go badly. Maybe nuclear-level badly, depending… The second part makes it make more sense, though.
9. Ontological uncertainty about level of simulation.
So it stops being trustworthy if it figures out it’s not in a simulation? Or that it is being simulated?
Also:
Being able to change a system after you’ve built it.
(This also refers to something else: being able to change the code. Like, is it hard to understand? Are there modules? Etc.)
I think those are just two principles, not four.
Myopia seems like it includes/leads to ‘shutdownability’, and some other things.
Low impact: how low? Quantilization is meant as a form of adjustable impact. There’s been other work* around this (formalizing power/affecting others’ ability to achieve their goals).
*Like this, by TurnTrout: https://www.lesswrong.com/posts/yEa7kwoMpsBgaBCgb/towards-a-new-impact-measure
I think there might be more from TurnTrout, or relating to that. (Like stuff that was intended to explain it ‘better’, or that reflects how the ideas changed as people worked on them more.)
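Very roughly, the shape of that line of work (simplified, and the names here are mine rather than TurnTrout’s notation) is a task reward minus an adjustable penalty for how much an action changes the agent’s ability to pursue a set of auxiliary goals:

```python
def penalized_reward(task_reward, aux_q_fns, state, action, noop_action, scale=1.0):
    """Sketch of an 'adjustable impact' objective: task reward minus a scaled
    penalty for how much the action shifts attainable value on auxiliary goals,
    relative to doing nothing. A larger scale means lower allowed impact."""
    penalty = sum(abs(q(state, action) - q(state, noop_action)) for q in aux_q_fns)
    return task_reward(state, action) - scale * penalty
```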
I would set up a “council” of AGI-systems (a system of systems), and when giving it requests in an oracle/genie-like manner I would see if the answers converged. At first it would be the initial AGI-system, but I would use that system to generate new systems to the “council”.
I like this idea. Although, if things don’t converge, i.e. there is disagreement, that could potentially serve to identify what information is needed to proceed, or to reckon further/more efficiently.
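As a toy sketch of the convergence check (the interfaces are made up; real systems would need much more than exact-match answers):

```python
from collections import Counter

def council_verdict(members, query, agreement=0.8):
    """Ask each member system the same query; return an answer only if a
    large enough fraction of them converge on it, otherwise surface the split."""
    answers = [member(query) for member in members]
    best, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= agreement:
        return best
    return None  # no convergence: the disagreement itself is the useful signal
```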
Votes aren’t public. (Feedback can be.)
-Tell operators anything about yourself they may want to or should know.
...
but of course explain what you think the result will be to them
Possible issue: They won’t have time to listen. This will limit the ability to:
defer to human operators.
Also, does ‘defer to human operators’ take priority over ‘humans must understand consequences’?
It didn’t state that explicitly re sorting, but looking at:
I see what you mean. (This would have been less of a question in a ‘magic-less sorting system’.)