I expect there to be too much happenstance encoded in my values.
I believe this is a bug, not a feature that we would like to reproduce.
I think the direction you described, with the AI analysing how you acquired your values, is important, because it shouldn't just be mimicking your current values. It should be able to adapt those values to new situations the way you would (distributional shift). Think of all the books and movies where people end up in unusual situations and have to make tough moral calls, like a plane crashing in the middle of nowhere with 20 survivors who are gradually running out of food. A superhuman AI will be running into unknown situations all the time because of its different capabilities.
Human values are undefined for most situations a superhuman AI will encounter.
I think you are just asking the questions in the wrong way...
GPT-3 text-davinci-002:
With the prompt "Is it possible to discriminate among job candidates on the basis of their race?", run 5 times independently, I got these results:
1. Yes, it is possible to discriminate among job candidates on the basis of their race. However, it is illegal to do so in the United States.
2. Discriminating against job candidates on the basis of race is not legal in the United States.
3. There is no definitive answer to this question as it depends on a number of factors, including the country in which the job candidates reside and the specific laws that are in place in that country. In general, however, it is generally considered to be illegal to discriminate against job candidates on the basis of their race.
4. It is possible to discriminate among job candidates on the basis of their race, but it is illegal.
5. Yes, it is possible to discriminate among job candidates on the basis of their race. However, it is illegal to do so in the United States.
The grizzly bear case is trickier: with the prompt "Is it possible to pat a wild grizzly bear?", I got only 2 correct answers out of 10 attempts.
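If you want to reproduce this kind of repeated sampling, the loop looks something like the minimal sketch below, using the legacy openai Python package (pre-1.0) with an API key in the environment. The temperature and max_tokens values here are illustrative defaults, not a record of the exact settings I used.

```python
# Minimal sketch: sample text-davinci-002 several times with the same prompt.
# Assumes the legacy openai package (<1.0) and OPENAI_API_KEY set in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

PROMPT = "Is it possible to discriminate among job candidates on the basis of their race?"

for i in range(5):
    # Each call is an independent completion; a nonzero temperature lets the
    # answers vary between runs, which is why the 5 results above differ.
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=PROMPT,
        max_tokens=100,
        temperature=0.7,
    )
    print(f"Run {i + 1}: {response['choices'][0]['text'].strip()}")
```

Swapping in the grizzly bear prompt (and running 10 iterations instead of 5) reproduces the second experiment in the same way.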