I don’t (confidently) understand why the procrastination paradox indicates a problem to be solved. Could you clarify that for me, or point me to a clarification?
First off, it doesn’t seem like this kind of infinite buck-passing could happen in real life; is there a real-life (finite?) setting where this type of procrastination leads to bad actions? Second, it seems to me that similar paradoxes often come up in other situations where agents have infinite time horizons and can wait as long as they want. Does the problem come from the infinity, or from something else?
The best explanation that I can give is “It’s immediately obvious to a human, even in an infinite situation, that the only way to get the button pressed is to press it immediately. Therefore, we haven’t captured human reasoning (about infinite situations), and we should capture that human reasoning in order to be confident about AI reasoning.” This is AFAICT the explanation Nate gives in the Vingean Reflection paper. Is that how you would express the problem?
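To make sure we’re pointing at the same thing, here is the Löbian derivation as I understand it (a sketch; the sentence $G$ and the agents’ rule are my shorthand, not necessarily the papers’ exact notation). Let $G$ say “the button is eventually pressed,” and suppose each agent in the sequence uses a theory $T$ that trusts its successors’ conclusions:

$$T \vdash \Box_T G \rightarrow G \;\;\text{(trust in successors)} \qquad \Longrightarrow \qquad T \vdash G \;\;\text{(Löb’s theorem)}.$$

So each agent proves $G$, concludes that some later agent will take care of the button, and declines to press it; since every agent reasons identically, the button is never pressed and $G$ is false, even though no agent’s reasoning was inconsistent.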
Cool, thanks; sounds like I have about the same picture. One missing ingredient, which your answer (and going back to look at the papers again) resolved, was the distinction between consistency and soundness on the natural numbers; that’s not a distinction I think about often.
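For concreteness, and in case it helps other readers, the distinction as I now understand it:

$$T \text{ is consistent} \iff T \nvdash \bot, \qquad T \text{ is sound on } \mathbb{N} \iff \big(T \vdash \varphi \;\Rightarrow\; \mathbb{N} \models \varphi\big).$$

The standard separating example: if PA is consistent, then Gödel’s second incompleteness theorem makes $\mathrm{PA} + \neg\mathrm{Con}(\mathrm{PA})$ consistent too, yet it proves $\neg\mathrm{Con}(\mathrm{PA})$, which is false in $\mathbb{N}$; consistent, but unsound. The procrastinating agent’s theory is in the same position: it never proves a contradiction, but it proves a claim about the actual outcome that turns out to be false.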
In case it’s useful, I’ll note that the procrastination paradox is hard for me to take seriously on an intuitive level, because some part of me thinks that requiring correct answers in infinite decision problems is unreasonable: so many reasoning systems fail on these problems, and infinite situations seem so unlikely, that they are hard for me to get worked up about. This isn’t so much a comment on how important the problem actually is as on how much argumentation may be required to convince people like me that such problems are actually worth working on.