Eliezer’s proposal was a different notation, not an actual change in the strength of Solomonoff Induction. The usual form of SI with deterministic hypotheses is already equivalent to one with probabilistic hypotheses: a single hypothesis with prior probability P that assigns uniform probability to each of 2^N different bitstrings makes the same predictions as an ensemble of 2^N deterministic hypotheses, each of which has prior probability P*2^-N and predicts one of the bitstrings with certainty; and a Bayesian update in the former case is equivalent to just discarding falsified hypotheses in the latter. More generally, given any computable probability distribution, you can with O(1) bits of overhead convert it into a program that samples from that distribution when given a uniform random string as input, and then convert that into an ensemble of deterministic programs with different hardcoded values of the random string. (The other direction of the equivalence is obvious: a computable deterministic hypothesis is just a special case of a computable probability distribution.)
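To make the bookkeeping explicit (a minimal sketch using the N and P from above, with s an observed prefix of length k ≤ N):

```latex
\begin{align*}
\text{probabilistic hypothesis, weight after observing } s:\quad
  & P \cdot 2^{-k} \\
\text{deterministic ensemble, survivors after observing } s:\quad
  & 2^{\,N-k} \text{ programs, each of prior weight } P \cdot 2^{-N} \\
\text{total surviving weight}:\quad
  & 2^{\,N-k} \cdot P \cdot 2^{-N} \;=\; P \cdot 2^{-k}
\end{align*}
```

Both sides keep weight P*2^-k, and among the survivors half predict a 0 next and half a 1, so the predictive distributions agree at every step; discarding falsified ensemble members is the same operation as the Bayesian update on the single probabilistic hypothesis.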
Yes, if you put a Solomonoff Inductor in an environment that contains a fair coin, it would come up with increasingly convoluted Turing machines, essentially ones that hardcode the coinflip outcomes observed so far. This is a problem only if you care about the value of an intermediate variable (posterior probability assigned to individual programs), rather than the variable that SI was actually designed to optimize, namely accurate predictions of sensory inputs. The same issue shows up in AIXI’s limitation to a sense-determined utility function. (Granted, a sense-determined utility function really isn’t a good formalization of my preferences, so you couldn’t build an FAI that way.)
I interpret Daniel_Burfoot’s idea as: “import java.util.*” makes subsequent mentions of List longer, since there are more symbols in scope that it has to be distinguished from.
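Under that reading, the implied cost model is something like this toy sketch (my own illustration, not Daniel_Burfoot’s actual scheme; the scope sizes are made-up numbers, and it assumes a uniform, non-adaptive code over the symbols currently in scope):

```python
import math

def mention_cost_bits(scope_size: int) -> float:
    """Bits to encode one identifier under a uniform code over the current scope."""
    return math.log2(scope_size)

# Hypothetical numbers purely for illustration:
narrow_scope = 50            # symbols visible with targeted imports of List and Map
wide_scope = 50 - 2 + 200    # swap the two targeted imports for ~200 java.util names

print(mention_cost_bits(narrow_scope))  # ~5.6 bits per mention of List
print(mention_cost_bits(wide_scope))    # ~8.0 bits per mention of List
```

Under that kind of non-adaptive code, the wildcard import really does make every later mention of List cost more bits, which is the intuition I’m attributing to him.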
But I don’t think that idea actually works. You can decompose the probability of a conjunction into a product of conditional probabilities, and you get the same number regardless of the order of the decomposition. Whatever probability (and corresponding total compressed size) you assign to a certain sequence of imports and symbols, you could just as well record the symbols first and then the imports, in which case by the time you get to the imports you already know that the program didn’t use anything from java.util other than List and Map, so even being able to represent the distinction between the two forms of import is counterproductive for compression.
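The order-independence is just the chain rule written both ways:

```latex
\Pr(\text{imports}, \text{symbols})
  \;=\; \Pr(\text{imports}) \, \Pr(\text{symbols} \mid \text{imports})
  \;=\; \Pr(\text{symbols}) \, \Pr(\text{imports} \mid \text{symbols})
```

The total compressed size is -log2 of that joint probability, so any codelength achievable by encoding the symbols conditional on the imports is also achievable by encoding the imports conditional on the symbols.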
The intuition of “there are more symbols in scope that it has to be distinguished from” goes wrong because it fails to account for updating a probability distribution over what symbol comes next based on what symbols you’ve seen. Such an update can include knowledge of which symbols come from the same package, if that’s correlated with which symbols are likely to appear in the same program.
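As a toy illustration of the kind of adaptive model I mean (my own sketch, not anything Daniel_Burfoot proposed; the co-occurrence counts are made up):

```python
import math
from collections import Counter

def codelength_bits(symbols, cooccurrence, vocab, alpha=1.0):
    """Total codelength (bits) of a symbol stream, under a model that upweights
    candidates which co-occur with the symbols already seen in this program."""
    total = 0.0
    seen = set()
    for sym in symbols:
        # Score each candidate by smoothed co-occurrence with what we've seen so far.
        scores = {c: alpha + sum(cooccurrence[(c, s)] for s in seen) for c in vocab}
        z = sum(scores.values())
        total += -math.log2(scores[sym] / z)
        seen.add(sym)
    return total

# Made-up counts: names from the same package tend to show up together.
cooccurrence = Counter({("Map", "List"): 50, ("List", "Map"): 50,
                        ("Socket", "List"): 1, ("List", "Socket"): 1})
vocab = ["List", "Map", "Socket"]

print(codelength_bits(["List", "Map", "Map"], cooccurrence, vocab))  # cheap: Map is expected after List
print(codelength_bits(["List", "Socket"], cooccurrence, vocab))      # pricier: Socket isn't
```

The coder’s beliefs about the next symbol shift as it reads the program, so “these names travel together” regularities get exploited without any explicit import declaration carrying the information, which is exactly what the scope-counting intuition leaves out.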